[GH-ISSUE #4989] Failed to acquire semaphore" error="context canceled" #65191

Closed
opened 2026-05-03 19:58:08 -05:00 by GiteaMirror · 12 comments

Originally created by @travisgu on GitHub (Jun 12, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/4989

Originally assigned to: @dhiltgen on GitHub.

What is the issue?

I am using embeddings with AnythingLLM for RAG. I found that the embedding service consistently fails on calls that take several minutes, and the error log shows this every time. I am not sure why the context was canceled. Please kindly help.

Below is the debug log:

[GIN] 2024/06/11 - 11:09:07 | 200 |         2m28s |   10.100.34.236 | POST     "/api/embeddings"
time=2024-06-11T11:09:07.830+08:00 level=DEBUG source=sched.go:304 msg="context for request finished"
time=2024-06-11T11:09:07.830+08:00 level=DEBUG source=sched.go:255 msg="after processing request finished event" modelPath=C:\Users\admin_env\.ollama\models\blobs\sha256-ada9f88e89df0ea53c31fabf8b1e7c8c0c22fa95ab3a3cad4cdd86103ce9f3d3 refCount=119
DEBUG [update_slots] kv cache rm [p0, end) | p0=0 slot_id=0 task_id=14500 tid="17876" timestamp=1718075347
DEBUG [update_slots] slot released | n_cache_tokens=52 n_ctx=2048 n_past=52 n_system_tokens=0 slot_id=0 task_id=14500 tid="17876" timestamp=1718075350 truncated=false
DEBUG [log_server_request] request | method="POST" params={} path="/embedding" remote_addr="127.0.0.1" remote_port=51069 status=200 tid="16400" timestamp=1718075350
DEBUG [process_single_task] slot data | n_idle_slots=1 n_processing_slots=0 task_id=14503 tid="17876" timestamp=1718075350
DEBUG [launch_slot_with_data] slot is processing task | slot_id=0 task_id=14504 tid="17876" timestamp=1718075350
[GIN] 2024/06/11 - 11:09:10 | 200 |         2m30s |   10.100.34.236 | POST     "/api/embeddings"
time=2024-06-11T11:09:10.289+08:00 level=DEBUG source=sched.go:304 msg="context for request finished"
time=2024-06-11T11:09:10.290+08:00 level=DEBUG source=sched.go:255 msg="after processing request finished event" modelPath=C:\Users\admin_env\.ollama\models\blobs\sha256-ada9f88e89df0ea53c31fabf8b1e7c8c0c22fa95ab3a3cad4cdd86103ce9f3d3 refCount=118
DEBUG [update_slots] kv cache rm [p0, end) | p0=0 slot_id=0 task_id=14504 tid="17876" timestamp=1718075350
time=2024-06-11T11:09:11.910+08:00 level=ERROR source=server.go:836 msg="Failed to acquire semaphore" error="context canceled"
time=2024-06-11T11:09:11.910+08:00 level=DEBUG source=sched.go:304 msg="context for request finished"
time=2024-06-11T11:09:11.911+08:00 level=INFO source=routes.go:401 msg="embedding generation failed: context canceled"
time=2024-06-11T11:09:11.911+08:00 level=DEBUG source=sched.go:255 msg="after processing request finished event" modelPath=C:\Users\admin_env\.ollama\models\blobs\sha256-ada9f88e89df0ea53c31fabf8b1e7c8c0c22fa95ab3a3cad4cdd86103ce9f3d3 refCount=117
[GIN] 2024/06/11 - 11:09:11 | 500 |         2m32s |   10.100.34.236 | POST     "/api/embeddings"
time=2024-06-11T11:09:11.911+08:00 level=ERROR source=server.go:836 msg="Failed to acquire semaphore" error="context canceled"
time=2024-06-11T11:09:11.911+08:00 level=DEBUG source=sched.go:304 msg="context for request finished"
time=2024-06-11T11:09:11.911+08:00 level=INFO source=routes.go:401 msg="embedding generation failed: context canceled"
time=2024-06-11T11:09:11.911+08:00 level=ERROR source=server.go:836 msg="Failed to acquire semaphore" error="context canceled"
[GIN] 2024/06/11 - 11:09:11 | 500 |         2m32s |   10.100.34.236 | POST     "/api/embeddings"
time=2024-06-11T11:09:11.911+08:00 level=DEBUG source=sched.go:304 msg="context for request finished"
time=2024-06-11T11:09:11.911+08:00 level=DEBUG source=sched.go:255 msg="after processing request finished event" modelPath=C:\Users\admin_env\.ollama\models\blobs\sha256-ada9f88e89df0ea53c31fabf8b1e7c8c0c22fa95ab3a3cad4cdd86103ce9f3d3 refCount=116
time=2024-06-11T11:09:11.911+08:00 level=DEBUG source=sched.go:255 msg="after processing request finished event" modelPath=C:\Users\admin_env\.ollama\models\blobs\sha256-ada9f88e89df0ea53c31fabf8b1e7c8c0c22fa95ab3a3cad4cdd86103ce9f3d3 refCount=115
time=2024-06-11T11:09:11.911+08:00 level=INFO source=routes.go:401 msg="embedding generation failed: context canceled"

OS

Windows

GPU

Nvidia

CPU

Intel

Ollama version

0.1.41

GiteaMirror added the "bug" and "needs more info" labels 2026-05-03 19:58:09 -05:00

@dhiltgen commented on GitHub (Jun 13, 2024):

We should improve the log message, but the semaphore is used to track parallel requests. The "context canceled" indicates the client gave up waiting for the request to get handled.

What do you have OLLAMA_NUM_PARALLEL set to? The current default is 1, so only 1 request can be handled at a time.
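In practice that means every request beyond the one being served waits in a queue, and a client that times out while waiting cancels its context. A minimal sketch of that interaction (model name and timeout values are placeholder assumptions, not from this thread; the endpoint is the one in the logs above):

```python
import concurrent.futures

import httpx  # the same client library that appears in a traceback further down

URL = "http://localhost:11434/api/embeddings"   # endpoint from the logs above
BODY = {"model": "nomic-embed-text", "prompt": "hello world"}  # placeholder model

def embed(timeout: float):
    try:
        return httpx.post(URL, json=BODY, timeout=timeout).status_code
    except httpx.TimeoutException:
        # The client gave up while queued; the server logs this as
        # 'Failed to acquire semaphore' error="context canceled".
        return "client timed out while waiting for a slot"

with concurrent.futures.ThreadPoolExecutor(max_workers=8) as pool:
    # Eight concurrent posts against OLLAMA_NUM_PARALLEL=1: a 5 s timeout is
    # easily exceeded while queued; something like 600 s lets queued requests
    # run to completion instead.
    for result in pool.map(embed, [5.0] * 8):
        print(result)
```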


@travisgu commented on GitHub (Jun 17, 2024):

> We should improve the log message, but the semaphore is used to track parallel requests. The "context canceled" indicates the client gave up waiting for the request to get handled.
>
> What do you have OLLAMA_NUM_PARALLEL set to? The current default is 1, so only 1 request can be handled at a time.

Thanks for the explanation. OLLAMA_NUM_PARALLEL is using the default value of 1.
With only one GPU, is it OK to increase the OLLAMA_NUM_PARALLEL value?


@dhiltgen commented on GitHub (Jun 17, 2024):

Concurrency/parallelism is currently experimental (opt-in) but will eventually be enabled by default. Increasing parallelism will increase the context size, and therefore the VRAM consumed by the model. You can experiment with different values and see the impact via ollama ps to find a good balance for your model and GPU capabilities.
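For scripted experiments, ollama ps also has an HTTP counterpart that can be polled while trying different values. A sketch, assuming the GET /api/ps endpoint and its size_vram field as documented for recent Ollama versions (it may not exist on the old build in this report):

```python
import httpx

# Poll the HTTP counterpart of `ollama ps` while trying different
# OLLAMA_NUM_PARALLEL values to see what each setting costs in VRAM.
resp = httpx.get("http://localhost:11434/api/ps")
for model in resp.json().get("models", []):
    vram_gib = model.get("size_vram", 0) / 2**30
    print(f'{model["name"]}: {vram_gib:.1f} GiB in VRAM')
```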


@debrupf2946 commented on GitHub (Aug 3, 2024):

Hello, I am also facing the same issue.
MODEL: Gemma2 8B
ENV: Colab free version, T4 15GB
I used ollama serve and ran gemma2.

time=2024-08-03T13:14:48.828Z level=INFO source=server.go:623 msg="llama runner started in 19.86 seconds"
[GIN] 2024/08/03 - 13:14:53 | 200 | 26.103765445s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2024/08/03 - 13:14:53 | 200 | 26.245598995s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2024/08/03 - 13:14:57 | 500 | 30.006172901s |       127.0.0.1 | POST     "/api/chat"
time=2024-08-03T13:14:57.667Z level=ERROR source=server.go:711 msg="Failed to acquire semaphore" error="context canceled"
[GIN] 2024/08/03 - 13:14:57 | 500 | 30.007202645s |       127.0.0.1 | POST     "/api/chat"
time=2024-08-03T13:14:57.668Z level=ERROR source=server.go:711 msg="Failed to acquire semaphore" error="context canceled"
[GIN] 2024/08/03 - 13:14:57 | 500 | 30.008433237s |       127.0.0.1 | POST     "/api/chat"
time=2024-08-03T13:14:57.668Z level=ERROR source=server.go:711 msg="Failed to acquire semaphore" error="context canceled"
[GIN] 2024/08/03 - 13:14:57 | 500 | 30.008151611s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2024/08/03 - 13:14:57 | 500 | 30.007932916s |       127.0.0.1 | POST     "/api/chat"
time=2024-08-03T13:14:57.667Z level=ERROR source=server.go:711 msg="Failed to acquire semaphore" error="context canceled"
[GIN] 2024/08/03 - 13:14:57 | 500 | 30.008841187s |       127.0.0.1 | POST     "/api/chat"
time=2024-08-03T13:14:57.693Z level=ERROR source=server.go:711 msg="Failed to acquire semaphore" error="context canceled"
[GIN] 2024/08/03 - 13:14:57 | 500 | 29.945590467s |       127.0.0.1 | POST     "/api/chat"
time=2024-08-03T13:14:57.695Z level=ERROR source=server.go:711 msg="Failed to acquire semaphore" error="context canceled"
[GIN] 2024/08/03 - 13:14:57 | 500 | 29.857985567s |       127.0.0.1 | POST     "/api/chat" 

When I am converting llama_index documents to embeddings,

it causes this error:

embedding nodes:  19% 113/584 [07:50<16:18,  2.08s/it]
---------------------------------------------------------------------------
ReadTimeout                               Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/httpx/_transports/default.py in map_httpcore_exceptions()
     68     try:
---> 69         yield
     70     except Exception as exc:

56 frames
/usr/local/lib/python3.10/dist-packages/httpx/_transports/default.py in handle_async_request(self, request)
    372         with map_httpcore_exceptions():
--> 373             resp = await self._pool.handle_async_request(req)
    374 

/usr/local/lib/python3.10/dist-packages/httpcore/_async/connection_pool.py in handle_async_request(self, request)
    215             await self._close_connections(closing)
--> 216             raise exc from None
    217 

/usr/local/lib/python3.10/dist-packages/httpcore/_async/connection_pool.py in handle_async_request(self, request)
    195                     # Send the request on the assigned connection.
--> 196                     response = await connection.handle_async_request(
    197                         pool_request.request
Can someone please help me out?
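The 500s at almost exactly 30.00 s, together with the httpx ReadTimeout, point at a 30-second client-side read timeout rather than a server fault. A hedged sketch of raising it with httpx directly (llama_index's Ollama integrations expose their own timeout settings; the endpoint and model below come from the log, the timeout values are assumptions):

```python
import httpx

# Requests queued behind one busy slot can wait far longer than a 30 s
# default. A generous read timeout keeps the request context alive while
# the request sits in Ollama's queue.
timeout = httpx.Timeout(600.0, connect=10.0)
with httpx.Client(timeout=timeout) as client:
    r = client.post(
        "http://localhost:11434/api/chat",
        json={
            "model": "gemma2",
            "messages": [{"role": "user", "content": "hello"}],
            "stream": False,
        },
    )
    print(r.status_code, r.json()["message"]["content"][:80])
```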


@sohooo commented on GitHub (Aug 6, 2024):

Seeing the same with codestral:22b on a GeForce 4090:

►  ollama ps                                                          
NAME            ID              SIZE    PROCESSOR       UNTIL         
codestral:22b   9cd926908d85    47 GB   48%/52% CPU/GPU 20 minutes ago

Model info:

►  curl -s http://localhost:11434/api/show -d '{ "name": "codestral:22b" }' | jq '.details'
{
  "parent_model": "",
  "format": "gguf",
  "family": "llama",
  "families": [
    "llama"
  ],
  "parameter_size": "22.2B",
  "quantization_level": "Q4_K_M"
}

Ollama env:

►  grep Env /etc/systemd/system/ollama.service
Environment="OLLAMA_HOST=0.0.0.0:11434"
Environment="OLLAMA_MODELS=/opt/models"
Environment="OLLAMA_ORIGINS=*"
Environment="OLLAMA_NUM_PARALLEL=3"
Environment="OLLAMA_MAX_LOADED_MODELS=2"
Environment="OLLAMA_MAX_QUEUE=512"
Environment="OLLAMA_KEEP_ALIVE=3600"

@dhiltgen commented on GitHub (Aug 6, 2024):

We should improve the error message, however in this scenario "context canceled" most likely means the client gave up and closed the connection before the server could process it. If you have parallelism set at the default, and have sufficient VRAM, it will allow 4 concurrent connections. Our default queue depth is 512, so that means there can be up to 512 queued up requests waiting for one of those 4 "slots" to get their request processed. If the requests take a little while, and your timeout on the client side is ~short, then it's possible this is just normal behavior when saturating the system.

Can you share how many concurrent requests you're making, and if the system is actually hung/stuck, or are requests making it through?
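In the meantime, one way to rule out simple saturation is to cap in-flight requests on the client at the server's slot count, so queue waits stay bounded and no context gets canceled. A minimal sketch, assuming 4 slots, the legacy /api/embeddings endpoint, and placeholder model/inputs:

```python
import asyncio

import httpx

SLOTS = 4  # match the server's effective OLLAMA_NUM_PARALLEL

async def embed_all(texts):
    # Client-side gate sized to the server's parallel slots: at most SLOTS
    # requests are in flight, so nothing sits in Ollama's 512-deep queue
    # long enough for a client timeout to cancel its context.
    gate = asyncio.Semaphore(SLOTS)
    async with httpx.AsyncClient(timeout=httpx.Timeout(600.0)) as client:
        async def one(text):
            async with gate:
                r = await client.post(
                    "http://localhost:11434/api/embeddings",
                    json={"model": "nomic-embed-text", "prompt": text},
                )
                r.raise_for_status()
                return r.json()["embedding"]
        return await asyncio.gather(*(one(t) for t in texts))

vectors = asyncio.run(embed_all(["alpha", "beta", "gamma"]))
print(len(vectors), "embeddings")
```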


@peanutpaste commented on GitHub (Aug 7, 2024):

I encountered the same problem. After Ollama ran for a while, it stopped responding, no matter whether I switched models or reloaded them. Only after I killed the process and restarted it would it work for a few minutes, and then the problem would reappear.


@dhiltgen commented on GitHub (Aug 7, 2024):

We found and fixed a bug in v0.3.4 on how embeddings were batched, so this should be resolved now.


@giladrom commented on GitHub (Oct 27, 2024):

Using the latest (0.3.14) and getting the same error when running embeddings (llama3.2):

time=2024-10-27T07:58:52.252Z level=ERROR source=server.go:890 msg="Failed to acquire semaphore" error="context canceled"
time=2024-10-27T07:58:52.255Z level=ERROR source=routes.go:422 msg="embedding generation failed" error="context canceled"
[GIN] 2024/10/27 - 07:58:52 | 500 |         1m40s |  X.X.X.X | POST     "/api/embed"

This happens whenever I try to embed anything larger than a moderately sized PDF (over 3 MB). The 500 error always occurs after 1m40s.
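1m40s is exactly 100 seconds, which looks more like a fixed client-side timeout than a server limit. A hedged sketch of a workaround: send the extracted text in small batches through /api/embed (the endpoint used above, which accepts a list input); the chunking, batch size, model, and timeout are assumptions:

```python
import httpx

chunks = ["..."]  # assume the PDF text has already been extracted and split

with httpx.Client(timeout=httpx.Timeout(600.0)) as client:
    embeddings = []
    for i in range(0, len(chunks), 16):
        # Small batches keep each call well under any client timeout.
        r = client.post(
            "http://localhost:11434/api/embed",
            json={"model": "llama3.2", "input": chunks[i : i + 16]},
        )
        r.raise_for_status()
        embeddings.extend(r.json()["embeddings"])
print(len(embeddings), "embeddings")
```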


@AndrewWebDev commented on GitHub (Jan 19, 2025):

Using "ollama version is 0.5.7" got the same errors for ~500kb txt file:
level=ERROR source=routes.go:478 msg="embedding generation failed" error="context canceled"
| 500 | 5m0s | 127.0.0.1 | POST "/api/embed"

Maybe I am doing something wrong, but increasing "OLLAMA_KEEP_ALIVE" and "OLLAMA_LOAD_TIMEOUT" doesn't help

Information

core

  • n8nVersion: 1.73.1
  • platform: docker (self-hosted)
  • nodeJsVersion: 20.18.0
  • database: postgres
  • executionMode: regular
  • concurrency: -1
  • license: community

storage

  • success: all
  • error: all
  • progress: false
  • manual: true
  • binaryMode: memory

pruning

  • enabled: true
  • maxAge: 336 hours
  • maxCount: 10000 executions

@Offshore21 commented on GitHub (Jan 19, 2025):

Please send me the list of commands for taking care of Ollama.


@AndrewWebDev commented on GitHub (Jan 19, 2025):

Hi, thank you for the quick reaction.

I found a description in another issue for the nomic-embed-text model:
https://github.com/ollama/ollama/issues/7288#issuecomment-2591709109

My issue could be due to a wrong chunking strategy, but I am not sure why, as we have a recursive splitter for this purpose. Let me dive deeper into it, sorry for bothering!
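For anyone landing here: a minimal sketch of the recursive-splitting idea mentioned above, in plain Python. The chunk size and separators are assumptions; real pipelines typically use a library splitter such as the ones in llama_index or LangChain.

```python
def recursive_split(text, max_len=1024, seps=("\n\n", "\n", ". ", " ")):
    """Split text into chunks of at most max_len characters, preferring the
    coarsest separator that still produces a split."""
    if len(text) <= max_len:
        return [text]
    for sep in seps:
        parts = text.split(sep)
        if len(parts) > 1:
            chunks, buf = [], ""
            for part in parts:
                candidate = f"{buf}{sep}{part}" if buf else part
                if len(candidate) <= max_len:
                    buf = candidate
                else:
                    if buf:
                        chunks.append(buf)
                    buf = part
            if buf:
                chunks.append(buf)
            # Any piece still too long falls through to finer separators.
            return [c for chunk in chunks
                      for c in recursive_split(chunk, max_len, seps)]
    # No separator found: hard-cut as a last resort.
    return [text[i:i + max_len] for i in range(0, len(text), max_len)]

# Example: chunk a document before sending it to /api/embed in batches.
doc = "Lorem ipsum dolor sit amet. " * 200
print([len(c) for c in recursive_split(doc, max_len=256)][:5])
```

Keeping every chunk comfortably inside the embedding model's context window avoids the silent truncation discussed in the linked issue.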

Reference: github-starred/ollama#65191