[GH-ISSUE #9316] How to fix this "Failed to acquire semaphore" error? Is it related to CONCURRENT_TASK_LIMIT? #6079

Closed
opened 2026-04-12 17:24:32 -05:00 by GiteaMirror · 1 comment

Originally created by @Jaykumaran on GitHub (Feb 24, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/9316

What is the issue?

Hi,
I'm trying to use fast-graphrag with Ollama models. It uses the instructor.from_openai service. There seems to be some problem when using Ollama models.

I'm referring to: https://github.com/circlemind-ai/fast-graphrag/blob/f2d90a3e232b80c50efb22667dd16cc2ac6e97de/fast_graphrag/_llm/_llm_openai.py#L60

or="context canceled"
[GIN] 2025/02/24 - 18:08:16 | 500 |          3m0s |       127.0.0.1 | POST     "/v1/chat/completions"
[GIN] 2025/02/24 - 18:08:16 | 500 |          3m0s |       127.0.0.1 | POST     "/v1/chat/completions"
time=2025-02-24T18:08:16.792+05:30 level=ERROR source=server.go:690 msg="Failed to acquire semaphore" error="context canceled"
[GIN] 2025/02/24 - 18:08:16 | 500 |          3m0s |       127.0.0.1 | POST     "/v1/chat/completions"
[GIN] 2025/02/24 - 18:08:16 | 500 |          3m0s |       127.0.0.1 | POST     "/v1/chat/completions"
[GIN] 2025/02/24 - 18:08:16 | 500 |          3m0s |       127.0.0.1 | POST     "/v1/chat/completions"
[GIN] 2025/02/24 - 18:08:16 | 500 |          3m0s |       127.0.0.1 | POST     "/v1/chat/completions"
[GIN] 2025/02/24 - 18:08:16 | 500 |          3m0s |       127.0.0.1 | POST     "/v1/chat/completions"
time=2025-02-24T18:10:48.036+05:30 level=ERROR source=server.go:690 msg="Failed to acquire semaphore" error="context canceled"
time=2025-02-24T18:10:48.036+05:30 level=ERROR source=server.go:690 msg="Failed to acquire semaphore" error="context canceled"
time=2025-02-24T18:10:48.036+05:30 level=ERROR source=server.go:690 msg="Failed to acquire semaphore" error="context canceled"
time=2025-02-24T18:10:48.036+05:30 level=ERROR source=server.go:690 msg="Failed to acquire semaphore" error="context canceled"
time=2025-02-24T18:10:48.036+05:30 level=ERROR source=server.go:690 msg="Failed to acquire semaphore" error="context canceled"
[GIN] 2025/02/24 - 18:10:48 | 500 |         2m30s |       127.0.0.1 | POST     "/v1/chat/completions"
[GIN] 2025/02/24 - 18:10:48 | 500 |         2m30s |       127.0.0.1 | POST     "/v1/chat/completions"
[GIN] 2025/02/24 - 18:10:48 | 500 |         2m30s |       127.0.0.1 | POST     "/v1/chat/completions"
time=2025-02-24T18:10:48.036+05:30 level=ERROR source=server.go:690 msg="Failed to acquire semaphore" error="context canceled"
[GIN] 2025/02/24 - 18:10:48 | 500 |         2m30s |       127.0.0.1 | POST     "/v1/chat/completions"
time=2025-02-24T18:10:48.036+05:30 level=ERROR source=server.go:690 msg="Failed to acquire semaphore" error="context canceled"
[GIN] 2025/02/24 - 18:10:48 | 500 |         2m54s |       127.0.0.1 | POST     "/v1/chat/completions"
[GIN] 2025/02/24 - 18:10:48 | 500 |         2m30s |       127.0.0.1 | POST     "/v1/chat/completions"
[GIN] 2025/02/24 - 18:10:48 | 500 |         2m30s |       127.0.0.1 | POST     "/v1/chat/completions"
[GIN] 2025/02/24 - 18:10:48 | 500 |         2m30s |       127.0.0.1 | POST     "/v1/chat/completions"

I'm trying to run:

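# Imports assumed from the fast-graphrag examples; the original snippet omitted them.
import instructor
from fast_graphrag import GraphRAG
from fast_graphrag._llm import OpenAIEmbeddingService, OpenAILLMService

# DOMAIN, EXAMPLE_QUERIES, and ENTITY_TYPES are defined elsewhere (not shown).
working_dir = "./WORKING_DIR/carol/ollama"

grag = GraphRAG(
    working_dir=working_dir,
    # n_checkpoints=2,
    domain=DOMAIN,
    example_queries="\n".join(EXAMPLE_QUERIES),
    entity_types=ENTITY_TYPES,
    config=GraphRAG.Config(
        # Chat completions go through Ollama's OpenAI-compatible endpoint.
        llm_service=OpenAILLMService(
            model="llama3.1:8b",
            base_url="http://localhost:11434/v1",
            api_key="ollama",
            mode=instructor.Mode.JSON,
            client="openai",
        ),
        # Embeddings use the same local Ollama server.
        embedding_service=OpenAIEmbeddingService(
            model="mxbai-embed-large",
            base_url="http://localhost:11434/v1",
            api_key="ollama",
            embedding_dim=768,  # for mxbai-embed-large this would be 1024
            # client="openai"
        ),
    ),
)

# Index the book, then ask a question against the resulting graph.
with open("./book.txt") as f:
    grag.insert(f.read())

print(grag.query("Who is Scrooge?").response)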

When I set max_concurrent=int(os.getenv("CONCURRENT_TASK_LIMIT", 1024)) to 1, things work as expected.

But then there is no point in running it as an async operation.

Looking for some insights on this.
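
For reference, a limit between 1 and 1024 still lets requests overlap. Below is a minimal sketch of bounded concurrency with asyncio.Semaphore. The names fake_completion and run_bounded and the limit of 4 are hypothetical stand-ins, not part of fast-graphrag or Ollama, and the sleep only simulates a request.

```python
import asyncio


async def fake_completion(i, sem, stats):
    # Hypothetical stand-in for one chat-completion request.
    async with sem:  # wait for a free slot before "sending"
        stats["in_flight"] += 1
        stats["peak"] = max(stats["peak"], stats["in_flight"])
        await asyncio.sleep(0.01)  # simulated model latency
        stats["in_flight"] -= 1
    return i


async def run_bounded(n_requests, limit):
    # At most `limit` requests are in flight at any moment.
    sem = asyncio.Semaphore(limit)
    stats = {"in_flight": 0, "peak": 0}
    results = await asyncio.gather(
        *(fake_completion(i, sem, stats) for i in range(n_requests))
    )
    return results, stats["peak"]


results, peak = asyncio.run(run_bounded(n_requests=20, limit=4))
print(f"completed={len(results)} peak_in_flight={peak}")
```

With a small limit the client never has more requests outstanding than the server can plausibly start before the client's timeout, while still overlapping work.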


Relevant log output


OS

No response

GPU

No response

CPU

No response

Ollama version

No response

GiteaMirror added the bug label 2026-04-12 17:24:32 -05:00

@rick-github commented on GitHub (Feb 24, 2025):

Increase OLLAMA_NUM_PARALLEL to allow more concurrent completions. However, the embedding interface doesn't support parallel completions, so you would need to run multiple servers to achieve that. I suspect the semaphore failures occur because there are many pending requests and the client has a 3-minute timeout: by the time Ollama gets around to processing the queued requests, the client has already closed the connection.
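
The queueing explanation can be checked with rough arithmetic. The sketch below assumes all requests arrive at once, are served in waves of num_parallel, and each take the same time. The function name, the 25-second completion time, and the request count are illustrative assumptions; only the 3-minute (180 s) timeout comes from the logs above.

```python
def timed_out_requests(n_requests, num_parallel, seconds_per_request, client_timeout):
    """Count requests that would still be unfinished when the client gives up.

    Simplified model: every request is queued at time 0, the server works
    through them in waves of `num_parallel`, and each request takes the
    same amount of time.
    """
    timed_out = 0
    for i in range(n_requests):
        wave = i // num_parallel  # 0-based wave in which this request runs
        finish_time = (wave + 1) * seconds_per_request
        if finish_time > client_timeout:
            timed_out += 1
    return timed_out


# Illustrative numbers: 100 queued requests, 25 s each, 180 s client timeout.
for parallel in (1, 4, 16):
    lost = timed_out_requests(100, parallel, 25, 180)
    print(f"num_parallel={parallel}: {lost} of 100 requests exceed the timeout")
```

In practice, raising the parallelism may also lengthen each individual request, so treat these figures as a way to reason about the queue rather than as a prediction.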

Reference: github-starred/ollama#6079