[GH-ISSUE #4159] Concurrency Issue: 'Server Busy' Errors After Updating Ollama #64622

Closed
opened 2026-05-03 18:21:47 -05:00 by GiteaMirror · 2 comments

Originally created by @mynhinguyentruong on GitHub (May 5, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/4159

Originally assigned to: @jmorganca on GitHub.

What is the issue?

Issue Summary:
Yesterday, I conducted a test where I spun up 30 concurrent goroutines and sent them as POST requests to Ollama locally. The process worked smoothly, and I received responses within approximately 2 minutes.

`go ProcessPromptWithOllama()` ×30 times

Problem Description:
However, after updating Ollama today, I encountered the following errors:

\"error\":\"server busy, please try again. maximum pending requests exceeded\"}
\"error\":\"unexpected server status: llm busy - no slots available\"}
Reproducibility:
The issue is consistently reproducible after the Ollama update. It occurs regardless of the specific endpoint or payload used in the POST requests.

Expected Behavior:
I expected the updated Ollama to handle the concurrent requests as efficiently as it did before the update, without encountering any server overload issues.

OS

macOS

GPU

Apple

CPU

Apple

Ollama version

0.1.33

GiteaMirror added the bug label 2026-05-03 18:21:47 -05:00

@jmorganca commented on GitHub (May 5, 2024):

Hi @mynhinguyentruong, I'm so sorry about this issue – we'll get this fixed so more requests can be queued
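For context, the queue and parallelism limits the maintainer refers to became configurable via environment variables in later Ollama releases: `OLLAMA_NUM_PARALLEL` (concurrent requests per model) and `OLLAMA_MAX_QUEUE` (pending requests before "server busy"). The values below are illustrative only, and availability depends on the installed version; this is a config fragment, not output from this issue.

```shell
# Illustrative values; tune for your hardware and workload.
OLLAMA_NUM_PARALLEL=8 OLLAMA_MAX_QUEUE=512 ollama serve
```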


@dhiltgen commented on GitHub (May 5, 2024):

Dup of #4124


Reference: github-starred/ollama#64622