[GH-ISSUE #1956] Handle Multiple parallel request #1129

Closed
opened 2026-04-12 10:52:27 -05:00 by GiteaMirror · 3 comments

Originally created by @lauvindra on GitHub (Jan 12, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/1956

Does Ollama use some kind of scheduling algorithm to manage many concurrent requests? Can you explain how this works?


@easp commented on GitHub (Jan 13, 2024):

It queues the requests and processes them serially.


@pdevine commented on GitHub (Jan 15, 2024):

We'll add in better support for scheduling in the future, but as @easp mentioned, it just blocks all the other clients on a request and then those clients race to get fulfilled next. Definitely not ideal.
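
A minimal sketch of the behavior described above, assuming a single lock guards the model. The names `SerialServer`, `handle`, and `run_clients` are hypothetical and are not taken from Ollama's code base; Python is used here only for illustration. It shows every client blocking while one request is served, with the waiters then competing for the lock in no guaranteed order.

```python
import threading
import time


class SerialServer:
    """Hypothetical stand-in for a server that serves one request at a time."""

    def __init__(self):
        self._lock = threading.Lock()  # only one request may run at a time
        self.active = 0                # requests currently being served
        self.max_active = 0            # highest concurrency ever observed
        self.completed = []            # order in which requests finished

    def handle(self, request_id, work_seconds=0.01):
        # Every client blocks here until the in-flight request finishes.
        # threading.Lock makes no fairness promise, so waiters race for it.
        with self._lock:
            self.active += 1
            self.max_active = max(self.max_active, self.active)
            time.sleep(work_seconds)   # placeholder for the actual inference
            self.completed.append(request_id)
            self.active -= 1


def run_clients(server, n_clients):
    # Start one thread per client, all hitting the server at once.
    threads = [
        threading.Thread(target=server.handle, args=(i,))
        for i in range(n_clients)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return server.completed
```

Calling `run_clients(SerialServer(), 5)` completes all five requests, but the completion order can differ between runs, which is why a fair queue or a real scheduler would be an improvement.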


@pdevine commented on GitHub (Jan 26, 2024):

Going to close this as a dupe of #358


Reference: github-starred/ollama#1129