[GH-ISSUE #1356] Concurrency and multiple calls #47222

Closed
opened 2026-04-28 03:26:35 -05:00 by GiteaMirror · 4 comments

Originally created by @enriquesouza on GitHub (Dec 3, 2023).
Original GitHub issue: https://github.com/ollama/ollama/issues/1356

Hi, I would like to know whether it is possible to run Ollama and handle multiple calls concurrently. I would love to set up a server and make it available to my users.

However, when testing it through the LiteLLM proxy, I saw that each request waits until the previous one finishes.

Is it possible?

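To reproduce the behavior without going through LiteLLM, here is a minimal sketch (mine, not from the original report) that fires a few concurrent requests at Ollama's `/api/generate` endpoint and times them; if requests are serialized, the latencies stack up instead of overlapping. The default address `http://localhost:11434` is Ollama's standard port, while `llama2` is just an example model name.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"sync"
	"time"
)

func main() {
	const n = 4 // number of simultaneous requests
	// "llama2" is an example model name; replace it with one you have pulled.
	body := []byte(`{"model": "llama2", "prompt": "Say hi.", "stream": false}`)

	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			start := time.Now()
			resp, err := http.Post("http://localhost:11434/api/generate",
				"application/json", bytes.NewReader(body))
			if err != nil {
				fmt.Printf("request %d failed: %v\n", id, err)
				return
			}
			defer resp.Body.Close()
			io.Copy(io.Discard, resp.Body) // drain so the request completes
			fmt.Printf("request %d finished after %s\n", id, time.Since(start))
		}(i)
	}
	wg.Wait()
}
```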

@austin-starks commented on GitHub (Dec 3, 2023):

Plus one. I love Ollama but I'm failing to see how I could deploy it as a cloud server.

@easp commented on GitHub (Dec 4, 2023):

Right now you'd need to start multiple ollama servers on different ports and put them behind a reverse proxy.

I don't have any inside knowledge, but I'd expect this to change: llama.cpp, which Ollama uses, has added support for batched requests, and batching is much more efficient than load balancing across separate instances.

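To make the workaround concrete, here is a minimal sketch (mine, not from the thread) of a naive round-robin reverse proxy in Go sitting in front of two independent Ollama instances. It assumes you start each instance yourself, e.g. by pointing the `OLLAMA_HOST` environment variable at a different port before running `ollama serve`; the backend ports and the `:8080` listen address are illustrative.

```go
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
	"sync/atomic"
)

func main() {
	// Backend addresses are illustrative; point them at your own instances,
	// e.g. started with: OLLAMA_HOST=127.0.0.1:11435 ollama serve
	backends := []string{
		"http://127.0.0.1:11434",
		"http://127.0.0.1:11435",
	}

	proxies := make([]*httputil.ReverseProxy, len(backends))
	for i, b := range backends {
		u, err := url.Parse(b)
		if err != nil {
			log.Fatal(err)
		}
		proxies[i] = httputil.NewSingleHostReverseProxy(u)
	}

	var next uint64
	handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// Naive round-robin: hand each request to the next backend in turn.
		i := atomic.AddUint64(&next, 1) % uint64(len(proxies))
		proxies[i].ServeHTTP(w, r)
	})

	log.Println("proxying on :8080")
	log.Fatal(http.ListenAndServe(":8080", handler))
}
```

Note that this only spreads requests across instances; each instance still loads its own copy of the model, so memory use grows with the instance count, which matches the "very limited" experience described in the next comment.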

@enriquesouza commented on GitHub (Dec 11, 2023):

Thanks. I used it with Docker and Docker Swarm, and it worked with 10 instances running, but it is still very limited. I hope a new version can fix that limitation.

@pdevine commented on GitHub (Jan 26, 2024):

Going to close this as a dupe of #358

Reference: github-starred/ollama#47222