[GH-ISSUE #9787] Make ollama able to run multiple models in parallel #6400

Closed
opened 2026-04-12 17:55:32 -05:00 by GiteaMirror · 4 comments

Originally created by @nhantran0506 on GitHub (Mar 15, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/9787

I've noticed that ollama can support parallel execution, but the logic seems odd. For example, if I want to run llama3.2 in parallel, would it be faster to just load 2 or more copies of llama3.2 and run them separately? For this, Ollama currently loads 1 model and makes it run "parallel".

GiteaMirror added the feature request label 2026-04-12 17:55:32 -05:00

@rick-github commented on GitHub (Mar 16, 2025):

ollama doesn't run more than one copy of a model at once. To run concurrent requests, set [`OLLAMA_NUM_PARALLEL`](https://github.com/ollama/ollama/blob/main/docs/faq.md#how-does-ollama-handle-concurrent-requests). ollama will load the model and create a separate context for each parallel request.
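A minimal sketch of the suggestion above (the model name and prompts are illustrative; `OLLAMA_NUM_PARALLEL` is the environment variable named in the linked FAQ):

```shell
# Start the server with 2 concurrent request slots per loaded model.
OLLAMA_NUM_PARALLEL=2 ollama serve

# In another terminal, fire two requests at once; both are served by the
# same loaded copy of llama3.2, each in its own context slot.
ollama run llama3.2 "First prompt" &
ollama run llama3.2 "Second prompt" &
wait
```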


@Feynman1999 commented on GitHub (Mar 17, 2025):

> ollama doesn't run more than one copy of a model at once. To run concurrent requests, set OLLAMA_NUM_PARALLEL. ollama will load the model and create a separate context for parallel request.

When OLLAMA_NUM_PARALLEL is set to 2, for example, will the processing speed of each request slow down? But overall, setting it to 2 will increase the efficiency of the system. Am I right?


@rick-github commented on GitHub (Mar 17, 2025):

Yes, individual requests will complete slower but the overall tokens/second of the system increases.

![Image](https://github.com/user-attachments/assets/1c6c3b30-93ea-46c7-95fc-681d83776267)
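The tradeoff described above can be shown with simple arithmetic (the tok/s figures below are invented for illustration, not measurements):

```python
# Illustrative arithmetic only -- the speeds are assumed figures.
single_request_speed = 50.0      # tok/s when one request runs alone (assumed)
per_request_speed_at_2 = 35.0    # tok/s per request with OLLAMA_NUM_PARALLEL=2 (assumed)

aggregate_at_2 = 2 * per_request_speed_at_2  # total system throughput

# Each request completes slower individually...
print(per_request_speed_at_2 < single_request_speed)  # True
# ...but the system's overall tokens/second increases.
print(aggregate_at_2 > single_request_speed)          # True
```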


@yazon commented on GitHub (Nov 4, 2025):

You can use FlexLLama to handle multiple models: https://github.com/yazon/flexllama


Reference: github-starred/ollama#6400