[GH-ISSUE #11977] configure OLLAMA_NUM_PARALLEL per model #70011

Closed
opened 2026-05-04 20:03:59 -05:00 by GiteaMirror · 5 comments
Owner

Originally created by @Robinsane on GitHub (Aug 20, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/11977

Hi Ollama community

I'd love the ability to load multiple models into VRAM, each with its own OLLAMA_NUM_PARALLEL setting.
If I want one model to allow for e.g. 3 parallel requests, that roughly triples the reserved context, and it also triples the context for models where I don't need any parallel capability.
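To illustrate why a higher OLLAMA_NUM_PARALLEL multiplies the reservation, here is a rough back-of-the-envelope sketch. The formula is the standard KV-cache size estimate and the model dimensions are illustrative (Llama-style), not taken from Ollama's actual allocator:

```python
def kv_cache_bytes(ctx_len, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Approximate KV cache size for one context slot (fp16 K and V per layer)."""
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem

# Illustrative 8B-class model: 32 layers, 8 KV heads, head dim 128, 8k context.
per_slot = kv_cache_bytes(ctx_len=8192, n_layers=32, n_kv_heads=8, head_dim=128)

# With OLLAMA_NUM_PARALLEL=3 each parallel request gets its own slot,
# so the reservation is roughly tripled.
total = 3 * per_slot

print(per_slot / 1024**3, "GiB per slot")  # ~1 GiB
print(total / 1024**3, "GiB reserved")     # ~3 GiB
```

This is why a per-model setting matters: the multiplier applies to every loaded model, even ones that will only ever see one request at a time.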

Previously I worked around this by running separate containers, each with its own OLLAMA_NUM_PARALLEL.
However, since the changes in v0.11.5, every Ollama instance seems to reserve about 146 MiB of VRAM on each GPU, so the multi-container workaround now also loses available VRAM.
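The multi-container workaround described above can be sketched as a compose file. Service names, ports, and volume names are illustrative; OLLAMA_NUM_PARALLEL itself is a real Ollama environment variable:

```yaml
services:
  ollama-parallel:
    image: ollama/ollama
    environment:
      - OLLAMA_NUM_PARALLEL=3   # instance for the model that needs 3 parallel requests
    ports:
      - "11434:11434"
    volumes:
      - ollama-a:/root/.ollama
  ollama-single:
    image: ollama/ollama
    environment:
      - OLLAMA_NUM_PARALLEL=1   # instance for models served one request at a time
    ports:
      - "11435:11434"
    volumes:
      - ollama-b:/root/.ollama
volumes:
  ollama-a:
  ollama-b:
```

Each instance carries its own per-GPU baseline allocation, which is the overhead the issue reports.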

Screenshot of VRAM usage with 2 models, each in their own Ollama container:
![VRAM usage with two Ollama containers](https://github.com/user-attachments/assets/41ab81f1-1896-4cb5-8f7b-ffd40db06365)

GiteaMirror added the feature request label 2026-05-04 20:03:59 -05:00

@Robinsane commented on GitHub (Aug 20, 2025):

relevant issue:
https://github.com/ollama/ollama/issues/4170


@rick-github commented on GitHub (Aug 20, 2025):

#9546


@pdevine commented on GitHub (Aug 20, 2025):

Definitely appreciate the issue (and thank you @rick-github for the PR). I'm going to close it as a duplicate of #4170.

cc @jessegross @mxyng


@jessegross commented on GitHub (Aug 29, 2025):

@Robinsane We have reduced the size of these additional allocations in 0.11.8.


@Robinsane commented on GitHub (Aug 29, 2025):

> @Robinsane We have reduced the size of these additional allocations in 0.11.8.

@jessegross
I greatly appreciate both your commit and comment, thank you!

Reference: github-starred/ollama#70011