[GH-ISSUE #8842] non global OLLAMA_NUM_PARALLEL #52240

Open
opened 2026-04-28 22:36:41 -05:00 by GiteaMirror · 0 comments
Owner

Originally created by @Robinsane on GitHub (Feb 5, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/8842

Hello

First of all, I love Ollama; all contributors are doing a lovely job. Thank you!

When OLLAMA_NUM_PARALLEL is set to 2, LLMs can handle 2 requests at the same time.
This roughly doubles the (V)RAM required, since two separate contexts can be allocated at the same time.
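The memory impact can be sketched with back-of-the-envelope KV-cache arithmetic. This is a minimal illustration, assuming a half-precision cache and model dimensions loosely modeled on an 8B-class model (32 layers, 8 KV heads, head dim 128); the exact numbers depend on the model and quantization:

```python
def kv_cache_bytes(n_layers, n_ctx, n_kv_heads, head_dim, num_parallel, bytes_per_elem=2):
    """Rough KV-cache size: K and V each store n_layers * n_ctx * n_kv_heads * head_dim
    elements per parallel slot (fp16 assumed, 2 bytes per element)."""
    per_slot = 2 * n_layers * n_ctx * n_kv_heads * head_dim * bytes_per_elem
    return per_slot * num_parallel

# Illustrative dimensions, not from any specific model card
sequential = kv_cache_bytes(32, 8192, 8, 128, num_parallel=1)
parallel_2 = kv_cache_bytes(32, 8192, 8, 128, num_parallel=2)
print(sequential / 2**30, "GiB")   # 1.0 GiB
print(parallel_2 / 2**30, "GiB")   # 2.0 GiB
```

With a huge context window (e.g. 128K tokens) the same doubling costs tens of GB instead of one, which is the scenario the request is about.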

My request is the following:
For some models, I'd like to configure OLLAMA_NUM_PARALLEL=2.
At the same time, I'd like models with huge context windows to work sequentially only, so that I don't need an extra e.g. 16 to 64 GB of (V)RAM for a second context that's barely or never used.
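To make the request concrete, here is a sketch of today's global setting next to the kind of per-model override being asked for. OLLAMA_NUM_PARALLEL is a real environment variable; the Modelfile parameter shown is hypothetical and does not exist in Ollama today:

```shell
# Today: one global setting applies to every loaded model
OLLAMA_NUM_PARALLEL=2 ollama serve

# Requested (HYPOTHETICAL, not a real Modelfile option):
# a per-model parameter that overrides the global value, e.g.
#
#   FROM some-huge-context-model
#   PARAMETER num_parallel 1
#
# so only this model is forced to serve requests sequentially.
```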

Thank you for your time reading this feature request ^.^

GiteaMirror added the feature request label 2026-04-28 22:36:41 -05:00
Reference: github-starred/ollama#52240