[GH-ISSUE #645] Allow global Ollama settings configuration #285

Open
opened 2026-04-12 09:49:40 -05:00 by GiteaMirror · 2 comments
Owner

Originally created by @BruceMacD on GitHub (Sep 29, 2023).
Original GitHub issue: https://github.com/ollama/ollama/issues/645

In some cases (usually hardware-related), it makes sense to have some global Ollama configuration rather than binding the setting to the Modelfile.

For example, if I am running many servers with different hardware capabilities, I don't want to create and load a Modelfile on each machine just to set `num_thread`; I want to set it once.
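Until a global knob exists, the per-request workaround is to pass `num_thread` in the `options` field of the Ollama HTTP API. The sketch below assumes a default local server at `localhost:11434`; the helper names (`build_generate_payload`, `generate`) are hypothetical, while the `options.num_thread` field itself is part of Ollama's documented request body:

```python
import json
import urllib.request


def build_generate_payload(model: str, prompt: str, num_thread: int) -> dict:
    """Build an /api/generate request body that overrides num_thread
    for this single request (hypothetical helper)."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        # Per-request runtime options; this is the setting the issue
        # wants to be able to fix globally instead.
        "options": {"num_thread": num_thread},
    }


def generate(payload: dict, host: str = "http://localhost:11434") -> dict:
    """POST the payload to a running Ollama server and return the JSON reply."""
    req = urllib.request.Request(
        host + "/api/generate",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


payload = build_generate_payload("llama3", "Hello", num_thread=12)
```

The drawback, as noted in the comments below, is that every client application has to cooperate and send this option on every request.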

GiteaMirror added the feature request label 2026-04-12 09:49:40 -05:00

@liqiang-fit2cloud commented on GitHub (Mar 6, 2024):

Need this!


@cipriancraciun commented on GitHub (Nov 20, 2025):

This option would be very helpful, especially if one wants to override the number of CPUs Ollama has detected.

I believe that resource consumption is an administrative issue, thus one should be able to enforce certain limits, as opposed to leaving the API clients to dictate these.


For example, I have an Intel Core i7-12700, which exposes 20 hardware threads: 8 performance cores (16 threads with hyper-threading) plus 4 efficiency cores. Ollama chooses to use only 8 threads, but without explicit CPU pinning the OS sometimes schedules Ollama threads on the efficiency cores, negating any benefit Ollama might expect from avoiding the efficiency cores in the first place.

As such, I want to force Ollama to use either all 12 physical cores (8 performance + 4 efficiency) or ignore the hyper-threading distinction and run on all 20 hardware threads.

At the moment this is possible only if the application making the API request supports overriding `num_thread`, and even when it does, changing it everywhere while experimenting is cumbersome.

Reference: github-starred/ollama#285