[GH-ISSUE #8999] Ollama cannot utilize multiple physical CPUs. #52357

Open
opened 2026-04-28 23:05:36 -05:00 by GiteaMirror · 6 comments

Originally created by @xiangyanggong on GitHub (Feb 11, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/8999

Originally assigned to: @mxyng on GitHub.

What is the issue?

There is no GPU in my environment, only two physical CPUs, each with 48 cores. However, after I ran the model, I noticed the parameter `--threads 48`, and while the model is running, Ollama only uses one physical CPU. Is this a characteristic of Ollama, or does it require some other configuration?

Relevant log output

```shell
/usr/local/lib/ollama/runners/cpu_avx2/ollama_llama_server runner --model /usr/share/ollama/.ollama/models/blobs/sha256-4cd576d9aa16961244012223abf01445567b061f1814b57dfef699e4cf8df339 --ctx-size 8192 --batch-size 512 --threads 48 --no-mmap --parallel 4 --port 39029
```
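
For readers hitting the same symptom, a quick way to confirm it from the shell (a hedged sketch; assumes util-linux's `taskset` and sysstat's `mpstat` are installed, and takes the runner's process name from the log above):

```shell
# CPU affinity of the running llama server; a mask covering only one
# socket's cores would explain the behaviour.
taskset -cp $(pidof ollama_llama_server)

# Per-core utilization, one-second samples, three reports; idle cores on
# the second socket confirm only one physical CPU is doing work.
mpstat -P ALL 1 3
```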

OS

Linux

GPU

No response

CPU

Intel

Ollama version

0.5.7

GiteaMirror added the bug label 2026-04-28 23:05:36 -05:00

@rick-github commented on GitHub (Feb 11, 2025):

By default, ollama uses only the performance cores in the CPUs. You can override this by setting [`num_thread`](https://github.com/ollama/ollama/blob/main/docs/api.md#request-8:~:text=use_mlock%22%3A%20false%2C%0A%20%20%20%20%22-,num_thread,-%22%3A%208%0A%20%20%7D%0A%7D).


@SmallBlueE commented on GitHub (Feb 11, 2025):

> By default, ollama uses only the performance cores in the CPUs. You can override this by setting `num_thread`.

If I run Ollama as a Linux service, do you know how to set this parameter `num_thread`? `Environment=`??


@rick-github commented on GitHub (Feb 11, 2025):

Either set it in the API call (`"options":{"num_thread":96}`) or create a new model:

```console
echo FROM deepseek-r1:7b > Modelfile
echo PARAMETER num_thread 96 >> Modelfile
ollama create deepseek:t96
```

```console
$ ollama run deepseek-r1:7b ''
$ ps who cmd p$(pidof ollama_llama_server) | sed -e 's/.*\(--thread\)/\1/'
--threads 8 --parallel 1 --port 46763
$ ollama run deepseek:t96 ''
$ ps who cmd p$(pidof ollama_llama_server) | sed -e 's/.*\(--thread\)/\1/'
--threads 96 --parallel 1 --port 41649
```
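
For the API route mentioned in the first line of that comment, a request would look roughly like this (a minimal sketch; the model name and thread count simply reuse the values above):

```shell
# Set num_thread per request via the options object of /api/generate.
curl http://localhost:11434/api/generate -d '{
  "model": "deepseek-r1:7b",
  "prompt": "Why is the sky blue?",
  "options": { "num_thread": 96 }
}'
```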


@SmallBlueE commented on GitHub (Feb 11, 2025):

> Either set it in the API call (`"options":{"num_thread":96}`) or create a new model:
>
> ```console
> echo FROM deepseek-r1:7b > Modelfile
> echo PARAMETER num_thread 96 >> Modelfile
> ollama create deepseek:t96
> $ ollama run deepseek-r1:7b ''
> $ ps who cmd p$(pidof ollama_llama_server) | sed -e 's/.*\(--thread\)/\1/'
> --threads 8 --parallel 1 --port 46763
> $ ollama run deepseek:t96 ''
> $ ps who cmd p$(pidof ollama_llama_server) | sed -e 's/.*\(--thread\)/\1/'
> --threads 96 --parallel 1 --port 41649
> ```

Can you give us a template Modelfile for DeepSeek-R1-671b-Q8_0? This model has 15 GGUF files. How do we merge them into one GGUF file correctly for Ollama? Just `cat` the 15 files into 1?

@rick-github commented on GitHub (Feb 11, 2025):

https://unsloth.ai/blog/deepseekr1-dynamic#:~:text=llama%2Dgguf%2Dsplit%20%2D%2D-,merge,-%5C%0A%20%20DeepSeek%2DR1%2DGGUF
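
The merge step that link points to uses llama.cpp's `llama-gguf-split` tool rather than plain `cat` (a hedged sketch; the shard file names below are illustrative, not the exact names from the repository):

```shell
# Merge a sharded GGUF: pass the first split and an output path; the tool
# locates the remaining -00002-of-00015 ... shards automatically.
./llama-gguf-split --merge \
  DeepSeek-R1-Q8_0-00001-of-00015.gguf \
  DeepSeek-R1-Q8_0-merged.gguf
```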

@Diksha06122 commented on GitHub (Apr 28, 2025):

Try this, it worked for me:

```shell
sudo docker run -d --name ollama \
  --cpus=16 \
  --cpuset-cpus="0-15" \
  -p 11434:11434 \
  ollama/ollama
```
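
One caveat before copying that `--cpuset-cpus` range: on a dual-socket machine, cores 0-15 may all sit on one socket, so the range should be chosen from the actual topology. A small sketch using standard util-linux tooling:

```shell
# List socket/NUMA-node layout so the --cpuset-cpus range can span both CPUs.
lscpu | grep -E '^(Socket|NUMA)'
```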

Reference: github-starred/ollama#52357