[GH-ISSUE #3319] Only half CPUs Running on, whatever on Windows Server, Windows 10/11 or Ubuntu Linux [CPU to Run Models] #2041

Closed
opened 2026-04-12 12:16:08 -05:00 by GiteaMirror · 9 comments

Originally created by @OPDEV001 on GitHub (Mar 24, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/3319

What is the issue?

ATTENTION: I only use the CPU to run models.

I have set up Ollama successfully on the following environments:

  1. Physical machine with Windows 11
  2. Windows Server 2022 on VMware
  3. Windows 10/11 on VMware
  4. Ubuntu Linux on VMware
  5. Physical machine with Windows Server 2022

But I found that all environments have the same issue: only half of the CPUs are used while Ollama is working. For example, it uses 4 CPUs when given 8, or 8 CPUs when given 16.

Has anybody else seen the same issue? Please check your own environment.

Thanks,

What did you expect to see?

I expect Ollama to use all of the CPUs provided.

Steps to reproduce

Follow the official guide to set up Ollama, and you will see this problem.

Are there any recent changes that introduced the issue?

It is a brand-new environment, set up by following the official guide.

OS

Linux, Windows

Architecture

amd64

Platform

No response

Ollama version

129

GPU

Intel

GPU info

I am running on CPU.

CPU

Intel

Other software

No other software; it is a brand-new environment dedicated to Ollama.

But most of the environments are on VMware. If I run other stress-test software on VMware, all of the CPUs are fully used, as expected.

GiteaMirror added the bug label 2026-04-12 12:16:08 -05:00

@remy415 commented on GitHub (Mar 24, 2024):

I just tested this on my Jetson (ARM64 6 core) and can confirm only 3 cores were active during compute.


@Masterchief-07 commented on GitHub (Mar 25, 2024):

Set the number of threads in each request to Ollama:

curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Why is the sky blue?",
  "options": {
    "num_thread": 8
  }
}'
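
For anyone calling the API from a script instead of curl, here is a minimal Python sketch of the same request. It reuses the endpoint, model, and `num_thread` option from the curl call above. The helper names `build_generate_request` and `stream_response` are hypothetical, and the streaming reader assumes the server returns one JSON object per line with a `response` field, so treat it as an illustration and not as an official client.

```python
import json
import urllib.request

# Same endpoint as the curl example above.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(model, prompt, num_thread, url=OLLAMA_URL):
    """Build (but do not send) a POST request that sets num_thread per request."""
    payload = {
        "model": model,
        "prompt": prompt,
        "options": {"num_thread": num_thread},
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def stream_response(request):
    """Send the request and yield text chunks.

    Assumption: the server streams newline-delimited JSON objects,
    each carrying a piece of the answer in a "response" field.
    """
    with urllib.request.urlopen(request) as resp:
        for raw_line in resp:
            if raw_line.strip():
                yield json.loads(raw_line).get("response", "")


# Building the request needs no running server.
req = build_generate_request("llama2", "Why is the sky blue?", 8)
```

With a local server running, `print("".join(stream_response(req)))` would send the request and print the answer. The thread count has to be included in every request sent this way.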


@remy415 commented on GitHub (Mar 25, 2024):

@OPDEV001 As @Masterchief-07 pointed out, the client request is what sets the number of threads spawned. Since you're using Ollama as both your client and your back-end, it looks like the way to go is to create a custom model with the num_thread field set to the number of CPU cores on your machine. There is a guide for this in the documentation (https://github.com/ollama/ollama/tree/main/docs).

@OPDEV001 commented on GitHub (Mar 26, 2024):

Thanks for your reply, but I cannot access the documentation on the Ollama GitHub, strange...

Can I configure Ollama through a config file, an environment variable, or a startup command?

Thanks,


@remy415 commented on GitHub (Mar 26, 2024):

The specific file detailing the creation of custom models is here: https://github.com/ollama/ollama/blob/main/docs/modelfile.md

One thing to note from the page:

Parameter: num_thread
Description: Sets the number of threads to use during computation. By default, Ollama will detect this for optimal performance. It is recommended to set this value to the number of physical CPU cores your system has (as opposed to the logical number of cores).
Value type: int
Example usage: num_thread 8

If Ollama is detecting & setting this automatically and is focused on physical cores, then it is logical to assume it is set to total cores / 2 as most systems are running Hyperthreading, SMT, etc, and this number would correspond with physical cpu cores on a typical system.
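
To make the physical-versus-logical distinction concrete, here is a small Python sketch that counts physical cores from `/proc/cpuinfo`-style text by collecting unique (`physical id`, `core id`) pairs. This is only an illustration of the idea and not how Ollama itself detects cores. The function names and the sample text are made up for the example, and some platforms (ARM boards, for instance) may not expose these fields at all, in which case the sketch falls back to the logical count.

```python
# Hypothetical sample: 4 logical processors that share 2 physical cores.
SAMPLE_CPUINFO = """\
processor   : 0
physical id : 0
core id     : 0

processor   : 1
physical id : 0
core id     : 1

processor   : 2
physical id : 0
core id     : 0

processor   : 3
physical id : 0
core id     : 1
"""


def count_physical_cores(cpuinfo_text):
    """Count unique (physical id, core id) pairs; 0 if the fields are absent."""
    cores = set()
    physical_id = core_id = None
    # The extra "" makes sure the last record is flushed.
    for line in cpuinfo_text.splitlines() + [""]:
        if not line.strip():
            # A blank line ends the record for one logical processor.
            if physical_id is not None and core_id is not None:
                cores.add((physical_id, core_id))
            physical_id = core_id = None
            continue
        key, _, value = line.partition(":")
        key = key.strip()
        if key == "physical id":
            physical_id = value.strip()
        elif key == "core id":
            core_id = value.strip()
    return len(cores)


def suggest_num_thread(cpuinfo_text, logical_count):
    """Prefer the physical core count; fall back to the logical count."""
    physical = count_physical_cores(cpuinfo_text)
    return physical if physical > 0 else logical_count
```

On a Linux machine this could be called as `suggest_num_thread(open("/proc/cpuinfo").read(), os.cpu_count())`. With hyperthreading or SMT enabled, the result would typically be half of the logical count, which would match the behaviour reported in this issue.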

The example file given on the page is:

FROM llama2
# sets the temperature to 1 [higher is more creative, lower is more coherent]
PARAMETER temperature 1
# sets the context window size to 4096, this controls how many tokens the LLM can use as context to generate the next token
PARAMETER num_ctx 4096

# sets a custom system message to specify the behavior of the chat assistant
SYSTEM You are Mario from super mario bros, acting as an assistant.

So, borrowing from this syntax, the model file would look something like this:

FROM mistral

# Set this to the number of physical CPU cores
PARAMETER num_thread 8

Then you would import the model file into Ollama:

1. Save it as a file (e.g. Modelfile)
2. ollama create choose-a-model-name -f <location of the file e.g. ./Modelfile>
3. ollama run choose-a-model-name
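
If the Modelfile needs to be generated for several machines, the steps above could be scripted. Below is a minimal Python sketch that renders the Modelfile text. It uses `num_thread` (singular), the spelling in the documentation excerpt quoted earlier in this comment. The helper name `render_modelfile` is hypothetical, and the output is just the text you would save before running `ollama create`.

```python
def render_modelfile(base_model, num_thread, system=""):
    """Return Modelfile text that pins the thread count for a base model."""
    if num_thread < 1:
        raise ValueError("num_thread must be a positive integer")
    lines = [
        f"FROM {base_model}",
        "",
        "# Set this to the number of physical CPU cores",
        f"PARAMETER num_thread {num_thread}",
    ]
    if system:
        # Optional system message, written in triple-quoted form.
        lines += ["", f'SYSTEM """{system}"""']
    return "\n".join(lines) + "\n"


modelfile_text = render_modelfile("mistral", 8)
```

Writing `modelfile_text` to `./Modelfile` and then running `ollama create choose-a-model-name -f ./Modelfile` would follow the same steps as above.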

@OPDEV001 commented on GitHub (Mar 26, 2024):

Thanks for your reply.

I used open-webui to accomplish this.

Navigate to open-webui and create your own model file, entering a name and a short description. Then enter the parameters as shown below. Here is my content:

FROM zephyr:7b-beta-q5_K_M

PARAMETER num_threads 16

SYSTEM """Code Companion provides direct, precise solutions in Golang, TypeScript, Rust, and Python, and creating UI mockups from uploaded screenshots. For coding queries, it still offers quick solutions without detailed explanations, assuming users have a good grasp of programming concepts."""

Actually, I copied the content from https://openwebui.com/modelfiles, where you can search for any modelfile you like. I just added the line "PARAMETER num_threads 16". For the other parameters, others can refer to the documentation. :)

Now Ollama runs at full CPU usage.

Many thanks again.


@0x7CFE commented on GitHub (Jul 26, 2024):

I am again seeing this issue on recent versions of web ui 0.3.10 and ollama 3.0, though the issue reappeared somewhere around versions 0.3.8 (0.2.5 for ollama).

I see that num_threads is set, but only half of CPU cores are actually busy.

Ubuntu Linux 22.04.4 LTS


@dberardo-com commented on GitHub (Jan 30, 2025):

Is num_threads a valid model parameter? I can't see it in the docs.


@mbrbug commented on GitHub (Feb 5, 2025):

Same here.
Was the parameter removed?

❯ ollama create myqwen -f ./myqwen
gathering model components
Error: unknown parameter 'num_threads'
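
One thing worth checking: the documentation excerpt quoted earlier in this thread names the parameter `num_thread` (singular), while the Modelfiles posted here use `num_threads`. The thread does not establish whether the plural form was ever accepted, so the following Python sketch only assumes that the singular spelling is the intended one. It finds and rewrites the plural spelling in Modelfile text; the helper names are hypothetical.

```python
import re

# Matches a PARAMETER line that uses the plural spelling.
_PLURAL = re.compile(
    r"^([ \t]*PARAMETER[ \t]+)num_threads\b",
    re.IGNORECASE | re.MULTILINE,
)


def find_plural_lines(modelfile_text):
    """Return 1-based line numbers of PARAMETER lines that use num_threads."""
    return [
        lineno
        for lineno, line in enumerate(modelfile_text.splitlines(), start=1)
        if _PLURAL.match(line)
    ]


def use_singular(modelfile_text):
    """Rewrite 'PARAMETER num_threads N' to 'PARAMETER num_thread N'."""
    return _PLURAL.sub(r"\g<1>num_thread", modelfile_text)


original = "FROM zephyr:7b-beta-q5_K_M\n\nPARAMETER num_threads 16\n"
fixed = use_singular(original)
```

Running `ollama create` again with the rewritten file would show whether the error goes away.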
Reference: github-starred/ollama#2041