[GH-ISSUE #2936] Not using all threads on NUMA configurations (2-, 4-, 6-socket server motherboards) on Windows #63838

Open
opened 2026-05-03 15:08:53 -05:00 by GiteaMirror · 5 comments

Originally created by @GermanAizek on GitHub (Mar 5, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/2936

Originally assigned to: @dhiltgen on GitHub.

This is a very old problem dating back to the early 2000s. It concerns Microsoft's `setThreadAffinity`, which by default does not cover all logical processors in the system; Microsoft addressed this only 20 years later, in Windows 11 and newer Windows Server releases, and the thread count is most likely still calculated incorrectly. I do not know Go, but if this were in C/C++ I would be able to help. I have already fixed this issue in these projects before (a short illustration follows the screenshot below):

https://github.com/ggerganov/llama.cpp/issues/5524

https://github.com/x64dbg/x64dbg/pull/3272

https://github.com/giampaolo/psutil/issues/771

https://github.com/GermanAizek/llvm-project/commit/d1fa25f37631b8b33a71fbe9eb4ea89e3a47b723

![image](https://github.com/ollama/ollama/assets/21138600/a517d7a8-b9ba-48de-af90-488b0039474c)
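
For illustration only, here is a minimal Go sketch of the gap being described, assuming the `golang.org/x/sys/windows` package is available; this is not ollama's actual code. Pre-Windows 11 processes start confined to a single processor group of at most 64 logical processors, while `GetActiveProcessorCount(ALL_PROCESSOR_GROUPS)` counts across every group:

```go
//go:build windows

package main

import (
	"fmt"
	"runtime"

	"golang.org/x/sys/windows"
)

func main() {
	// runtime.NumCPU reports the CPUs visible to the process; on older
	// Windows releases this may be confined to a single processor group
	// of at most 64 logical processors, so a 2x48-thread NUMA box can
	// report far fewer CPUs here than it actually has.
	fmt.Println("NumCPU:", runtime.NumCPU())

	// GetActiveProcessorCount with ALL_PROCESSOR_GROUPS counts logical
	// processors across every processor group, so it sees all sockets.
	total := windows.GetActiveProcessorCount(windows.ALL_PROCESSOR_GROUPS)
	fmt.Println("All processor groups:", total)
}
```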

GiteaMirror added the bug, windows labels 2026-05-03 15:08:53 -05:00

@remy415 commented on GitHub (Mar 5, 2024):

> https://github.com/ggerganov/llama.cpp/issues/5524

Ollama uses llama.cpp as its backend. If llama.cpp pushes a fix for the issue, Ollama should automatically use it in its next release.


@dhiltgen commented on GitHub (Jun 1, 2024):

@GermanAizek skimming through the llama.cpp commit log, I didn't notice a commit from you fixing this, and it looks like the issue was closed due to inactivity. As @remy415 points out, this logic in ollama is based on the llama.cpp C++ code, so if you contribute a fix to llama.cpp, we'll pick it up automatically next time we bump our dependency.


@dhiltgen commented on GitHub (Aug 9, 2024):

This should be working better now on Linux with #6186, but Windows support is still a gap.


@dhiltgen commented on GitHub (Oct 15, 2024):

We've [merged new logic](https://github.com/ollama/ollama/pull/6264) to discover the available CPU sockets, cores (efficiency and performance), and hyperthreads (logical CPUs) on macOS, Windows, and Linux. In testing on a NUMA system on both Linux and Windows, there's still more work to do, so for now we'll only allocate the number of physical cores in one socket to avoid thrashing. Inference speed should be better, but it will still use only a single socket.
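
As an illustration of the conservative policy described above, here is a hypothetical sketch (not ollama's actual implementation) of choosing one socket's worth of physical cores:

```go
package main

import "fmt"

// threadsForOneSocket is a hypothetical helper mirroring the policy
// above: on a multi-socket machine, use only the physical cores of a
// single socket so inference threads stay on one NUMA node and avoid
// cross-node memory thrashing.
func threadsForOneSocket(sockets, totalPhysicalCores int) int {
	if sockets < 1 {
		sockets = 1
	}
	perSocket := totalPhysicalCores / sockets
	if perSocket < 1 {
		perSocket = 1
	}
	return perSocket
}

func main() {
	// A 2-socket box with 16 physical cores per socket (64 logical CPUs
	// with hyperthreading) still gets 16 threads: one socket's worth.
	fmt.Println(threadsForOneSocket(2, 32)) // 16
}
```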


@james-irwin commented on GitHub (Feb 4, 2025):

Notwithstanding that it was made in the context of manually overriding (reducing) the number of threads via an environment variable (for CPU-based targets), #8792 appears to have a more observable and consistent effect than `/set parameter num_thread` invoked from the client. It could also be used to dial thread counts up, with the same command-line invocation placing those threads on specific cores (via taskset or numactl); see the sketch below. I mention this here because, while I applaud the efforts toward an automatic, great out-of-the-box default experience, manual overrides are appropriate for my use case, and, it appears, for others too.
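
For example, one possible way to combine the two (assuming Linux with `numactl` installed; node and thread numbers are illustrative, not recommendations):

```sh
# Pin the server's CPUs and memory to NUMA node 0.
numactl --cpunodebind=0 --membind=0 ollama serve

# Then, from an interactive "ollama run <model>" session, set the
# thread count to match that node's physical core count:
#   /set parameter num_thread 16
```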

Reference: github-starred/ollama#63838