[PR #11413] runner: Pass custom thread count to backend via env variable #13533

Open
opened 2026-04-13 00:29:38 -05:00 by GiteaMirror · 0 comments

Original Pull Request: https://github.com/ollama/ollama/pull/11413

State: open
Merged: No


This patch allows users to override the number of threads used during model execution via the OLLAMA_NUM_THREADS environment variable.

If this variable is not set, the patch selects an optimal thread count on Power architecture only.

By default, the thread count is set to runtime.NumCPU(), which uses all available logical cores. This can cause CPU thrashing and degraded performance.

In internal testing, tuning the thread count to an optimal value for the granite-3b:8b model led to a 40% improvement in inference throughput compared to the default thread setting.

Example usage:
OLLAMA_NUM_THREADS=8 ollama run ...
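
The override described above could be sketched roughly as follows. This is a minimal illustration, not the code from the patch: `parseThreadCount` is a hypothetical helper, the handling of empty, non-numeric, or non-positive values is an assumption, and the Power-specific default is not modeled because the description does not say how it is computed.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
	"strconv"
	"strings"
)

// parseThreadCount returns the thread count encoded in raw, or fallback
// when raw is empty, not an integer, or not positive.
// Hypothetical helper for illustration; not taken from the patch.
func parseThreadCount(raw string, fallback int) int {
	n, err := strconv.Atoi(strings.TrimSpace(raw))
	if err != nil || n <= 0 {
		return fallback
	}
	return n
}

func main() {
	// runtime.NumCPU() is the default named in the description:
	// one thread per logical core.
	threads := parseThreadCount(os.Getenv("OLLAMA_NUM_THREADS"), runtime.NumCPU())
	fmt.Printf("using %d threads\n", threads)
}
```

Falling back silently on a malformed value keeps the runner usable when the variable is mistyped; a real implementation might prefer to log a warning instead.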

GiteaMirror added the pull-request label 2026-04-13 00:29:38 -05:00
Reference: github-starred/ollama#13533