[PR #11413] runner: Pass custom thread count to backend via env variable #44781

Open
opened 2026-04-25 00:27:10 -05:00 by GiteaMirror · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/11413
Author: @shalinib-ibm
Created: 7/14/2025
Status: 🔄 Open

Base: main ← Head: set_env_threads


📝 Commits (2)

  • bffd076 PowerPC: Pass Optimal thread count to backend
  • 529675e Merge branch 'main' into set_env_threads

📊 Changes

1 file changed (+47 additions, -1 deletions)


📝 runner/llamarunner/runner.go (+47 -1)

📄 Description

This patch allows users to override the number of threads used during model execution via the OLLAMA_NUM_THREADS environment variable.

If this variable is not set, an optimal thread count is chosen automatically on Power architecture only.

By default, the thread count is set to runtime.NumCPU(), which uses all available logical cores. On machines with many cores, this can cause CPU thrashing and degraded performance.

In internal testing, tuning the thread count to an optimal value for the granite-3b:8b model led to a 40% improvement in inference throughput, compared to the default thread setting.

Example usage:
OLLAMA_NUM_THREADS=8 ollama run ...


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-25 00:27:10 -05:00

Reference: github-starred/ollama#44781