[GH-ISSUE #4569] OLLAMA_NUM_PARALLEL problem #2866

Closed
opened 2026-04-12 13:12:48 -05:00 by GiteaMirror · 7 comments
Owner

Originally created by @marxy on GitHub (May 22, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/4569

Originally assigned to: @jmorganca on GitHub.

What is the issue?

![WeChat Work screenshot 17163461556090](https://github.com/ollama/ollama/assets/12171912/6ab3b3b7-2a18-4887-b4c7-67af4cd5eef4)
When I set the OLLAMA_NUM_PARALLEL=3 environment variable, I saw exceptions on multi-threaded requests against a single model, as shown in the screenshot.
![WeChat Work screenshot 17163461339124](https://github.com/ollama/ollama/assets/12171912/5adeb5e1-bdc6-41aa-b427-9f7b467aaaeb)
At the same time, I also saw abnormal output in the log. Is this a problem with the model, or with the multi-threaded requests?

OS

Linux

GPU

Nvidia

CPU

Intel

Ollama version

0.1.38

GiteaMirror added the bug label 2026-04-12 13:12:48 -05:00

@ITHealer commented on GitHub (May 22, 2024):

How do I set the environment variable to run parallel requests on Linux?
I set it up with:

sudo nano ~/.bashrc
export OLLAMA_NUM_PARALLEL=4
source ~/.bashrc

But it doesn't work.


@marxy commented on GitHub (May 22, 2024):

> How to set env to run parallel in linux? I had set up env by:
>
> sudo nano ~/.bashrc
> export OLLAMA_NUM_PARALLEL=4
> source ~/.bashrc
>
> But it doesn't work

If you use the systemctl command to start the Ollama service, you can follow this guide to add the environment variable (add an Environment entry under the [Service] section of /etc/systemd/system/ollama.service):
https://github.com/ollama/ollama/blob/main/docs/linux.md#adding-ollama-as-a-startup-service-recommended
https://github.com/ollama/ollama/blob/main/docs/faq.md#setting-environment-variables-on-linux
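The reason `~/.bashrc` has no effect is that a systemd service does not read your shell profile. The linked FAQ steps boil down to a drop-in override for the unit; a minimal sketch, assuming the default install from the Linux install script:

```shell
# Open a drop-in override for the ollama unit (creates
# /etc/systemd/system/ollama.service.d/override.conf)
sudo systemctl edit ollama.service

# In the editor, add the variable under a [Service] header:
#   [Service]
#   Environment="OLLAMA_NUM_PARALLEL=4"

# Reload systemd and restart the service so the variable takes effect
sudo systemctl daemon-reload
sudo systemctl restart ollama
```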


@zhaowei0315 commented on GitHub (May 29, 2024):

When I use OLLAMA_NUM_PARALLEL=4 OLLAMA_MAX_LOADED_MODELS=4 ollama serve,
Ollama responds very slowly. How can I fix it?


@AndyX-Net commented on GitHub (Jun 17, 2024):

> When I use OLLAMA_NUM_PARALLEL=4 OLLAMA_MAX_LOADED_MODELS=4 ollama serve Ollama responses so slow. how to fix it?

Setting OLLAMA_NUM_PARALLEL too high can cause part of the model to spill over onto the CPU. Inspect your Ollama logs, or check whether CPU usage rises significantly when the issue occurs.


@dhiltgen commented on GitHub (Jul 25, 2024):

Concurrency is enabled by default in 0.2.0 and up. If no parallel setting is provided, we'll try to load with 4, as long as that doesn't cause the model to spill over into CPU. If the model won't fit fully in GPU, then we'll fall back to 1 parallel.

More details at https://github.com/ollama/ollama/blob/main/docs/faq.md#how-does-ollama-handle-concurrent-requests
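A quick way to exercise this default is to fire several requests at the server at once and see whether they are served concurrently. A minimal sketch against the standard `/api/generate` endpoint; the model name `llama3` and the prompt texts are placeholders for whatever you have pulled locally:

```python
import json
from concurrent.futures import ThreadPoolExecutor
from urllib.request import Request, urlopen

OLLAMA_URL = "http://localhost:11434/api/generate"  # default Ollama port

def build_request(model: str, prompt: str) -> Request:
    """Build a non-streaming /api/generate request."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return Request(OLLAMA_URL, data=body.encode(),
                   headers={"Content-Type": "application/json"})

def ask(model: str, prompt: str) -> str:
    """Send one request and return the generated text."""
    with urlopen(build_request(model, prompt)) as resp:
        return json.load(resp)["response"]

def probe_parallelism(model: str = "llama3", n: int = 4) -> list[str]:
    """Fire n prompts at once; with an effective parallel setting
    of n or more, they are handled concurrently rather than queued."""
    prompts = [f"Count to {i}." for i in range(1, n + 1)]
    with ThreadPoolExecutor(max_workers=n) as pool:
        return list(pool.map(lambda p: ask(model, p), prompts))

# probe_parallelism()  # uncomment with a running Ollama server
```

If the total wall-clock time for n requests is close to the time for one, they were served in parallel; if it is roughly n times longer, requests were queued (parallel fell back to 1, e.g. because the model didn't fit fully on the GPU).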


@JedrzejMajko commented on GitHub (Jul 3, 2025):

It's not working. Windows 11 with one model and 32 concurrent workers - 1% GPU used, 98% GPU memory taken.
It's broken.


@dhiltgen commented on GitHub (Jul 3, 2025):

@JedrzejMajko please go ahead and file a new issue and describe your setup. Please include server logs as well. (my suspicion would be the model isn't fully loaded on the GPU and you're CPU bound, but the logs will help show what's going on.)


Reference: github-starred/ollama#2866