[GH-ISSUE #11923] Ollama 0.11.5-RC2: New Memory Management: Ollama starts more instances than required. #7915

Closed
opened 2026-04-12 20:04:58 -05:00 by GiteaMirror · 2 comments

Originally created by @dan-and on GitHub (Aug 15, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/11923

What is the issue?

Not a bug per se, but a notification for @jessegross regarding the new Memory Management: Ollama will use as many forks as there are GPUs available to the system, even when fewer GPUs are actually utilized. (See screenshot.)

(OLLAMA_NEW_ENGINE=1 is set)
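
For reference, the new engine is enabled through the server's environment, e.g.:

```shell
# Enable Ollama's new engine before starting the server.
export OLLAMA_NEW_ENGINE=1
ollama serve
```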

Before your MR #11090, but already including my GPU grouping patch (MR #10678), Ollama's behavior was to spin up only as many forks as the number of GPUs the model required.
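
As a rough sketch (the masking mechanism is clarified in the comment below; the runner binary name here is only a stand-in), that earlier behavior amounted to launching each runner with a restricted device list, so the CUDA runtime enumerated only the GPUs the model needed:

```shell
# Old behavior (sketch): the runner subprocess sees only GPUs 0 and 1,
# even on a host with more devices; the rest are masked at the CUDA
# level. "ollama-runner" is a stand-in for the actual subprocess.
CUDA_VISIBLE_DEVICES=0,1 ./ollama-runner
```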

This is not a showstopper, but it is not efficient behavior. Please decide for yourselves whether that is fine for the Ollama team.

Thanks for the remarkable work and support. :-)

Screenshot: https://github.com/user-attachments/assets/b2d13cfb-f904-423b-a553-db88f40576a9

Relevant log output

See attached screenshot.

OS

Linux

GPU

Nvidia

CPU

AMD

Ollama version

0.11.5-RC2

GiteaMirror added the bug label 2026-04-12 20:04:58 -05:00

@jessegross commented on GitHub (Aug 15, 2025):

These aren't forks - you can see that the PIDs for all of them are the same. Before, the subprocess was only allowed to see the necessary GPUs through the use of CUDA_VISIBLE_DEVICES, so they were masked out at the CUDA level. Now, it sees all of the GPUs but only schedules on the appropriate ones.

Is there something specific that you see that is not efficient?
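
One way to confirm this on a live system is the per-GPU compute-process table; with a single loaded model, every listed GPU should report the same PID (standard nvidia-smi query fields):

```shell
# List compute processes per GPU. For a single runner process the PID
# column is identical across all rows, i.e. one process scheduled on
# several GPUs rather than one fork per GPU.
nvidia-smi --query-compute-apps=pid,gpu_uuid,used_gpu_memory --format=csv
```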


@dan-and commented on GitHub (Aug 15, 2025):

You are right. So this is a non-issue. Thanks.


Reference: github-starred/ollama#7915