[GH-ISSUE #10845] Super Slow Responses (Mac) #53633

Closed
opened 2026-04-29 04:20:39 -05:00 by GiteaMirror · 5 comments
Owner

Originally created by @onsomlem on GitHub (May 24, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/10845

What is the issue?

I tried using Ollama as a local AI server. When an answer is generating it takes forever (which I think shouldn't be happening, right?), and sometimes my computer becomes very slow. It takes a few minutes to completely generate an answer to a question. I use an Apple M4 chip with 24GB of RAM. I was trying to run the qwen3-14b model and it ran really slowly, so I tried the qwen3-4b model and had the same issue. Both times I had minimal background activity.

Relevant log output


OS

macOS

GPU

Apple

CPU

Apple

Ollama version

ollama version is 0.6.8

GiteaMirror added the bug label 2026-04-29 04:20:39 -05:00

@rick-github commented on GitHub (May 24, 2025):

[Server logs](https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md#how-to-troubleshoot-issues) may aid in debugging.

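For reference, on macOS the server log lives under `~/.ollama/logs/` (per the Ollama troubleshooting docs), so a quick way to capture recent output for a report is:

```shell
# On macOS, the Ollama server writes its log to ~/.ollama/logs/server.log.
# Tail the most recent entries to capture the model-loading and inference phase:
tail -n 100 ~/.ollama/logs/server.log
```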

@igorschlum commented on GitHub (May 24, 2025):

@onsomlem You should also try version 0.7.1, as improvements are made in each release.
Check Activity Monitor to see your computer's memory usage. Even with 24 GB of RAM, other open applications can be consuming it.


@onsomlem commented on GitHub (May 28, 2025):

I am observing it again. While it's running, it says it's using 100% of my GPU but has very low CPU usage. Is there any way to force it to use both?


@rick-github commented on GitHub (May 28, 2025):

Sure, you can tell ollama to load fewer layers onto the GPU by setting `num_gpu`, as described [here](https://github.com/ollama/ollama/issues/6950#issuecomment-2373663650).

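As a sketch of what setting `num_gpu` looks like (the value 24 here is illustrative, not a recommendation; tune it for your model and available memory):

```shell
# In the interactive CLI, set how many model layers are offloaded to the GPU;
# any remaining layers run on the CPU:
#   ollama run qwen3:4b
#   >>> /set parameter num_gpu 24
#
# Or per-request via the HTTP API (server assumed at the default localhost:11434):
curl http://localhost:11434/api/generate -d '{
  "model": "qwen3:4b",
  "prompt": "Why is the sky blue?",
  "options": { "num_gpu": 24 }
}'
```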

@igorschlum commented on GitHub (May 29, 2025):

@onsomlem On a Mac, RAM is shared between the GPU and the CPU. Since Ollama is more efficient on the GPU, macOS allocates that RAM to the GPU; forcing more work onto the CPU will not speed Ollama up. What do "super slow responses" mean for you? Can you share a prompt and the speed you observe, and I can run some tests and share my results?
Best

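To get a concrete, shareable speed number, the `--verbose` flag on `ollama run` prints timing statistics (including eval rate in tokens/s) after each response:

```shell
# --verbose prints prompt/eval token counts, durations, and tokens-per-second
# after the response, which makes slowdowns easy to compare across machines:
ollama run qwen3:4b --verbose "Why is the sky blue?"
```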

Reference: github-starred/ollama#53633