[GH-ISSUE #8959] Sudden and Obvious Slowdowns After Dozens of Conversations #5812

Closed
opened 2026-04-12 17:09:09 -05:00 by GiteaMirror · 2 comments
Owner

Originally created by @piz-ewing on GitHub (Feb 9, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/8959

What is the issue?

After using Ollama for dozens of conversations, responses suddenly become very slow. At this point, GPU usage reaches 100%, where it usually stays around 96%. Memory usage sits around 50% and remains unchanged. This occurs on macOS 15.3, and performance returns to normal after running the `/clear` command. It is unclear whether this is related to a configuration setting or to another underlying issue. Any assistance would be appreciated.

Relevant log output

**Previous time**

```shell
total duration:       6.324506s
load duration:        31.330917ms
prompt eval count:    969 token(s)
prompt eval duration: 1.149s
prompt eval rate:     843.34 tokens/s
eval count:           46 token(s)
eval duration:        5.099s
eval rate:            9.02 tokens/s
```

**Last time**

```shell
total duration:       16.450105625s
load duration:        32.530125ms
prompt eval count:    1011 token(s)
prompt eval duration: 10.389s
prompt eval rate:     97.31 tokens/s
eval count:           47 token(s)
eval duration:        5.992s
eval rate:            7.84 tokens/s
```
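The reported rates are just token count divided by duration; a quick sanity check of the prompt-eval figures from the two runs above:

```python
# Each "rate" line in the ollama log is count / duration.
def rate(tokens: int, seconds: float) -> float:
    return tokens / seconds

# "Previous time": 969 prompt tokens in 1.149 s
print(f"{rate(969, 1.149):.2f} tokens/s")    # 843.34

# "Last time": 1011 prompt tokens in 10.389 s
print(f"{rate(1011, 10.389):.2f} tokens/s")  # 97.31
```

Note that prompt evaluation slowed roughly 8.7x while generation (`eval rate`) barely changed, so the slowdown is almost entirely in prompt processing, which is consistent with the context window filling up.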

OS

macOS

GPU

Apple

CPU

Apple

Ollama version

0.5.7

GiteaMirror added the bug label 2026-04-12 17:09:09 -05:00

@piz-ewing commented on GitHub (Feb 9, 2025):

I set `/set parameter num_ctx` to a larger value (10240), and after a slower pass (around 30 seconds) it recovered. I expect the slowdown may recur once the new maximum is reached. I will close the issue now.
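For anyone hitting the same limit outside the interactive CLI, the same `num_ctx` override can be passed per request through the `options` field of Ollama's REST API. A minimal sketch of the request body (the model name here is a placeholder):

```python
import json

# Build an /api/generate request body that raises the context window,
# mirroring the interactive `/set parameter num_ctx 10240` command.
payload = {
    "model": "llama3",              # placeholder model name
    "prompt": "Hello",
    "options": {"num_ctx": 10240},  # per-request context size override
}

body = json.dumps(payload, indent=2)
print(body)
# POST this body to http://localhost:11434/api/generate
```

Raising `num_ctx` trades memory for headroom: it delays the point where the context fills and old tokens must be evicted, but does not remove the ceiling.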


@jmorganca commented on GitHub (Feb 9, 2025):

@piz-ewing thanks for the update and sorry that happened.

Reference: github-starred/ollama#5812