[GH-ISSUE #9616] Performance hit since v0.5.10 #6273

Closed
opened 2026-04-12 17:41:52 -05:00 by GiteaMirror · 2 comments

Originally created by @webdev23 on GitHub (Mar 9, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/9616

Originally assigned to: @jmorganca on GitHub.

What is the issue?

There is a clear performance hit of over 20% on my machine between [v0.5.10] and later versions, with performance gradually decreasing from release to release.

This is based on multiple benchmarks across multiple versions.

Offloading to the CPU is more prominent in later versions.

As a result, on my machine 14B models such as phi4 or deepseek-14B cannot run at all: they produce less than 1 token per second and quickly push the CPU to over-temperature with all cores at 100%. This behavior never happens on [v0.5.10].

On [v0.5.10], on my machine, such 14B models run at a usable speed: 3 or 5 tokens per second are doable over many minutes without a problem.
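
For anyone trying to reproduce the comparison, here is a minimal sketch of how tokens per second and the percentage drop between two versions could be computed. It assumes the final JSON response of Ollama's `/api/generate` endpoint, which reports `eval_count` (generated tokens) and `eval_duration` (nanoseconds). The helper names and the sample numbers are hypothetical placeholders, not measurements from this issue.

```python
def tokens_per_second(response: dict) -> float:
    """Generation speed from a final /api/generate response.

    Assumes `eval_count` is the number of generated tokens and
    `eval_duration` is the generation time in nanoseconds.
    """
    eval_count = response["eval_count"]
    eval_duration_ns = response["eval_duration"]
    if eval_duration_ns <= 0:
        raise ValueError("eval_duration must be positive")
    return eval_count / (eval_duration_ns / 1e9)


def percent_drop(baseline_tps: float, current_tps: float) -> float:
    """Relative slowdown of `current_tps` versus `baseline_tps`, in percent."""
    if baseline_tps <= 0:
        raise ValueError("baseline_tps must be positive")
    return (baseline_tps - current_tps) / baseline_tps * 100.0


if __name__ == "__main__":
    # Placeholder responses (illustrative values only, not real benchmark data).
    older_version = {"eval_count": 200, "eval_duration": 50_000_000_000}
    newer_version = {"eval_count": 150, "eval_duration": 50_000_000_000}

    old_tps = tokens_per_second(older_version)
    new_tps = tokens_per_second(newer_version)
    print(f"older: {old_tps:.2f} tok/s, newer: {new_tps:.2f} tok/s")
    print(f"drop: {percent_drop(old_tps, new_tps):.1f}%")
```

Running the same prompt and model against each installed version and feeding the two responses into these helpers gives a like-for-like figure to attach alongside the server logs.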

Relevant log output


OS

Linux

GPU

Nvidia

CPU

Intel

Ollama version

[v0.5.13]

GiteaMirror added the bug label 2026-04-12 17:41:52 -05:00

@rick-github commented on GitHub (Mar 9, 2025):

[Server logs](https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md#how-to-troubleshoot-issues) may aid in debugging.

@jmorganca commented on GitHub (Mar 9, 2025):

@webdev23 sorry about that. As @rick-github mentioned server logs will help us a ton. Ideally for both 0.5.9 and 0.5.13 – thanks so much

Reference: github-starred/ollama#6273