[GH-ISSUE #11299] Performance regression: v0.9.2 and higher #53966

Open
opened 2026-04-29 05:01:17 -05:00 by GiteaMirror · 6 comments

Originally created by @Stogas on GitHub (Jul 4, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/11299

What is the issue?

Ollama v0.9.2, at least on Windows, takes significantly more time before each inference starts on the GPU.

The following system usage graphs are from identical payload sequences and contents - 5 chat completions and 2 generations.

The issue persists across multiple reinstalls of each version. v0.9.1 and v0.9.0 are not affected.

CPU/GPU usage v0.9.1 - OK

- https://github.com/user-attachments/assets/0646bc7a-e397-4033-bb69-8ad0a103b394
- https://github.com/user-attachments/assets/00fd558f-7d8d-433b-95c0-cdbfe12a7536

CPU/GPU usage v0.9.2 - REGRESSION

- https://github.com/user-attachments/assets/a29659b1-c53a-4548-bc20-766da4b46b9c
- https://github.com/user-attachments/assets/e0004bbc-2118-4399-a799-4dfdf55b3920

CPU/GPU usage v0.9.5 - REGRESSION (same)

- https://github.com/user-attachments/assets/bc6f0efc-7899-47fd-919d-c425966e2396
- https://github.com/user-attachments/assets/6277d993-0155-469d-a47e-c5fd6e23e739

Relevant log output

Normal:

- [server-0.9.1.log](https://github.com/user-attachments/files/21064293/server-0.9.1.log)
- [server-0.9.2.log](https://github.com/user-attachments/files/21064295/server-0.9.2.log)
- [server-0.9.5.log](https://github.com/user-attachments/files/21064297/server-0.9.5.log)

Debug logs enabled:

- [server-0.9.1-debug.log](https://github.com/user-attachments/files/21064418/server-0.9.1-debug.log)
- [server-0.9.2-debug.log](https://github.com/user-attachments/files/21064390/server-0.9.2-debug.log)
- [server-0.9.5-debug.log](https://github.com/user-attachments/files/21064391/server-0.9.5-debug.log)

Tracing enabled:

- [server-0.9.1-trace.log](https://github.com/user-attachments/files/21064567/server-0.9.1-trace.log)
- [server-0.9.2-trace.log](https://github.com/user-attachments/files/21064566/server-0.9.2-trace.log)

OS

Windows

GPU

AMD

CPU

AMD

Ollama version

0.9.2 and up
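For anyone trying to reproduce the measurement, a minimal timing sketch against a local Ollama server (assumptions: the default listen address `http://localhost:11434`, a placeholder model name, and an illustrative payload sequence rather than the reporter's exact prompts):

```python
import json
import time
import urllib.request

BASE = "http://localhost:11434"  # Ollama's default listen address

def timed_post(path, payload):
    """POST a JSON payload and return seconds until the full response arrives."""
    req = urllib.request.Request(
        BASE + path,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    start = time.monotonic()
    with urllib.request.urlopen(req) as resp:
        resp.read()
    return time.monotonic() - start

def run_sequence(model):
    """Mirror the reported sequence: 5 chat completions, then 2 generations."""
    for i in range(5):
        t = timed_post("/api/chat", {
            "model": model,
            "messages": [{"role": "user", "content": f"ping {i}"}],
            "stream": False,
        })
        print(f"chat {i}: {t:.2f}s")
    for i in range(2):
        t = timed_post("/api/generate",
                       {"model": model, "prompt": f"ping {i}", "stream": False})
        print(f"generate {i}: {t:.2f}s")

# run_sequence("llama3.2")  # uncomment with a model you have pulled locally
```

Running this against each installed version (v0.9.1 vs v0.9.2+) should make the per-request startup delay visible in the printed timings, alongside the CPU/GPU graphs above.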
GiteaMirror added the bug label 2026-04-29 05:01:17 -05:00

@rick-github commented on GitHub (Jul 4, 2025):

Logs?


@Stogas commented on GitHub (Jul 4, 2025):

Sorry, updated description - issue template messed with the formatting.


@Stogas commented on GitHub (Jul 4, 2025):

On the main branch, reverting https://github.com/ollama/ollama/commit/9e125d884cf995dfae7fcd74690d525e4326a517 seems to fix the issue.

@2jfs904judsw20600jikn613d0dookl23jsig commented on GitHub (Jul 5, 2025):

I can't add anything technical here other than that I've noticed this as well. Long GPU inference times with no erroring. Extreme slowdowns with certain model-params.


@Stogas commented on GitHub (Jul 10, 2025):

@rick-github, genuine question - should I keep the PR open without any reviewers, i.e. will it get a look when someone has time? I'm not familiar with the review process in this repo.


@rick-github commented on GitHub (Jul 10, 2025):

It will get looked at eventually, but it can take some time - I've had PRs that have taken months to get an initial review. The ollama team prioritize code-touching PRs from the core team, so PRs from external contributors can languish.

Reference: github-starred/ollama#53966