[GH-ISSUE #13389] Ollama Very Slow and Buggy 0.13.2 #86528

Closed
opened 2026-05-10 03:35:27 -05:00 by GiteaMirror · 3 comments

Originally created by @eliciel0513 on GitHub (Dec 9, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/13389

What is the issue?

With version 0.13.2, overall stability has noticeably degraded and the system feels very buggy.

GPT-OSS 20B is experiencing severe issues such as repetitive looped thinking, frequent crashes, and unresponsive behavior.

qwen3-next:80b-a3b-thinking-q4_K_M is extremely slow and performs well below what I would expect relative to larger models such as GPT-OSS 120B. In practice, Qwen3-Next spends roughly 5 minutes thinking at ~4 tokens/sec, whereas GPT-OSS 120B completes similar reasoning in about 1 minute at ~7 tokens/sec. Architectural differences may account for some of this, but the gap is large.

Additionally, GPT-OSS 120B, which previously ran stably at 9–10 tokens/sec, is now consistently operating in the low 7 tokens/sec range under version 0.13.2.
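
For anyone trying to reproduce or compare these numbers, here is a minimal sketch of how the tokens/sec figures could be computed. It assumes the non-streaming JSON response from Ollama's `/api/generate` (or `/api/chat`) endpoint, which reports `eval_count` (tokens generated) and `eval_duration` (in nanoseconds). The `sample` values and both helper names are hypothetical, chosen only to match the rough figures quoted above; they are not measurements from this report.

```python
def tokens_per_second(response: dict) -> float:
    """Generation throughput from an Ollama response's timing fields."""
    # eval_count is the number of generated tokens;
    # eval_duration is the generation time in nanoseconds.
    return response["eval_count"] / (response["eval_duration"] / 1e9)


def implied_tokens(minutes: float, rate: float) -> float:
    """Tokens generated in `minutes` of wall-clock time at `rate` tokens/sec."""
    return minutes * 60 * rate


# Hypothetical response fields (not taken from a real run).
sample = {"eval_count": 420, "eval_duration": 60_000_000_000}
print(tokens_per_second(sample))

# Rough figures from the report: ~5 min at ~4 tok/s vs ~1 min at ~7 tok/s.
print(implied_tokens(5, 4))
print(implied_tokens(1, 7))
```

Taken at face value, the quoted figures imply about 1200 thinking tokens for Qwen3-Next against about 420 for GPT-OSS 120B, so part of the wall-clock gap would come from the amount of reasoning generated rather than from throughput alone.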

Relevant log output


OS

Windows 11

GPU

RTX 3090

CPU

i9 14900k

Ollama version

0.13.2

GiteaMirror added the performance, bug, and needs more info labels 2026-05-10 03:35:28 -05:00

@rick-github commented on GitHub (Dec 9, 2025):

Post the server log (https://docs.ollama.com/troubleshooting) of a crash.

@mchiang0610 commented on GitHub (Dec 11, 2025):

@eliciel0513 I wanted to follow up on this. Could you share the server logs that @rick-github mentioned? They would help us troubleshoot this. Thank you!


@rick-github commented on GitHub (Jan 14, 2026):

Closing as stale. Please re-open with server logs if the problem persists.


Reference: github-starred/ollama#86528