[GH-ISSUE #9070] Stuck in generating #67963

Closed
opened 2026-05-04 12:07:54 -05:00 by GiteaMirror · 2 comments

Originally created by @ChaRoSaMa on GitHub (Feb 13, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/9070

What is the issue?

First, I'd like to thank everyone who has contributed to this awesome software. I searched for a similar issue and found one, but I'm not sure it's the same problem; I mention it below. Here's my issue:

When I run Qwen2.5:1.5b to generate content, it sometimes gets stuck generating. Ollama is running in serve mode, and I use the Python API to generate content. This has happened many times, and the requests take abnormally long. While generation is stuck, the GPU is at 100% usage.

I'm not sure whether this is a problem with the Qwen model or something else.

It's worth mentioning that in #8178 the reporter also used a Qwen2.5-series model and also got stuck generating, though I'm not sure it's for the same reason.

Device

OS: Windows 11
GPU: Nvidia 4060 laptop

Relevant log output

[GIN] 2025/02/13 - 15:11:24 | 200 |    5.8646002s |       127.0.0.1 | POST     "/api/generate"
[GIN] 2025/02/13 - 22:23:25 | 500 |      7h10m32s |       127.0.0.1 | POST     "/api/generate"

OS

Windows

GPU

Nvidia

CPU

AMD

Ollama version

0.5.7

GiteaMirror added the bug label 2026-05-04 12:07:54 -05:00

@rick-github commented on GitHub (Feb 13, 2025):

If you set OLLAMA_DEBUG=1 you will probably see a lot of "shifting" messages when this occurs. I think what's happened is that the model has lost coherence and is "rambling" - that is, generating tokens without ever emitting an end-of-sequence (EOS) token. This can happen sometimes, particularly with smaller models. It can be triggered by exceeding the size of the context buffer, but that is not the only cause. You can make the model exit this state by limiting the number of tokens it generates, either by setting num_predict in the API call or by setting it as a PARAMETER in the Modelfile (see https://github.com/ollama/ollama/blob/main/docs/modelfile.md#valid-parameters-and-values).
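A minimal sketch of the suggestion above, assuming the plain /api/generate HTTP endpoint that appears in the reporter's logs. Only the request body is constructed here, since actually sending it requires a running Ollama server; the token cap of 256 is an arbitrary example value, not a recommendation:

```python
import json

def build_generate_payload(model, prompt, num_predict=256):
    """Build a request body for Ollama's /api/generate endpoint.

    The options.num_predict field is a hard cap on the number of
    tokens generated, so a "rambling" model that never emits EOS
    still terminates after num_predict tokens.
    """
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"num_predict": num_predict},
    }

# Example: cap qwen2.5:1.5b at 256 generated tokens.
payload = build_generate_payload("qwen2.5:1.5b", "Summarize this text...")
print(json.dumps(payload, indent=2))
```

The equivalent Modelfile approach is a single line, `PARAMETER num_predict 256`, which applies the cap to every request against that model without changing the API calls.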


@ChaRoSaMa commented on GitHub (Feb 14, 2025):

Thhhhhhhhanks


Reference: github-starred/ollama#67963