Ollama hangs in infinite loop during code update requests, requires service restart #8649

Open
opened 2025-11-12 14:48:14 -06:00 by GiteaMirror · 2 comments

Originally created by @A on GitHub (Nov 10, 2025).

What is the issue?

When requesting code updates through various client tools, Ollama enters an infinite loop state where:

  • GPU utilization remains active (visible in nvtop)
  • No output is produced to the client (in rare cases, it loops one block of text over and over)
  • The process continues even after canceling the task in the client
  • `ollama stop <model>` has no effect
  • Only `systemctl restart ollama` resolves the issue (the recovery sequence is sketched below)
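
For reference, a minimal sketch of the recovery steps described above, assuming the default Linux systemd install (the model name is a placeholder):

```shell
ollama ps                      # list loaded models; the stuck runner still shows up
ollama stop llama3.1:70b       # has no effect while the runner is looping
sudo systemctl restart ollama  # the only thing that clears the state
```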

Models tested (with context sizes up to 32k):

  • maryasov/qwen2.5-coder-cline:32b
  • llama3.1:70b
  • codellama:34b-code

Clients tested:

  • avante.nvim
  • Cline for VSCode
  • Roocode

Description:


OS

Linux ubuntu 6.8.0-59-generic #61~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue Apr 15 17:03:15 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

GPU

  • Nvidia RTX A4000
  • 2x RTX 3090 on different vast.ai servers

CPU

Not important

Ollama version

0.12.10

GiteaMirror added the bug label 2025-11-12 14:48:14 -06:00

@rick-github commented on GitHub (Nov 11, 2025):

Ollama looping is usually due to the model losing track of what it was doing and generating a stream of tokens that doesn't include an end-of-sequence token. This can be triggered by overflowing the context buffer, causing either message truncation at the time of prompt processing or a context shift during token generation. If you could attach a log with `OLLAMA_DEBUG=1` set, that would help in debugging.
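
One way to capture that log on the default systemd install (a sketch; the unit name `ollama` comes from the standard Linux installer, so adjust it if your setup differs):

```shell
# Add the environment variable to the service definition
sudo systemctl edit ollama
#   In the override editor, add:
#   [Service]
#   Environment="OLLAMA_DEBUG=1"

sudo systemctl restart ollama

# Reproduce the hang, then save the recent service log
journalctl -u ollama --no-pager -n 1000 > ollama-debug.log
```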


@pdevine commented on GitHub (Nov 12, 2025):

@A are you hitting this when you've run through the context, or some other case?
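
One way to test the context-overflow theory is to request a larger context window explicitly and see whether the loop still occurs. A sketch against the HTTP API (the model name and prompt are placeholders; `num_ctx` is a standard request option):

```shell
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1:70b",
  "prompt": "Apply the following code update: ...",
  "options": { "num_ctx": 32768 }
}'
```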
