[GH-ISSUE #14421] qwen3.5:35b Looping Issue #55874

Open
opened 2026-04-29 09:51:00 -05:00 by GiteaMirror · 8 comments
Owner

Originally created by @PetrRichter on GitHub (Feb 25, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/14421

What is the issue?

qwen3.5:35b gets stuck in endless loops.
Ollama Version 0.17.1-rc1

Relevant log output

No insightful logs just multiple GET and POST even though it was just one chat input (looping by itself)

OS

Windows

GPU

AMD

CPU

AMD

Ollama version

0.17.1-rc1

Originally created by @PetrRichter on GitHub (Feb 25, 2026). Original GitHub issue: https://github.com/ollama/ollama/issues/14421 ### What is the issue? qwen3.5:35b gets stuck in endless loops. Ollama Version 0.17.1-rc1 ### Relevant log output ```shell No insightful logs just multiple GET and POST even though it was just one chat input (looping by itself) ``` ### OS Windows ### GPU AMD ### CPU AMD ### Ollama version 0.17.1-rc1
GiteaMirror added the bug label 2026-04-29 09:51:00 -05:00
Author
Owner

@xmddmx commented on GitHub (Feb 26, 2026):

Similar for me.

This prompt (on macOS Ollama .17.1-rc1, M4 Pro MacBook with 48GB RAM, ollama context set at 32K ) ends up looping,

"Puzzle game: there are 16 words, grouped into 4 categories, which can include semantic, phonetic, or more tricky (such as adding or removing a letter or word). The 4 categories vary in difficulty, from obvious to obscure. Please tell me the 4 words in each category, and define the category, and list the categories in order from easiest to hardest. Here are the 16 words: calliope superiority ringmaster atlas oedipus buzzard echo electra trace dialect inferiority dictionary thesaurus reminder encyclopedia vestige"

(this is a NYT Connections puzzle from recently)

It thinks for about 615 seconds, then gets stuck.

The final loop is:

Is it "Words that contain the word 'Ring'"?

Is it "Words that contain the word 'Ring'"?

Is it "Words that contain the word 'Ring'"?

Is it "Words that contain the word 'Ring'"?

Is it "Words that contain the word 'Ring'"?

[etc.]

<!-- gh-comment-id:3963084269 --> @xmddmx commented on GitHub (Feb 26, 2026): Similar for me. This prompt (on macOS Ollama .17.1-rc1, M4 Pro MacBook with 48GB RAM, ollama context set at 32K ) ends up looping, "Puzzle game: there are 16 words, grouped into 4 categories, which can include semantic, phonetic, or more tricky (such as adding or removing a letter or word). The 4 categories vary in difficulty, from obvious to obscure. Please tell me the 4 words in each category, and define the category, and list the categories in order from easiest to hardest. Here are the 16 words: calliope superiority ringmaster atlas oedipus buzzard echo electra trace dialect inferiority dictionary thesaurus reminder encyclopedia vestige" (this is a NYT Connections puzzle from recently) It thinks for about 615 seconds, then gets stuck. The final loop is: Is it "Words that contain the word 'Ring'"? Is it "Words that contain the word 'Ring'"? Is it "Words that contain the word 'Ring'"? Is it "Words that contain the word 'Ring'"? Is it "Words that contain the word 'Ring'"? [etc.]
Author
Owner

@xmddmx commented on GitHub (Feb 27, 2026):

FYI, I updated to 0.17.2 and I'm no longer seeing this looping behavior with the latest qwen3.5:35b

<!-- gh-comment-id:3969998256 --> @xmddmx commented on GitHub (Feb 27, 2026): FYI, I updated to 0.17.2 and I'm no longer seeing this looping behavior with the latest qwen3.5:35b
Author
Owner

@without-ordinary commented on GitHub (Feb 27, 2026):

I'm running 0.17.4 and qwen3.5:122b-a10b, when I don't get https://github.com/ollama/ollama/issues/14444 while loading, I'll get this looping issue on the first response when using ollama cli chat but not when using openwebui.

Example:

>>> This is a test, reply "hello" 
Thinking...
Thinking Process:

1.  **Analyze the Request:**
    *   Input: "This is a test, reply "hello""
    *   Intent: The user is testing the system's response capability.
    *   Constraint: The user explicitly asks me to reply "hello".

2.  **Determine the appropriate response:**
    *   The instruction is clear and simple.
    *   I need to output the word "hello" as requested.
    *   I should avoid adding unnecessary fluff unless it helps clarify, but sticking to the instruction is usually best for tests.

3.  **Draft the response:**
    *   Option 1: hello
    *   Option 2: Hello
    *   Option 3: "hello"
    *   Selection: Option 2 (Hello) is the most natural capitalization, but the user wrote "hello" in lowercase. However, standard English capitalization is usually preferred unless specified otherwise. Let's look at 
the instruction: reply "hello". Usually, this means the content should be hello. I will reply with "Hello" to be polite and standard, or "hello" to be exact. Given it's a test, exactness might be appreciated, but 
"Hello" is safer for general interaction. Actually, looking at common practices, just writing the word is key. I'll write "hello" to match the prompt exactly, or "Hello" for correctness. Let's go with "Hello" as it's 
the standard greeting. Wait, if I want to be precise to the instruction "reply 'hello'", I should output hello. But often users accept "Hello". I will output "hello" to be strictly compliant.
    *   Re-evaluation: The prompt says reply "hello". I will output exactly that.

4.  **Final Decision:**
    *   Output: hello
    *   Wait, adding a little context might be nice but the instruction is a test. Keep it simple.
    *   Actually, capitalizing the first letter is standard for a sentence. "Hello" looks better. But if it's a string match test... Let's just say "hello".
    *   Okay, I'll just say "hello".
    *   Wait, I am an AI assistant. I should be helpful. A simple "hello" is best.
    *   Let's check the instruction again: `reply "hello"`.
    *   I will output: hello
    *   Actually, I'll capitalize it because it's the start of a message. "Hello". No, let's stick to the prompt's casing to show I followed instructions precisely. "hello".
    *   Decision: "hello" (lowercase) might look like I'm being lazy. "Hello" is standard. I'll use "Hello".
    *   Wait, no. If a user says "say hello", you say "Hello". If they say "reply 'hello'", they might be testing string matching. I'll output "hello".
    *   Actually, most likely they just want to see if I respond. I'll output "Hello". It's more natural.
    *   Okay, final decision: "Hello". (Capitalized for correctness).
    *   Wait, looking at the prompt again: `reply "hello"`. I will output exactly `hello`.
    *   Okay, I'll just output the word.
    *   Wait, I should not overthink. Just reply.
    *   Response: hello

5.  **Final Polish:**
    *   Keep it concise.
    *   "hello"
    *   (Self-Correction): Since I'm an AI, standard greeting is "Hello". I'll use "Hello".
    *   Wait, if I'm being tested, maybe they want exact string. I'll use "hello".
    *   Okay, I'll go with "hello".
    *   Actually, I'll add a period? No, the prompt didn't ask for one.
    *   Okay, just "hello".
    *   Wait, I should check if there are any safety guidelines. No.
    *   Okay, proceeding.
    *   Wait, one more thought. If I write "Hello", it's still replying "hello". I'll write "Hello".
    *   Okay, I'll write "Hello".
    *   Wait, the prompt says reply "hello". I will write hello.
    *   Okay, final final decision: hello.
    *   Wait, I'll capitalize it. It's a start of a sentence. Hello.
    *   Okay, I'm overthinking. Just output the text.
    *   "hello"
    *   Okay, I'll output: hello
    *   Wait, I'll output: Hello
    *   Okay, I'll output: hello
    *   (Decided): I will output "hello" exactly as requested in quotes to show compliance.
    *   Wait, no, I should not include the quotes unless asked to repeat the quotes. Just the word.
    *   Okay: hello
    *   Wait, I'll capitalize. Hello.
    *   Okay, I'll stop wavering. I'll output "Hello".
    *   Wait, the prompt is `reply "hello"`.
    *   Okay, I'll output: hello
    *   Wait, I'll output: Hello
    *   Okay, I'll output: hello
    *   (Final Final Decision): I will output "hello". It matches the prompt exactly.
    *   Wait, I'll output "Hello". It's more professional.
    *   Okay, I'll output "hello".
    *   Wait, I'll output "Hello".
    *   Okay, I'll output "hello".
    *   (Stop loop). I'll output "hello".
    *   Wait, I'll output "Hello".
    *   Okay, I'll output "hello".
    *   (Reasoning): The user asked to reply "hello". I will do exactly that.
    *   Output: hello
    *   Wait, I'll capitalize. Hello.
    *   Okay, I'll output: Hello
    *   (Final Decision): Hello.
    *   Wait, I'll just output hello.
    *   Okay, I'll output: hello
    *   (Wait, I need to stop this loop). I will output: hello
    *   (Wait, I'll output: Hello)
    *   (Okay, I'll output: hello)
    *   (Okay, I'll output: Hello)
    *   (Okay, I'll output: hello)
    *   Okay, I'll output: Hello. It's a greeting.
    *   Wait, the instruction is specific. "hello".
    *   Okay, I'll output: hello
    *   (End of thought process)
    *   Wait, I'll just output hello.
    *   Okay.
    *   Wait, I'll output: hello
    *   (Okay, final answer)
    *   hello
    *   Wait, I'll capitalize.
    *   Hello
    *   Okay, I'll output: hello
    *   (Stop)
    *   Okay, I'll output: hello
    *   (Wait, I'll output: Hello)
    *   (Okay, I'll output: hello)
    *   (Okay, I'll output: Hello)
    *   (Okay, I'll output: hello)
    *   (Okay, I'll output: Hello)
    *   (Okay, I'll output: hello)
[...]

It repeats the last output, switching between h and H each time. (I've removed the excess blank lines from the output to save space.)

<!-- gh-comment-id:3972510617 --> @without-ordinary commented on GitHub (Feb 27, 2026): I'm running `0.17.4` and `qwen3.5:122b-a10b`, when I don't get https://github.com/ollama/ollama/issues/14444 while loading, I'll get this looping issue on the first response when using ollama cli chat but not when using openwebui. Example: ``` >>> This is a test, reply "hello" Thinking... Thinking Process: 1. **Analyze the Request:** * Input: "This is a test, reply "hello"" * Intent: The user is testing the system's response capability. * Constraint: The user explicitly asks me to reply "hello". 2. **Determine the appropriate response:** * The instruction is clear and simple. * I need to output the word "hello" as requested. * I should avoid adding unnecessary fluff unless it helps clarify, but sticking to the instruction is usually best for tests. 3. **Draft the response:** * Option 1: hello * Option 2: Hello * Option 3: "hello" * Selection: Option 2 (Hello) is the most natural capitalization, but the user wrote "hello" in lowercase. However, standard English capitalization is usually preferred unless specified otherwise. Let's look at the instruction: reply "hello". Usually, this means the content should be hello. I will reply with "Hello" to be polite and standard, or "hello" to be exact. Given it's a test, exactness might be appreciated, but "Hello" is safer for general interaction. Actually, looking at common practices, just writing the word is key. I'll write "hello" to match the prompt exactly, or "Hello" for correctness. Let's go with "Hello" as it's the standard greeting. Wait, if I want to be precise to the instruction "reply 'hello'", I should output hello. But often users accept "Hello". I will output "hello" to be strictly compliant. * Re-evaluation: The prompt says reply "hello". I will output exactly that. 4. **Final Decision:** * Output: hello * Wait, adding a little context might be nice but the instruction is a test. Keep it simple. * Actually, capitalizing the first letter is standard for a sentence. "Hello" looks better. But if it's a string match test... Let's just say "hello". * Okay, I'll just say "hello". * Wait, I am an AI assistant. I should be helpful. A simple "hello" is best. * Let's check the instruction again: `reply "hello"`. * I will output: hello * Actually, I'll capitalize it because it's the start of a message. "Hello". No, let's stick to the prompt's casing to show I followed instructions precisely. "hello". * Decision: "hello" (lowercase) might look like I'm being lazy. "Hello" is standard. I'll use "Hello". * Wait, no. If a user says "say hello", you say "Hello". If they say "reply 'hello'", they might be testing string matching. I'll output "hello". * Actually, most likely they just want to see if I respond. I'll output "Hello". It's more natural. * Okay, final decision: "Hello". (Capitalized for correctness). * Wait, looking at the prompt again: `reply "hello"`. I will output exactly `hello`. * Okay, I'll just output the word. * Wait, I should not overthink. Just reply. * Response: hello 5. **Final Polish:** * Keep it concise. * "hello" * (Self-Correction): Since I'm an AI, standard greeting is "Hello". I'll use "Hello". * Wait, if I'm being tested, maybe they want exact string. I'll use "hello". * Okay, I'll go with "hello". * Actually, I'll add a period? No, the prompt didn't ask for one. * Okay, just "hello". * Wait, I should check if there are any safety guidelines. No. * Okay, proceeding. * Wait, one more thought. If I write "Hello", it's still replying "hello". I'll write "Hello". * Okay, I'll write "Hello". * Wait, the prompt says reply "hello". I will write hello. * Okay, final final decision: hello. * Wait, I'll capitalize it. It's a start of a sentence. Hello. * Okay, I'm overthinking. Just output the text. * "hello" * Okay, I'll output: hello * Wait, I'll output: Hello * Okay, I'll output: hello * (Decided): I will output "hello" exactly as requested in quotes to show compliance. * Wait, no, I should not include the quotes unless asked to repeat the quotes. Just the word. * Okay: hello * Wait, I'll capitalize. Hello. * Okay, I'll stop wavering. I'll output "Hello". * Wait, the prompt is `reply "hello"`. * Okay, I'll output: hello * Wait, I'll output: Hello * Okay, I'll output: hello * (Final Final Decision): I will output "hello". It matches the prompt exactly. * Wait, I'll output "Hello". It's more professional. * Okay, I'll output "hello". * Wait, I'll output "Hello". * Okay, I'll output "hello". * (Stop loop). I'll output "hello". * Wait, I'll output "Hello". * Okay, I'll output "hello". * (Reasoning): The user asked to reply "hello". I will do exactly that. * Output: hello * Wait, I'll capitalize. Hello. * Okay, I'll output: Hello * (Final Decision): Hello. * Wait, I'll just output hello. * Okay, I'll output: hello * (Wait, I need to stop this loop). I will output: hello * (Wait, I'll output: Hello) * (Okay, I'll output: hello) * (Okay, I'll output: Hello) * (Okay, I'll output: hello) * Okay, I'll output: Hello. It's a greeting. * Wait, the instruction is specific. "hello". * Okay, I'll output: hello * (End of thought process) * Wait, I'll just output hello. * Okay. * Wait, I'll output: hello * (Okay, final answer) * hello * Wait, I'll capitalize. * Hello * Okay, I'll output: hello * (Stop) * Okay, I'll output: hello * (Wait, I'll output: Hello) * (Okay, I'll output: hello) * (Okay, I'll output: Hello) * (Okay, I'll output: hello) * (Okay, I'll output: Hello) * (Okay, I'll output: hello) [...] ``` It repeats the last output, switching between `h` and `H` each time. (I've removed the excess blank lines from the output to save space.)
Author
Owner

@sqblg commented on GitHub (Feb 27, 2026):

This looping behavior is a classic 'reasoning cap' issue where smaller local models get lost in complex or repetitive instructions.

I've been solving this with a hybrid approach using ClawRouter (https://github.com/BlockRunAI/ClawRouter). It sits in front of Ollama and scores each prompt across 15 dimensions. If a task looks like it might trigger a loop or requires higher 'IQ', it can automatically route that specific call to a cloud flagship (like Claude 3.5), while keeping everything else local and cheap. It’s been a life-saver for keeping my Qwen/Llama agent workflows stable without blowing my entire API budget.

<!-- gh-comment-id:3974417027 --> @sqblg commented on GitHub (Feb 27, 2026): This looping behavior is a classic 'reasoning cap' issue where smaller local models get lost in complex or repetitive instructions. I've been solving this with a hybrid approach using **ClawRouter** (https://github.com/BlockRunAI/ClawRouter). It sits in front of Ollama and scores each prompt across 15 dimensions. If a task looks like it might trigger a loop or requires higher 'IQ', it can automatically route that specific call to a cloud flagship (like Claude 3.5), while keeping everything else local and cheap. It’s been a life-saver for keeping my Qwen/Llama agent workflows stable without blowing my entire API budget.
Author
Owner

@rick-github commented on GitHub (Feb 27, 2026):

Server logs will aid in debugging.

<!-- gh-comment-id:3974447286 --> @rick-github commented on GitHub (Feb 27, 2026): [Server logs](https://docs.ollama.com/troubleshooting) will aid in debugging.
Author
Owner

@xmddmx commented on GitHub (Feb 27, 2026):

Could this be "Bug 1" described here? https://github.com/ollama/ollama/issues/14493

The Go runner's sampler has zero implementation of penalty sampling. repeat_penalty, presence_penalty, and frequency_penalty are accepted by the API without error and silently discarded. The model card explicitly recommends presence_penalty=1.5 to prevent repetition loops during thinking

<!-- gh-comment-id:3974641919 --> @xmddmx commented on GitHub (Feb 27, 2026): Could this be "Bug 1" described here? https://github.com/ollama/ollama/issues/14493 > The Go runner's sampler has zero implementation of penalty sampling. repeat_penalty, presence_penalty, and frequency_penalty are accepted by the API without error and silently discarded. The model card explicitly recommends presence_penalty=1.5 to prevent repetition loops during thinking
Author
Owner

@javajuice1337 commented on GitHub (Mar 17, 2026):

Please fix this... qwen models have been looping in ollama forever... qwen 3.5 models are doing it too. Thank you.

<!-- gh-comment-id:4071474141 --> @javajuice1337 commented on GitHub (Mar 17, 2026): Please fix this... qwen models have been looping in ollama forever... qwen 3.5 models are doing it too. Thank you.
Author
Owner

@rick-github commented on GitHub (Mar 17, 2026):

Server logs will aid in debugging.

<!-- gh-comment-id:4074962666 --> @rick-github commented on GitHub (Mar 17, 2026): [Server logs](https://docs.ollama.com/troubleshooting) will aid in debugging.
Sign in to join this conversation.
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: github-starred/ollama#55874