[GH-ISSUE #14816] anthropic/anthropic.go:Process() uses wrong index for tool call content block #35325

Closed
opened 2026-04-22 19:45:56 -05:00 by GiteaMirror · 2 comments

Originally created by @egtung on GitHub (Mar 13, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/14816

What is the issue?

Environment

  • Ollama 0.17.7
  • Claude Code 2.1.70
  • OS/hardware: Ubuntu 24.04, AMD Ryzen 5, GeForce RTX 3060

If it matters, I'm running everything locally, ethernet cable unplugged.

$ ollama ps
NAME                    ID              SIZE     PROCESSOR          CONTEXT    UNTIL               
glm-4.7-flash:latest    d1a8a26252f1    21 GB    46%/54% CPU/GPU    65536      59 minutes from now

Reproduction steps

  1. Run Claude Code via Ollama: ollama launch claude --model glm-4.7-flash
  2. At the prompt, enter something like Use the AskUserQuestion tool to get my preference for dogs or cats. Don't say anything else, just use the tool to ask a question.

Observed behavior

When the request finishes, grep ERROR ~/.claude/debug/latest shows [ERROR] Error streaming, falling back to non-streaming mode: Content block not found

Expected behavior

When the request finishes, grep ERROR ~/.claude/debug/latest should not find anything.

Impact

At first, this error seems minor - Claude Code still displays the question, and even if there's a problem with the streaming API, Claude falls back to a non-streaming API for future requests. However, Claude's non-streaming API call seems to have a timeout of 5 minutes (also mentioned in https://github.com/ollama/ollama/issues/13949 ). My system is weak enough that I cannot run entirely on the GPU and even moderately complex requests frequently exceed that timeout. Claude then retries 10 times before giving up, so that's about an hour of the prompt silently spinning before ultimately failing to make progress. Unfortunately, I haven't found a simple reproduction case that quickly triggers the entire sequence of behaviors up through the effective hour timeout, though I can hit it fairly reliably.

Without being able to look at the source for Claude Code, I'm not 100% sure there's a causal relationship between the content block being missing and the 5 minute timeout on later requests (e.g. perhaps something else is responsible, or perhaps streaming mode also has a timeout which would eventually trigger this path), but after playing around for a bit it sure does seem correlated. Fixing this bug might make the difference between my being able to only run very trivial, uninteresting prompts vs. being able to actually try things out, even if it's slow.

Problematic error sequence

From ~/.claude/debug/latest

[ERROR] Error streaming, falling back to non-streaming mode: Content block not found    # The error triggered in this bug report
[ERROR] Error streaming, falling back to non-streaming mode: The operation timed out.   # The result of a later, more complex query. I don't have a simple repro case for this part; the above repro steps do not trigger this.
[ERROR] API error (attempt 1/11): undefined Request timed out.                          # Uh oh, I'm about to have a bad time.

Packet traces

Before prompting Claude, I started a packet trace with tcpdump -vv -i lo -w ollama.pcap 'port 11434' and ended it after getting the response.

Unhappy case

The prompt in the reproduction steps produces a (lightly cleaned) packet trace like:

event: message_start, data: {"type":"message_start","message":{"id":"msg_9770d19a455133436742b66b","type":"message","role":"assistant","model":"glm-4.7-flash","content":[],"usage":{"input_tokens":18205,"output_tokens":0}}}
event: content_block_start, data: {"type":"content_block_start","index":0,"content_block":{"type":"thinking","thinking":""}}
event: content_block_delta, data: {"type":"content_block_delta","index":0,"delta":{"type":"thinking_delta","thinking":"The"}}
event: content_block_delta, data: {"type":"content_block_delta","index":0,"delta":{"type":"thinking_delta","thinking":" user"}}
event: content_block_delta, data: {"type":"content_block_delta","index":0,"delta":{"type":"thinking_delta","thinking":" wants"}}
<snip>
event: content_block_delta, data: {"type":"content_block_delta","index":0,"delta":{"type":"thinking_delta","thinking":" with"}}
event: content_block_delta, data: {"type":"content_block_delta","index":0,"delta":{"type":"thinking_delta","thinking":" no"}}
event: content_block_delta, data: {"type":"content_block_delta","index":0,"delta":{"type":"thinking_delta","thinking":" introductory"}}
event: content_block_delta, data: {"type":"content_block_delta","index":0,"delta":{"type":"thinking_delta","thinking":" text"}}
event: content_block_delta, data: {"type":"content_block_delta","index":0,"delta":{"type":"thinking_delta","thinking":"."}}
event: content_block_start, data: {"type":"content_block_start","index":0,"content_block":{"type":"tool_use","id":"call_ct45givg","name":"AskUserQuestion","input":{}}}	               # Reusing index 0
event: content_block_delta, data: {"type":"content_block_delta","index":0,"delta":{"type":"input_json_delta","partial_json":"{\"questions\":[{\"header\":\"Pet preference\",\"multiSelect\":false,\"options\":[{\"description\":\"I prefer dogs\",\"label\":\"Dogs\"},{\"description\":\"I prefer cats\",\"label\":\"Cats\"}],\"question\":\"Do you prefer dogs or cats?\"}]}"}} 	               # Reusing index 0
event: content_block_stop, data: {"type":"content_block_stop","index":0} 
event: content_block_stop, data: {"type":"content_block_stop","index":1}                    # There was never a content_block_start for index 1
event: message_delta, data: {"type":"message_delta","delta":{"stop_reason":"tool_use"},"usage":{"input_tokens":16893,"output_tokens":126}}
event: message_stop, data: {"type":"message_stop"}

Happy case

For comparison, the prompt Come up with a list of 4 unusual animals and use the AskUserQuestion tool to get my preferences looks like

event: message_start\x0adata: {"type":"message_start","message":{"id":"msg_11d319cf7d2f2b85e96ad2ff","type":"message","role":"assistant","model":"glm-4.7-flash","content":[],"usage":{"input_tokens":18196,"output_tokens":0}}}
event: content_block_start\x0adata: {"type":"content_block_start","index":0,"content_block":{"type":"thinking","thinking":""}}
event: content_block_delta\x0adata: {"type":"content_block_delta","index":0,"delta":{"type":"thinking_delta","thinking":"The"}}
event: content_block_delta\x0adata: {"type":"content_block_delta","index":0,"delta":{"type":"thinking_delta","thinking":" user"}}
<snip>
event: content_block_delta\x0adata: {"type":"content_block_delta","index":0,"delta":{"type":"thinking_delta","thinking":"."}}
event: content_block_stop\x0adata: {"type":"content_block_stop","index":0}
event: content_block_start\x0adata: {"type":"content_block_start","index":1,"content_block":{"type":"text","text":""}}
event: content_block_delta\x0adata: {"type":"content_block_delta","index":1,"delta":{"type":"text_delta","text":"I"}}
event: content_block_delta\x0adata: {"type":"content_block_delta","index":1,"delta":{"type":"text_delta","text":"'ll"}}
<snip>
event: content_block_delta\x0adata: {"type":"content_block_delta","index":1,"delta":{"type":"text_delta","text":":"}}
event: content_block_stop\x0adata: {"type":"content_block_stop","index":1}
event: content_block_start\x0adata: {"type":"content_block_start","index":2,"content_block":{"type":"tool_use","id":"call_hn6rzpqt","name":"AskUserQuestion","input":{}}}
event: content_block_delta\x0adata: {"type":"content_block_delta","index":2,"delta":{"type":"input_json_delta","partial_json":"{\"questions\":[{\"header\":\"Favorite\",\"multiSelect\":false,\"options\":[{\"description\":\"A Mexican salamander famous for its ability to regenerate body parts and its smiling face\",\"label\":\"Axolotl\"},{\"description\":\"An endangered scaly mammal that rolls into a ball for protection\",\"label\":\"Pangolin\"},{\"description\":\"A lemur with huge eyes and a specially adapted middle finger to catch insects\",\"label\":\"Aye-Aye\"},{\"description\":\"A forest-dwelling creature with a zebra-like body but a giraffe's neck\",\"label\":\"Okapi\"}],\"question\":\"Which of these unusual animals appeals to you most?\"}]}"}}
event: content_block_stop\x0adata: {"type":"content_block_stop","index":2}
event: message_delta\x0adata: {"type":"message_delta","delta":{"stop_reason":"tool_use"},"usage":{"input_tokens":16887,"output_tokens":354}}
event: message_stop\x0adata: {"type":"message_stop"}

and the indexes make sense.
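The invariant violated in the unhappy case can be checked mechanically: every content_block_delta and content_block_stop must refer to an index that a content_block_start already opened, and an index must not be opened twice. Here is a minimal, illustrative checker (types and names are mine, not from the Ollama codebase), which flags the unhappy trace above:

```go
package main

import "fmt"

// Event is a simplified view of an Anthropic SSE content-block event.
type Event struct {
	Type  string // "content_block_start", "content_block_delta", "content_block_stop"
	Index int
}

// checkBlocks reports the first invariant violation in a stream:
// a start that reuses an index, or a delta/stop for an index with
// no open block.
func checkBlocks(events []Event) error {
	open := map[int]bool{} // indexes with a block currently in progress
	seen := map[int]bool{} // indexes ever opened
	for i, e := range events {
		switch e.Type {
		case "content_block_start":
			if seen[e.Index] {
				return fmt.Errorf("event %d: index %d reused", i, e.Index)
			}
			open[e.Index], seen[e.Index] = true, true
		case "content_block_delta":
			if !open[e.Index] {
				return fmt.Errorf("event %d: delta for unopened index %d", i, e.Index)
			}
		case "content_block_stop":
			if !open[e.Index] {
				return fmt.Errorf("event %d: stop for unopened index %d", i, e.Index)
			}
			open[e.Index] = false
		}
	}
	return nil
}

func main() {
	// The unhappy case above, reduced: index 0 is started twice and
	// index 1 is stopped without ever being started.
	unhappy := []Event{
		{"content_block_start", 0}, // thinking block
		{"content_block_delta", 0},
		{"content_block_start", 0}, // tool_use block, reusing index 0
		{"content_block_delta", 0},
		{"content_block_stop", 0},
		{"content_block_stop", 1}, // never started
	}
	fmt.Println(checkBlocks(unhappy)) // → event 2: index 0 reused
}
```

Claude Code presumably runs an equivalent check internally, which is what surfaces as "Content block not found".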

Root Cause

anthropic/anthropic.go:Process keeps a state machine of the ongoing stream. When going from thinking to text content, or text content to tool usage content, it closes out the previous content block, increments contentIndex, and starts a block for the new content. However, if there is thinking content and tool usage but no text content in between, the code doesn't notice that a content block was already in progress and reuses contentIndex without incrementing it.
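The described fix can be sketched like this (field and function names are hypothetical, not the actual anthropic.go code): before emitting content_block_start for the new block type, close out any block still in progress and increment the index, on every transition rather than only on the ones the code currently handles.

```go
package main

import "fmt"

// streamState is a hypothetical reduction of the converter's state machine.
type streamState struct {
	contentIndex int
	blockOpen    bool
}

// startBlock emits content_block_start for a block of the given type,
// closing out any block still in progress first. The reported bug is
// the equivalent of skipping this close-and-increment on the
// thinking -> tool_use transition, so tool_use reused the thinking
// block's index.
func (s *streamState) startBlock(blockType string) {
	if s.blockOpen {
		fmt.Printf("content_block_stop index=%d\n", s.contentIndex)
		s.contentIndex++
	}
	fmt.Printf("content_block_start index=%d type=%s\n", s.contentIndex, blockType)
	s.blockOpen = true
}

func main() {
	var s streamState
	s.startBlock("thinking") // index 0
	s.startBlock("tool_use") // closes index 0, opens index 1
	fmt.Printf("content_block_stop index=%d\n", s.contentIndex)
}
```

With the guard in place, the thinking-then-tool_use sequence produces indexes 0 and 1, matching the happy case.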

Suggested fix: Do something similar to https://github.com/ollama/ollama/blob/f676231de9a54b158d83bff893fb20dbc7472ad2/anthropic/anthropic.go#L810-L820 at https://github.com/ollama/ollama/blob/f676231de9a54b158d83bff893fb20dbc7472ad2/anthropic/anthropic.go#L854 .

Relevant log output


OS

Linux

GPU

Nvidia

CPU

AMD

Ollama version

0.17.7

GiteaMirror added the bug label 2026-04-22 19:45:56 -05:00

@ParthSareen commented on GitHub (Mar 13, 2026):

Great find @egtung and thank you for digging into this! Will be fixed in the upcoming release thanks to @shivamtiwari3.


@egtung commented on GitHub (Mar 14, 2026):

Great, thanks! I figured out how to build Ollama and tested the fix - it helps, and I can now run the task that prompted my initial investigation.

I've now run into another edge case that might be related ([ERROR] Stream completed with message_start but no content blocks completed - triggering non-streaming fallback followed by [ERROR] Error streaming, falling back to non-streaming mode: Stream ended without receiving any events). If I figure out a good repro case or diagnosis, I'll file another issue.

Reference: github-starred/ollama#35325