[GH-ISSUE #14902] Feature Request: anthropic/anthropic.go:Process() should send ping events during tool call composition to avoid a streaming timeout #71658

Open
opened 2026-05-05 02:17:27 -05:00 by GiteaMirror · 0 comments
Owner

Originally created by @egtung on GitHub (Mar 17, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/14902

This is a simpler approach that would also resolve the behavior described in https://github.com/ollama/ollama/issues/14858 . I'll redescribe the issue so you don't have to read an obsolete request.

# Environment

* Ollama 0.18
* Claude Code 2.1.70
* OS/hardware: Ubuntu 24.04, AMD Ryzen 5, GeForce RTX 3060

If it matters, I'm running everything locally, ethernet cable unplugged.

```
$ ollama ps
NAME                    ID              SIZE     PROCESSOR          CONTEXT    UNTIL
glm-4.7-flash:latest    d1a8a26252f1    21 GB    46%/54% CPU/GPU    65536      59 minutes from now
```

# Summary

While a model is composing a tool call, no responses are sent to the client, even in streaming mode. During that time, the client may time out, causing the request to repeatedly fail and be retried for (in my testing) about an hour.

At least for the Anthropic API, this can be avoided by sending [ping events](https://platform.claude.com/docs/en/build-with-claude/streaming) - this works for me on a local build.
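For reference, a ping on the Anthropic streaming wire is an ordinary server-sent event; per the Anthropic streaming documentation it looks like:

```
event: ping
data: {"type": "ping"}
```

Clients that already parse the SSE stream are expected to tolerate pings at any point, so emitting one costs nothing beyond keeping the connection visibly alive.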

# Reproduction Steps

1. Run Claude Code via Ollama: `ollama launch claude --model glm-4.7-flash`
2. At the prompt, enter something like `Recall a summary of each book of the Old Testament and write them to old_testament_summaries.md`

Note: While this is sufficient to reproduce the behavior on my system, the underlying issue is a timeout; a more powerful computer might be able to complete the task quickly enough that you don't see it. Setting `CUDA_VISIBLE_DEVICES=""` to force CPU inference, or specifying a longer text, should make the issue more obvious.

## Desired behavior

old_testament_summaries.md should be created and contain text.

## Current behavior

From the user's point of view, the command takes a long time and eventually starts showing messages about timeouts. `grep ERROR ~/.claude/debug/latest` shows `[ERROR] Error streaming, falling back to non-streaming mode: The operation timed out.` and things like `[ERROR] API error (attempt 1/11): undefined Request timed out.`. No file is created.

# Observations

https://ollama.com/blog/streaming-tool initially made me think this was expected to work, but reading it more closely I see that it only talks about streaming tool *responses*, not composing the *arguments*, so I'm filing this as a feature request instead of as a bug.

I believe this is due to tools/tools.go:Add() gathering up responses before parsing them into a ToolCall. Certainly anthropic/anthropic.go:Process() is not set up to stream tool call arguments - it expects tool calls to be given in one shot. A packet trace supports this theory - I see Ollama talking to the model and coming up with text, while the Claude-Ollama connection has no traffic and Claude eventually gives up.

# Proposed Changes

Disclaimer: While this suggestion fixes *my* use case, I'm not confident it won't break other things - someone more knowledgeable needs to verify this is a good approach.

## server/routes.go:ChatHandler

https://github.com/ollama/ollama/blob/bbbad97686205cfd897a9e4e931889a3598a0652/server/routes.go#L2443-L2448 only passes along events if the parser comes up with non-empty output. When tools/tools.go:Add() is gathering up responses, it generates empty output, and so during that time nothing makes it to anthropic/anthropic.go:Process().

Instead of doing nothing when there's empty output, other layers should still be notified so they can react if desired.

```
// Forward the response even when the parser produced no visible
// output, so downstream layers can react (e.g. by emitting a ping).
if req.Stream != nil && *req.Stream {
    ch <- res
}
```

## anthropic/anthropic.go:Process()

anthropic.go defines a PingEvent, though it's never used anywhere. If Process sees an empty message, it could emit a PingEvent:

```
if r.Message.Thinking == "" && r.Message.Content == "" && !r.Done && len(r.Message.ToolCalls) == 0 {
    events = append(events, StreamEvent{
        Event: "ping",
        Data: PingEvent{
            Type: "ping",
        },
    })
}
```
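To make the proposed branch concrete, here is a small, self-contained sketch. The struct definitions are simplified stand-ins for the real types in anthropic/anthropic.go and the api package (field names assumed from the snippets in this issue), and the main function shows the SSE bytes such a ping would put on the wire:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Simplified stand-ins for Ollama's types; field names are assumed
// from the snippets in this issue, not copied from the real code.
type Message struct {
	Thinking  string
	Content   string
	ToolCalls []string
}

type ChatResponse struct {
	Message Message
	Done    bool
}

type PingEvent struct {
	Type string `json:"type"`
}

type StreamEvent struct {
	Event string
	Data  any
}

// process mimics the proposed branch: an otherwise-empty, not-done
// response becomes a ping event instead of being silently dropped.
func process(r ChatResponse) []StreamEvent {
	var events []StreamEvent
	if r.Message.Thinking == "" && r.Message.Content == "" && !r.Done && len(r.Message.ToolCalls) == 0 {
		events = append(events, StreamEvent{Event: "ping", Data: PingEvent{Type: "ping"}})
	}
	return events
}

func main() {
	// An empty interim response, as produced while tools.Add buffers output.
	for _, ev := range process(ChatResponse{}) {
		data, _ := json.Marshal(ev.Data)
		fmt.Printf("event: %s\ndata: %s\n\n", ev.Event, data)
	}
}
```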

## Considerations

https://github.com/ollama/ollama/blob/bbbad97686205cfd897a9e4e931889a3598a0652/server/routes.go#L610-L614 suggests that quashing empty events was a conscious choice to avoid triggering bad client behavior. What was the bad behavior, and is there another way to avoid it? Does a similar change need to happen here?

GiteaMirror added the feature request label 2026-05-05 02:17:27 -05:00
Reference: github-starred/ollama#71658