[PR #9973] server: support streaming near tool usage #59807

Open · opened 2026-04-29 14:44:33 -05:00 by GiteaMirror (Owner) · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/9973
Author: @fizx
Created: 3/25/2025
Status: 🔄 Open

Base: main ← Head: streaming


📝 Commits (3)

b4ae2de add a prefix to the model
c16c813 finish a streaming patch
e5512a6 appease the linter

📊 Changes

3 files changed (+103 additions, -21 deletions)


📝 server/images.go (+51 -0)
📝 server/model_test.go (+34 -16)
📝 server/routes.go (+18 -5)

📄 Description

Right now, including any tools in the ChatRequest implicitly turns off streaming responses.

While we don't want to stream tool calls, we often want to stream the text adjacent to those tool calls. This patch looks at the response and selectively engages streaming, roughly as sketched below.
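
The shape of the idea, as a minimal sketch (illustrative only; the streamGate type and the `{` tool-call marker are hypothetical stand-ins, not the actual code in server/images.go or server/routes.go): hold generated text back just long enough to decide whether it opens a tool call, and otherwise flush the buffer and stream everything that follows.

```go
package main

import (
	"fmt"
	"strings"
)

// streamGate holds tokens back until it can tell whether the model is
// starting a tool call (buffered and parsed whole) or producing plain
// text (streamed through as it arrives).
type streamGate struct {
	buf      strings.Builder
	decided  bool
	toolCall bool
}

// feed takes the next generated token and returns whatever text is now
// safe to send to the client.
func (g *streamGate) feed(token string) string {
	if g.decided {
		if g.toolCall {
			g.buf.WriteString(token) // keep accumulating the full call
			return ""
		}
		return token // plain text streams straight through
	}
	g.buf.WriteString(token)
	s := strings.TrimSpace(g.buf.String())
	if s == "" {
		return "" // only whitespace so far; keep waiting
	}
	g.decided = true
	if strings.HasPrefix(s, "{") { // assumed tool-call opener
		g.toolCall = true
		return ""
	}
	// Not a tool call: flush the buffer and stream from here on.
	out := g.buf.String()
	g.buf.Reset()
	return out
}

func main() {
	g := &streamGate{}
	for _, tok := range []string{"The", " answer", " to", " 2+2", " is", " 4."} {
		if out := g.feed(tok); out != "" {
			fmt.Print(out)
		}
	}
	fmt.Println()
}
```

In the actual patch, anything a gate like this holds back as a tool call would be parsed whole and attached to the final response rather than streamed.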

First PR, trying to follow guidelines, but YMMV. Feedback welcome.

curl localhost:11434/api/chat -d @test.json (test.json attachment: https://github.com/user-attachments/files/19442584/test.json)
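
The attached test.json is not reproduced on this page; a request along these lines would exercise the same path (the question and the tool definition below are guesses for illustration, not a copy of the attachment):

```json
{
  "model": "llama3.3:latest",
  "stream": true,
  "messages": [
    { "role": "user", "content": "What is 2+2?" }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "calculator",
        "description": "Evaluate a basic arithmetic expression",
        "parameters": {
          "type": "object",
          "properties": {
            "expression": { "type": "string" }
          },
          "required": ["expression"]
        }
      }
    }
  ]
}
```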

Before:

```
% curl localhost:11434/api/chat -d @test.json
{"model":"llama3.3:latest","created_at":"2025-03-25T05:51:16.13806Z","message":{"role":"assistant","content":"The answer to 2+2 is 4."},"done_reason":"stop","done":true,"total_duration":2643049709,"load_duration":15155625,"prompt_eval_count":92,"prompt_eval_duration":1516872875,"eval_count":12,"eval_duration":1110346291}
```

After:

```
% curl localhost:11434/api/chat -d @test.json
{"model":"llama3.3:latest","created_at":"2025-03-25T05:54:36.602815Z","message":{"role":"assistant","content":"The"},"done":false}
{"model":"llama3.3:latest","created_at":"2025-03-25T05:54:36.702575Z","message":{"role":"assistant","content":" answer"},"done":false}
{"model":"llama3.3:latest","created_at":"2025-03-25T05:54:36.80337Z","message":{"role":"assistant","content":" to"},"done":false}
{"model":"llama3.3:latest","created_at":"2025-03-25T05:54:36.904238Z","message":{"role":"assistant","content":" "},"done":false}
{"model":"llama3.3:latest","created_at":"2025-03-25T05:54:37.005213Z","message":{"role":"assistant","content":"2"},"done":false}
{"model":"llama3.3:latest","created_at":"2025-03-25T05:54:37.106363Z","message":{"role":"assistant","content":"+"},"done":false}
{"model":"llama3.3:latest","created_at":"2025-03-25T05:54:37.207305Z","message":{"role":"assistant","content":"2"},"done":false}
{"model":"llama3.3:latest","created_at":"2025-03-25T05:54:37.308348Z","message":{"role":"assistant","content":" is"},"done":false}
{"model":"llama3.3:latest","created_at":"2025-03-25T05:54:37.409389Z","message":{"role":"assistant","content":" "},"done":false}
{"model":"llama3.3:latest","created_at":"2025-03-25T05:54:37.5104Z","message":{"role":"assistant","content":"4"},"done":false}
{"model":"llama3.3:latest","created_at":"2025-03-25T05:54:37.611554Z","message":{"role":"assistant","content":"."},"done":false}
{"model":"llama3.3:latest","created_at":"2025-03-25T05:54:37.712799Z","message":{"role":"assistant","content":""},"done_reason":"stop","done":true,"total_duration":3462526792,"load_duration":1156274000,"prompt_eval_count":92,"prompt_eval_duration":1194830875,"eval_count":12,"eval_duration":1110376750}
```

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-29 14:44:33 -05:00

Reference: github-starred/ollama#59807