[PR #13901] runner: Adds prompt_eval_progress argument to stream prompt completion progress with ollama streaming API #61135

Open
opened 2026-04-29 16:12:58 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/13901
Author: @balisujohn
Created: 1/25/2026
Status: 🔄 Open

Base: main ← Head: jbalis/dev-prompt-progress


📝 Commits (3)

  • 694f83c v0 of streaming prompt progress
  • 48b10c2 added tests
  • d0f5593 changed progress key to avoid overlap, added tests, switched to blocking sends for progress updates

📊 Changes

7 files changed (+665 additions, -101 deletions)


📝 api/types.go (+26 -0)
📝 integration/api_test.go (+161 -0)
📝 llm/server.go (+14 -3)
📝 runner/llamarunner/runner.go (+63 -36)
📝 runner/ollamarunner/runner.go (+65 -38)
📝 server/routes.go (+30 -24)
📝 server/routes_generate_test.go (+306 -0)

📄 Description

This is a messy proof of concept draft.

I have workloads with long prompts on a weak ollama server, so a progress indicator during prompt processing would significantly improve the frontend user experience. To this end, this PR proposes an optional flag that adds prompt-processing updates to the streaming endpoints before response tokens are generated. This is in a rough state, and I will clean it up before requesting review, though I would love to hear people's thoughts on it. I will also open an accompanying issue.

I tested manually with the following curl commands; I will look into automated tests.
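
For reference, the same request can be issued from Python instead of shell-quoted curl. This is a minimal sketch using only the standard library; the prompt_eval_progress field and its value mirror the curl examples below:

import json
import urllib.request

# Build the same long prompt and request body as the curl examples.
body = json.dumps({
    "model": "gemma3:270m",
    "prompt": "The quick brown fox jumps over the lazy dog. " * 200,
    "prompt_eval_progress": 100,
    "stream": True,
}).encode("utf-8")

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=body,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    for line in resp:  # the stream arrives as one JSON object per line
        print(line.decode("utf-8"), end="")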

With the new prompt_eval_progress argument

curl -N http://localhost:11434/api/generate -d "{\"model\": \"gemma3:270m\", \"prompt\": \"$(python3 -c "print('The quick brown fox jumps over the lazy dog. ' * 200)")\", \"prompt_eval_progress\": 100, \"stream\": true}"

Response from Ollama

{"model":"gemma3:270m","created_at":"2026-01-26T05:50:52.817104393Z","response":"","done":false,"prompt_eval_completed":512,"prompt_eval_total":2010}
{"model":"gemma3:270m","created_at":"2026-01-26T05:51:01.287749423Z","response":"","done":false,"prompt_eval_completed":1024,"prompt_eval_total":2010}
{"model":"gemma3:270m","created_at":"2026-01-26T05:51:12.605503033Z","response":"","done":false,"prompt_eval_completed":1536,"prompt_eval_total":2010}
{"model":"gemma3:270m","created_at":"2026-01-26T05:51:39.533355083Z","response":"The","done":false}
{"model":"gemma3:270m","created_at":"2026-01-26T05:51:39.596124743Z","response":" quick","done":false}
{"model":"gemma3:270m","created_at":"2026-01-26T05:51:39.659647952Z","response":" brown","done":false}
{"model":"gemma3:270m","created_at":"2026-01-26T05:51:39.749931059Z","response":" fox","done":false}
{"model":"gemma3:270m","created_at":"2026-01-26T05:51:39.819387197Z","response":" jumps","done":false}
{"model":"gemma3:270m","created_at":"2026-01-26T05:51:39.895974067Z","response":" over","done":false}
{"model":"gemma3:270m","created_at":"2026-01-26T05:51:39.965382281Z","response":" the","done":false}
{"model":"gemma3:270m","created_at":"2026-01-26T05:51:40.052979529Z","response":" lazy","done":false}
{"model":"gemma3:270m","created_at":"2026-01-26T05:51:40.146880588Z","response":" dog","done":false}
{"model":"gemma3:270m","created_at":"2026-01-26T05:51:40.2099141Z","response":".","done":false}
{"model":"gemma3:270m","created_at":"2026-01-26T05:51:40.29996901Z","response":"\n","done":false}
...
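
Each line of the stream is a standalone JSON object, so a client can treat any chunk carrying prompt_eval_total as a progress update and everything else as a token chunk. A sketch of such a consumer, based only on the field names visible in the sample output above (the render_stream helper is illustrative, not part of this PR):

import json

def render_stream(lines):
    # lines: any iterable of NDJSON lines, e.g. the resp object from the
    # request sketch earlier in this description.
    for raw in lines:
        chunk = json.loads(raw)
        if "prompt_eval_total" in chunk:
            done = chunk["prompt_eval_completed"]
            total = chunk["prompt_eval_total"]
            print(f"\rprompt eval: {done}/{total} ({100 * done / total:.0f}%)",
                  end="", flush=True)
        else:
            print(chunk.get("response", ""), end="", flush=True)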

Without the new prompt_eval_progress argument

curl -N http://localhost:11434/api/generate -d "{\"model\": \"gemma3:270m\", \"prompt\": \"$(python3 -c "print('The quick brown fox jumps over the lazy dog. ' * 200)")\", \"stream\": true}"

Response from Ollama

{"model":"gemma3:270m","created_at":"2026-01-26T05:50:52.817104393Z","response":"","done":false,"prompt_eval_completed":512,"prompt_eval_total":2010}
{"model":"gemma3:270m","created_at":"2026-01-26T05:51:01.287749423Z","response":"","done":false,"prompt_eval_completed":1024,"prompt_eval_total":2010}
{"model":"gemma3:270m","created_at":"2026-01-26T05:51:12.605503033Z","response":"","done":false,"prompt_eval_completed":1536,"prompt_eval_total":2010}
{"model":"gemma3:270m","created_at":"2026-01-26T05:51:39.533355083Z","response":"The","done":false}
{"model":"gemma3:270m","created_at":"2026-01-26T05:51:39.596124743Z","response":" quick","done":false}
{"model":"gemma3:270m","created_at":"2026-01-26T05:51:39.659647952Z","response":" brown","done":false}
{"model":"gemma3:270m","created_at":"2026-01-26T05:51:39.749931059Z","response":" fox","done":false}
{"model":"gemma3:270m","created_at":"2026-01-26T05:51:39.819387197Z","response":" jumps","done":false}
{"model":"gemma3:270m","created_at":"2026-01-26T05:51:39.895974067Z","response":" over","done":false}
{"model":"gemma3:270m","created_at":"2026-01-26T05:51:39.965382281Z","response":" the","done":false}
{"model":"gemma3:270m","created_at":"2026-01-26T05:51:40.052979529Z","response":" lazy","done":false}
{"model":"gemma3:270m","created_at":"2026-01-26T05:51:40.146880588Z","response":" dog","done":false}
{"model":"gemma3:270m","created_at":"2026-01-26T05:51:40.2099141Z","response":".","done":false}
{"model":"gemma3:270m","created_at":"2026-01-26T05:51:40.29996901Z","response":"\n","done":false}

...

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-29 16:12:58 -05:00