[PR #14420] feat: Add optional keepalive heartbeats for streaming endpoints #61367

Open
opened 2026-04-29 16:27:00 -05:00 by GiteaMirror

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/14420
Author: @josh-richardson
Created: 2/25/2026
Status: 🔄 Open

Base: main ← Head: feat/heartbeats


📝 Commits (1)

  • 69745e2 heartbeat on long-running responses

📊 Changes

12 files changed (+889 additions, -173 deletions)


📝 api/types.go (+13 -0)
📝 docs/api.md (+6 -0)
📝 docs/api/openai-compatibility.mdx (+2 -0)
📝 docs/api/streaming.mdx (+3 -1)
📝 docs/openapi.yaml (+11 -0)
📝 envconfig/config.go (+43 -22)
📝 envconfig/config_test.go (+22 -0)
📝 middleware/openai_test.go (+113 -0)
📝 openai/openai.go (+28 -9)
📝 openai/openai_test.go (+52 -0)
📝 server/routes.go (+384 -141)
➕ server/stream_heartbeat_test.go (+212 -0)

📄 Description

Motivation

During long-running streaming completions, such as those with long prefill times or from models that produce internal reasoning before answering, there are often periods of 60 seconds or more in which no data is transmitted over the wire. These idle periods trigger connection timeouts in infrastructure layers and, crucially for my use case, in web browsers, which I use to talk to Ollama directly. As a result, non-trivial prompts are effectively impossible to run against Ollama from a browser. Client-side retries don't help, because the timeouts originate in intermediate layers while the model is still quietly working on the server; re-sending the request only starts the whole process over again.

This PR adds an optional server-side keepalive heartbeat to streaming API endpoints. It ensures that data continues to flow through the TCP socket at regular intervals, preventing idle timeouts without affecting token semantics.

What Changed

  • New Request Options: Added stream_options.heartbeat_ms to core native request types (api.GenerateRequest and api.ChatRequest).
    • Values > 0 configure the keepalive interval in milliseconds.
    • Values <= 0 explicitly disable keepalives.
  • Global Default Configuration: Added the OLLAMA_STREAM_HEARTBEAT_MS environment variable (default 10000, i.e. 10s), which sets the keepalive interval for any request that does not override it.
  • OpenAI Compatibility: Mapped stream_options.heartbeat_ms from /v1/chat/completions and /v1/completions into the native request stream options.
  • Unified Streaming Path: Refactored the streaming writer within the server (routes.go) to use a heartbeat-aware time.Ticker via a background goroutine wrapper (streamResponseWithOptions).
    • This handles multiplexing streaming frames and keepalives alongside net/http connection draining, context cancellation, and cleanly aborting in-progress LLM work.
    • Applies universally across: /api/chat, /api/generate, remote streaming branches, image generation streaming, and OpenAI-compatible endpoints.
  • Keepalive Frame Behaviors:
    • Native endpoints emit a valid non-terminal JSON chunk ("done": false) with empty text content.
    • OpenAI-compatible endpoints emit a no-op SSE data chunk, preserving standard parser compatibility and [DONE] sequence semantics.
  • Documentation: Updated the OpenAPI schema, API documentation, and streaming guides to reflect the new options.
  • Tests: Added coverage in stream_heartbeat_test.go for heartbeats interleaved with artificially induced silence, suppression of heartbeats while output is frequent, and disabled-heartbeat behavior, alongside updated OpenAI middleware tests.

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-29 16:27:00 -05:00

Reference: github-starred/ollama#61367