[GH-ISSUE #11586] Support text/event-stream (SSE) in responses to /generate and /chat #69705

Open
opened 2026-05-04 18:55:18 -05:00 by GiteaMirror · 4 comments

Originally created by @prantlf on GitHub (Jul 30, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/11586

Support response streaming using the simple text/event-stream content type in addition to the current implementation of application/x-ndjson.

This was originally requested by #4788, but that issue was closed as "completed" without completing the request.

Existing Streaming

Streaming is currently supported by chunked encoding with the content type application/x-ndjson. For example:

curl -v -H 'Content-Type: application/json' \
  -d '{"model":"deepseek-r1:1.5b","messages":[{"role":"user","content":"what is ollama?"}]}' \
  http://localhost:11434/api/chat

< HTTP/1.1 200 OK
< Content-Type: application/x-ndjson
< Transfer-Encoding: chunked
<
{"model":"deepseek-r1:1.5b","created_at":"2025-07-30T09:45:03.537758Z","message":{"role":"assistant","content":"\n\n"},"done":false}
{"model":"deepseek-r1:1.5b","created_at":"2025-07-30T09:45:03.654744Z","message":{"role":"assistant","content":"\u003c/think\u003e"},"done":false}
{"model":"deepseek-r1:1.5b","created_at":"2025-07-30T09:45:03.76711Z","message":{"role":"assistant","content":"\n\n"},"done":false}
{"model":"deepseek-r1:1.5b","created_at":"2025-07-30T09:45:03.877521Z","message":{"role":"assistant","content":"O"},"done":false}
{"model":"deepseek-r1:1.5b","created_at":"2025-07-30T09:45:03.988219Z","message":{"role":"assistant","content":"ll"},"done":false}
{"model":"deepseek-r1:1.5b","created_at":"2025-07-30T09:45:04.098806Z","message":
...
{"model":"deepseek-r1:1.5b","created_at":"2025-07-30T09:45:48.36868Z","message":{"role":"assistant","content":" setup"},"done":false}
{"model":"deepseek-r1:1.5b","created_at":"2025-07-30T09:45:48.517462Z","message":{"role":"assistant","content":"."},"done":false}
{"model":"deepseek-r1:1.5b","created_at":"2025-07-30T09:45:48.665553Z","message":{"role":"assistant","content":""},"done_reason":"stop","done":true,"total_duration":49729537151,"load_duration":3445259896,"prompt_eval_count":9,"prompt_eval_duration":1024746038,"eval_count":274,"eval_duration":45257702287}

SSE Streaming

The simplest SSE support can be added by prefixing each JSON object (chunk) with "data: " and ending it with two line breaks (\n\n), so that a blank line separates consecutive events. For example, this is how Google Gemini supports streaming with SSE:

POST https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:streamGenerateContent?alt=sse

< HTTP/2 200
< content-type: text/event-stream
<
data: {"candidates": ...}

data: {"candidates": ... "finishReason": "STOP" ...}

The simplest way to select SSE would be to reuse the existing flag in the POST payload (stream: true) and switch the response content type based on the request header Accept: text/event-stream, for example:

curl -v -H 'Content-Type: application/json' -H 'Accept: text/event-stream' \
  -d '{"model":"deepseek-r1:1.5b","messages":[{"role":"user","content":"what is ollama?"}]}' \
  http://localhost:11434/api/chat

< HTTP/1.1 200 OK
< Content-Type: text/event-stream
< Transfer-Encoding: chunked
<
data: {"model":"deepseek-r1:1.5b","created_at":"2025-07-30T14:17:29.148526Z","message":{"role":"assistant","content":"\u003cthink\u003e"},"done":false}

data: {"model":"deepseek-r1:1.5b","created_at":"2025-07-30T14:17:29.264745Z","message":{"role":"assistant","content":"\n\n"},"done":false}

data: {"model":"deepseek-r1:1.5b","created_at":"2025-07-30T14:17:29.376586Z","message":{"role":"assistant","content":"\u003c/think\u003e"},"done":false}

data: {"model":"deepseek-r1:1.5b","created_at":"2025-07-30T14:17:29.482884Z","message":{"role":"assistant","content":"\n\n"},"done":false}

data: {"model":"deepseek-r1:1.5b","created_at":"2025-07-30T14:17:29.595889Z","message":{"role":"assistant","content":"O"},"done":false}

data: {"model":"deepseek-r1:1.5b","created_at":"2025-07-30T14:17:29.704313Z","message":{"role":"assistant","content":"ll"},"done":false}
...
data: {"model":"deepseek-r1:1.5b","created_at":"2025-07-30T14:18:00.560568Z","message":{"role":"assistant","content":".ai"},"done":false}

data: {"model":"deepseek-r1:1.5b","created_at":"2025-07-30T14:18:00.710417Z","message":{"role":"assistant","content":")."},"done":false}

data: {"model":"deepseek-r1:1.5b","created_at":"2025-07-30T14:18:00.860329Z","message":{"role":"assistant","content":""},"done_reason":"stop","done":true,"total_duration":31899147101,"load_duration":66380957,"prompt_eval_count":9,"prompt_eval_duration":116676206,"eval_count":249,"eval_duration":31713086111}

Alternatively, a new flag could be added to the POST payload:

Stream:     true
StreamType: "text/event-stream"

Or the existing Stream could be overloaded to accept either a boolean or a string, as was done with Think, for example:

Stream: false | true | "jsonl" | "sse"

SSE in Browsers

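EventSource is the browser's built-in API for SSE (https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events/Using_server-sent_events). However, as specified, it doesn't allow setting the request method, headers, or payload, so it cannot send the POST request that /api/chat requires. A request to add this has been filed (https://github.com/whatwg/html/issues/2177), with enough interest to have it, but not enough interest to specify and implement it :-) Polyfills:

* https://github.com/mpetazzoni/sse.js
* https://github.com/Yaffle/EventSource
* https://github.com/EventSource/eventsource
* https://github.com/joshmossas/event-source-plus

Modern browsers support fetch and streams, which can handle SSE communication too, just arguably less conveniently.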
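To give a sense of what "less conveniently" means, here is a sketch of the parsing a stream-based client has to do itself, written in Python for illustration; a fetch-based browser client would need the same logic. It assumes LF line endings and only handles data: fields (the SSE format also allows CR/CRLF, comments, and the event, id, and retry fields). iter_sse_json is a hypothetical name.

```python
import codecs
import json
from typing import Iterable, Iterator


def iter_sse_json(chunks: Iterable[bytes]) -> Iterator[dict]:
    """Yield the JSON payload of each SSE event from arbitrarily split byte chunks."""
    # An incremental decoder copes with multi-byte characters split across chunks.
    decoder = codecs.getincrementaldecoder("utf-8")()
    buffer = ""
    for chunk in chunks:
        buffer += decoder.decode(chunk)
        # An event ends at a blank line; any partial event stays buffered.
        while "\n\n" in buffer:
            raw_event, buffer = buffer.split("\n\n", 1)
            data_lines = []
            for line in raw_event.split("\n"):
                if line.startswith("data:"):
                    value = line[5:]
                    # A single leading space after the colon is not part of the value.
                    data_lines.append(value[1:] if value.startswith(" ") else value)
            if data_lines:
                yield json.loads("\n".join(data_lines))


# The second event is deliberately split across two network chunks.
chunks = [
    b'data: {"message":{"content":"O"},"done":false}\n\nda',
    b'ta: {"message":{"content":"ll"},"done":true}\n\n',
]
events = list(iter_sse_json(chunks))
text = "".join(e["message"]["content"] for e in events)
```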

GiteaMirror added the feature request label 2026-05-04 18:55:18 -05:00

@chrisbward commented on GitHub (Aug 12, 2025):

This has caught me out a little, so would love to see this go in!


@prantlf commented on GitHub (Aug 18, 2025):

I noticed that the think parameter was extended from boolean to boolean or string:

think: false | true | 'low' | 'medium' | 'high'

The same could be done with the stream parameter, for example:

stream: false | true | 'jsonl' | 'sse'

@prantlf commented on GitHub (Aug 20, 2025):

I noticed that the OpenAI compatibility API (/v1/chat/completions) does support SSE. Is there a reason why the native Ollama API doesn't?


@joshua-mo-143 commented on GitHub (Sep 17, 2025):

Hi, just registering my interest in this issue: adding SSE support for streaming would be great.

I maintain rig (https://github.com/0xPlaygrounds/rig), which is currently the leading AI agent framework in Rust. We're very interested in getting this issue resolved, as we support 20+ model providers and pretty much all of them support SSE. The sole exception is Ollama, unless you use the OpenAI compatibility API.

(edit: Of course, since this issue has been open for a while, I don't expect it to be merged any time soon, and it isn't particularly urgent for our use case. It would be very convenient though!)

Reference: github-starred/ollama#69705