[GH-ISSUE #294] Streaming responses should have Content-Type set to application/x-ndjson #46640

Closed
opened 2026-04-27 23:17:28 -05:00 by GiteaMirror · 5 comments

Originally created by @jmorganca on GitHub (Aug 6, 2023).
Original GitHub issue: https://github.com/ollama/ollama/issues/294

Originally assigned to: @jmorganca on GitHub.

Currently, streaming responses return text/plain, but they should return application/x-ndjson. Later we should consider application/json (see #281) or text/event-stream for browser-based clients.
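For readers consuming such a stream, here is a minimal sketch of an incremental application/x-ndjson reader. It assumes the server emits one JSON object per newline-terminated line; the function name `iter_ndjson` and the field names in the usage note are illustrative, not taken from the issue.

```python
import json
from typing import Any, Iterable, Iterator


def iter_ndjson(chunks: Iterable[bytes]) -> Iterator[Any]:
    """Yield one parsed JSON value per newline-terminated line.

    Chunks may end anywhere, including in the middle of a line, so bytes
    are buffered until a complete line has arrived.
    """
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # Everything before the last newline is complete; keep the tail.
        *lines, buffer = buffer.split(b"\n")
        for line in lines:
            if line.strip():
                yield json.loads(line.decode("utf-8"))
    # A final line without a trailing newline is still parsed.
    if buffer.strip():
        yield json.loads(buffer.decode("utf-8"))
```

Any iterable of byte chunks works as input, for example the chunks read from an HTTP response body. A consumer would typically loop over `iter_ndjson(chunks)` and stop once an object signals completion, assuming the server marks the last object in some way.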
GiteaMirror added the bug, good first issue labels 2026-04-27 23:17:28 -05:00

@jmorganca commented on GitHub (Aug 9, 2023):

This was fixed in cff002b (https://github.com/jmorganca/ollama/commit/cff002b82447a5bed197be1a39ca3e338cd6aa19).

@drhino commented on GitHub (Aug 9, 2023):

Hi jmorganca, just trying to be helpful here:

The official approach you're looking for is: https://datatracker.ietf.org/doc/html/rfc7464

Anything that says application/*json is supposed to be valid json. A stream of json objects is not valid json.
Interesting discussion about that here: https://github.com/spring-projects/spring-framework/issues/21283

Not every content type can be streamed (it depends on the client), but text/plain can always be streamed and is widely adopted.
text/event-stream has limited support. It also complicates reading with curl from the command line.

In summary:

  • Valid: text/plain
  • Better: text/plain; charset=utf-8 (since json is always utf-8)
  • Standard: application/json-seq (support unknown)
  • Limited support: text/event-stream
  • Invalid: application/stream+json (since the response is not json)
  • Invalid: application/x-ndjson (non-official, partial support)
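To make the distinction in this list concrete, here is a rough sketch showing that a newline-delimited stream is not a single JSON document, and how the same values look as an application/json-seq body. Per RFC 7464, each text is preceded by the ASCII Record Separator (0x1E) and followed by a line feed. The helper names are made up for illustration, and the decoder is simplified: it does not handle the truncated-text recovery that the RFC describes.

```python
import json
from typing import Any, Iterable, List

RS = b"\x1e"  # ASCII Record Separator, placed before each text by RFC 7464


def is_single_json_document(data: bytes) -> bool:
    """Return True if the whole payload parses as one JSON value."""
    try:
        json.loads(data.decode("utf-8"))
    except json.JSONDecodeError:
        return False
    return True


def encode_json_seq(values: Iterable[Any]) -> bytes:
    """Encode values as RS + JSON text + LF, one record per value."""
    return b"".join(RS + json.dumps(v).encode("utf-8") + b"\n" for v in values)


def decode_json_seq(data: bytes) -> List[Any]:
    """Split on RS and parse each record (simplified, no error recovery)."""
    return [
        json.loads(part.decode("utf-8"))
        for part in data.split(RS)
        if part.strip()
    ]
```

Passing two newline-separated objects to `is_single_json_document` returns False, which is the reason given above for calling application/*json labels invalid on a stream.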

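More info:

  • https://www.w3.org/TR/activitystreams-core/#ex27-jsonld
  • https://www.w3.org/wiki/Activity_Streams/Examples
  • https://www.w3.org/wiki/Activity_Streams#Definition
  • https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events/Using_server-sent_events
  • https://jsonlines.org/ (! non-standard)
  • https://github.com/ndjson/ndjson-spec (! non-standard)
  • https://www.iana.org/assignments/media-types/media-types.xhtml (official, lists json-seq)
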
About #281, I think you meant: Accept: application/json rather than Content-Type.
If that's the case, that could be a great addition for those who want it.
However, since this is HTTP, there is no way to differentiate between a timeout and waiting for the response to be generated. Some clients also replay the request if the connection temporarily drops.

So for that to work, can I suggest the following (just a suggestion):

POST /api/generate HTTP/1.1
Host: localhost:3333
Accept: application/json

{ "model": "orca", "prompt": "hi" }


HTTP/1.1 202 Accepted
Content-Type: application/json
Vary: Accept

{ "id": "1234" }


GET /api/generate?id=1234 HTTP/1.1
Host: localhost:3333


HTTP/1.1 299 Generating
Content-Type: text/plain; charset=utf-8
Retry-After: 5
Refresh: 5
Cache-Control: no-store

Generating... This page will refresh after 5 seconds (in the browser).


HTTP/1.1 200 OK
Content-Type: text/plain

I'm just an AI.

Note that 299 Generating does not exist, but neither 4xx nor 5xx is good for this edge case. Any status in the 2xx range will work, and since none actually describes this behaviour, you can always invent your own. This should be compatible with any client, since 2xx simply indicates that the request is valid and the server understood it. Refresh is not official either, but it does trigger a refresh in the browser. Retry-After is the official header to use. Any OK response can be cached by a client, so using Cache-Control: no-store is advised.
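A client for the exchange suggested above could be sketched roughly as follows. This is only an illustration of the proposal, not an existing Ollama API: the 299 status is the invented one from this comment, and `fetch` is a hypothetical callable that performs the GET and returns the status, the headers (as a plain dict with exact header names) and the body.

```python
import time
from typing import Callable, Dict, Tuple

Response = Tuple[int, Dict[str, str], str]  # (status, headers, body)


def poll_until_done(
    fetch: Callable[[], Response],
    sleep: Callable[[float], None] = time.sleep,
    max_attempts: int = 60,
    default_delay: float = 5.0,
) -> str:
    """Poll until the server answers 200 OK and return that body."""
    for _ in range(max_attempts):
        status, headers, body = fetch()
        if status == 200:
            return body
        if status != 299:  # 299 is the proposal's made-up "Generating" status
            raise RuntimeError(f"unexpected status {status}")
        # Honour Retry-After when it is a number of seconds; the header may
        # also carry an HTTP date, in which case the default delay is used.
        try:
            delay = float(headers.get("Retry-After", default_delay))
        except ValueError:
            delay = default_delay
        sleep(delay)
    raise TimeoutError("generation did not finish in time")
```

Injecting `sleep` keeps the loop easy to exercise without real waiting, and the attempt cap gives the client its own way to tell a stalled generation from a slow one.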


@Enhitech commented on GitHub (Sep 18, 2023):

What is the final solution for non-streaming responses?

@pdevine commented on GitHub (Jan 27, 2024):

@Enhitech you can set stream to false in the /api/generate and /api/chat endpoints.

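As a minimal sketch of the suggestion above, the request below sets stream to false in the JSON body for /api/generate. The helper name and the default base URL are assumptions for illustration (the earlier example in this thread used localhost:3333); the model name is the one from that example. The function only builds the request and does not send it.

```python
import json
import urllib.request


def build_generate_request(
    model: str,
    prompt: str,
    base_url: str = "http://localhost:11434",  # assumed address, adjust as needed
) -> urllib.request.Request:
    """Build a POST to /api/generate that asks for a single, non-streamed reply."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url=f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Against a running server, the request could be sent with `urllib.request.urlopen(build_generate_request("orca", "hi"))` and the body read as one JSON document rather than line by line.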

@twinsant commented on GitHub (Nov 14, 2025):

It's better to add a charset to the Content-Type header: https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Headers/Content-Type

Otherwise we run into trouble when using an encoding other than iso-8859-1 :-(
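The effect described here can be illustrated with a small sketch. It assumes a client that falls back to iso-8859-1 when the Content-Type header carries no charset parameter, as the comment implies; the helper name `decode_body` is made up for this example, and real clients differ in their fallback.

```python
from email.message import Message


def decode_body(body: bytes, content_type: str, fallback: str = "iso-8859-1") -> str:
    """Decode a response body using the charset parameter of Content-Type.

    The fallback models a client that assumes iso-8859-1 when the header
    does not name a charset (an assumption for this example).
    """
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset(failobj=fallback)
    return body.decode(charset)
```

With UTF-8 bytes for "é", a header of `application/x-ndjson; charset=utf-8` decodes correctly, while a bare `application/x-ndjson` under this fallback yields the two-character mojibake "Ã©".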

Reference: github-starred/ollama#46640