[GH-ISSUE #1728] Streaming multiple json objects at the same time #47495

Closed
opened 2026-04-28 03:56:10 -05:00 by GiteaMirror · 11 comments
Owner

Originally created by @pepperoni21 on GitHub (Dec 27, 2023).
Original GitHub issue: https://github.com/ollama/ollama/issues/1728

It seems like sometimes Ollama streams multiple JSON objects back to back in the same streamed response, which cannot be deserialized as a single object.

Here's an example of a single streamed JSON response using the /generate endpoint:

```json
{"model":"dolphin-mixtral:latest","created_at":"2023-12-25T01:12:45.58944567Z","response":" you","done":false}\n
{"model":"dolphin-mixtral:latest","created_at":"2023-12-25T01:12:45.607384298Z","response":" today","done":false}\n
{"model":"dolphin-mixtral:latest","created_at":"2023-12-25T01:12:45.625372937Z","response":"?","done":false}\n
{"model":"dolphin-mixtral:latest","created_at":"2023-12-25T01:12:45.643531751Z","response":"","done":true,"context":[32001,6574,13,24205,574,8570,6817,28723,32000,13,32001,1838,13,21558,28801,13,32000,13,32001,489,11143,13,22557,28808,1602,541,315,6031,368,3154,28804],"total_duration":376468647,"load_duration":758387,"prompt_eval_count":23,"prompt_eval_duration":226302000,"eval_count":9,"eval_duration":147877000}
```
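For illustration (this sketch is not part of the original report), here is why a raw read containing several of these objects fails to deserialize as one document, and how splitting on newlines fixes it; the chunk below is abbreviated from the example above:

```python
import json

# Hypothetical chunk: two NDJSON objects arriving in a single read
# (fields abbreviated from the example above).
chunk = (
    '{"model":"dolphin-mixtral:latest","response":" you","done":false}\n'
    '{"model":"dolphin-mixtral:latest","response":" today","done":false}\n'
)

# Deserializing the whole chunk as one JSON document fails with an
# "Extra data" error, which is the failure described in this issue.
try:
    json.loads(chunk)
    parsed_whole = True
except json.JSONDecodeError:
    parsed_whole = False

# Splitting on newlines first yields one valid JSON object per line.
objects = [json.loads(line) for line in chunk.splitlines() if line.strip()]
```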

@xprnio commented on GitHub (Dec 27, 2023):

Not sometimes, but always, unless you specify `stream: false` in the request.
If `stream: true` (the default), Ollama streams an NDJSON (newline-delimited JSON) payload, which means each line of the response is a separate JSON object that you need to process individually. Depending on whether you want your application to stream the response, or you're okay with waiting for the whole response to come through before doing anything with it, you can change the `stream` setting in the request.

https://github.com/jmorganca/ollama/blob/main/docs/api.md#generate-a-completion
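A minimal sketch of consuming the NDJSON stream this way, using only the Python standard library; the endpoint URL, model name, and helper names here are illustrative, not from the linked docs:

```python
import json
import urllib.request

def iter_responses(lines):
    """Decode one JSON object per line, yielding its `response` text."""
    for line in lines:
        if not line.strip():
            continue
        part = json.loads(line)
        yield part["response"]
        if part.get("done"):
            break

def stream_generate(prompt, model="dolphin-mixtral:latest",
                    url="http://localhost:11434/api/generate"):
    """Stream a completion from the /api/generate endpoint (sketch)."""
    body = json.dumps({"model": model, "prompt": prompt}).encode()
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        # The response body is file-like, so iterating it yields one
        # newline-terminated line -- i.e. one NDJSON object -- at a time.
        yield from iter_responses(line.decode() for line in resp)
```

Joining the yielded fragments reconstructs the full completion text.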


@pepperoni21 commented on GitHub (Dec 27, 2023):

> Not sometimes, but always, unless you specify `stream: false` in the request. If `stream: true` (the default), Ollama streams an NDJSON (newline-delimited JSON) payload, which means each line of the response is a separate JSON object that you need to process individually. Depending on whether you want your application to stream the response, or you're okay with waiting for the whole response to come through before doing anything with it, you can change the `stream` setting in the request.
>
> https://github.com/jmorganca/ollama/blob/main/docs/api.md#generate-a-completion

Oh, okay, I didn't know about this NDJSON format, because most of the time I get one JSON object per payload. Thanks!


@marcospgp commented on GitHub (Feb 14, 2025):

I ran into this same issue today and was unaware the format was expected. Two things come to mind:

  1. If there are two chunks available, why not send them in a single object? If Ollama sends chunks as soon as they are ready, one would expect there to only ever be one.
  2. The documentation should be clearer about this, as I was not expecting it after reading the docs while setting up my code.

@xprnio commented on GitHub (Feb 18, 2025):

@marcospgp But the documentation does pretty clearly say that [the API returns a stream of JSON objects, as opposed to a single JSON object](https://github.com/ollama/ollama/blob/main/docs/api.md#response). I'm not sure if it's just me or what, but this seems like something that any moderately competent developer would be able to catch even without any documentation. But then again, we live in an age where more and more code is being written by AI, leading to the general reduction of actual programming knowledge among developers.


@pepperoni21 commented on GitHub (Feb 18, 2025):

> @marcusziade But the documentation does pretty clearly say that [the API returns a stream of JSON objects, as opposed to a single JSON object](https://github.com/ollama/ollama/blob/main/docs/api.md#response). I'm not sure if it's just me or what, but this seems like something that any moderately competent developer would be able to catch even without any documentation. But then again, we live in an age where more and more code is being written by AI, leading to the general reduction of actual programming knowledge among developers.

Then a simple link would've been enough; no one needed your opinion here.


@guitaripod commented on GitHub (Feb 18, 2025):

Why am I tagged here?


@xprnio commented on GitHub (Feb 19, 2025):

> > @marcospgp But the documentation does pretty clearly say that [the API returns a stream of JSON objects, as opposed to a single JSON object](https://github.com/ollama/ollama/blob/main/docs/api.md#response). I'm not sure if it's just me or what, but this seems like something that any moderately competent developer would be able to catch even without any documentation. But then again, we live in an age where more and more code is being written by AI, leading to the general reduction of actual programming knowledge among developers.
>
> Then a simple link would've been enough; no one needed your opinion here.

Yet I felt it had to be mentioned. Was my opinion wrong in any way, or did you just feel hurt by the honesty within it? Trust me: if you want to have a discussion about this topic, then let's go. But being a delicate little snowflake and feeling hurt by my honest opinion (which, in my opinion, also carries valuable constructive criticism) isn't valuable to anybody either.


@xprnio commented on GitHub (Feb 19, 2025):

> Why am I tagged here?

Oh my god, I'm so sorry - I must've selected the wrong person from the dropdown. I'm sorry, I meant to tag @marcospgp


@guitaripod commented on GitHub (Feb 20, 2025):

It's all good lol


@marcospgp commented on GitHub (Feb 26, 2025):

@xprnio congrats on being a hardcore non-AI dev, but the docs saying the response is a stream of objects doesn't mean more than one object will be sent per buffer flush.
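The buffer-flush concern above can be sketched in code: chunk boundaries on the wire are arbitrary, so one flush may carry several objects, or an object may be split across two flushes. A client can handle both by accumulating raw text and decoding only complete lines. The class name and structure here are illustrative, not part of any Ollama client library:

```python
import json

class NDJSONBuffer:
    """Accumulate raw stream chunks and emit complete JSON objects."""

    def __init__(self):
        self._buf = ""

    def feed(self, chunk):
        """Append a raw chunk; return every complete object so far."""
        self._buf += chunk
        # Everything before the last newline is complete; the tail
        # (possibly a partial object) stays buffered for the next feed.
        *complete, self._buf = self._buf.split("\n")
        return [json.loads(line) for line in complete if line.strip()]
```

With this approach it no longer matters whether the server flushes one object, several, or half of one at a time.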


@xprnio commented on GitHub (Mar 4, 2025):

@marcospgp Then please explain how you interpret what's said in the documentation, and how it could be improved. Maybe this is just the inverse of a skill issue, maybe it is just me understanding it because of prior experience, or maybe it's a language barrier. But in any case, I personally do think that the documentation is quite clear.

Reference: github-starred/ollama#47495