[GH-ISSUE #13239] non-stream mode returns done:false #70812

Closed
opened 2026-05-04 23:04:44 -05:00 by GiteaMirror · 3 comments

Originally created by @vbour-arm on GitHub (Nov 25, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/13239

What is the issue?

Non-stream /api/generate sometimes returns a chunk-shaped object (done: false, no metrics) instead of the final object (done: true plus metrics).
I observed this with Ollama version 0.13.0.
Any clue what I am missing, or is this a known issue?

Relevant log output

JSON object returned from /api/generate with "stream": false:

    "gen_json": {
        "created_at": "2025-11-25T13:51:58.481232019Z",
        "done": false,
        "model": "llama3.2",
        "response": " \n\nThis is a very long piece of text that contains exactly 50 words, each word repeated multiple times in the same sequence. The result is an unusual text that reads like a stream of consciousness, with every word appearing in the same order throughout its repetition. Here's a breakdown of the structure and meaning behind this unusual piece of writing:\n\n*   Each word is \"word\" (except for the first one, which is not specified but could be any other 50-word repeated sequence)\n*   The text starts with a single word or phrase that sets the tone\n*   After the initial word, the repetition of every word creates an unusual and repetitive rhythm\n\n    This piece of writing can be seen as a commentary on the nature of language and its ability to create meaning through repetition. It could also be interpreted as a reflection on the human experience, where certain words or phrases become synonymous with emotions or ideas. However, one must consider that this kind of writing is not widely used in everyday communication or formal writing.\n\n    Overall, \"word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word"
    }
}
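Since the truncated reply is distinguishable by shape, a client can detect it before trusting the output. A minimal sketch (the helper name `is_complete` is illustrative, assuming the documented final-response fields such as `eval_count`):

```python
def is_complete(resp: dict) -> bool:
    """A non-stream /api/generate reply should be the final object:
    done == True, with the generation metrics (eval_count, etc.) present."""
    return resp.get("done") is True and "eval_count" in resp

# The truncated reply reported above: done is false and the metrics
# fields (eval_count, total_duration, ...) are absent.
truncated = {
    "created_at": "2025-11-25T13:51:58.481232019Z",
    "done": False,
    "model": "llama3.2",
    "response": "word word word ...",
}

print(is_complete(truncated))  # False
```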

OS

Linux

GPU

No response

CPU

Intel

Ollama version

0.13.0

GiteaMirror added the bug and needs more info labels 2026-05-04 23:04:45 -05:00

@vbour-arm commented on GitHub (Nov 25, 2025):

I can also add the following context:
the failure occurred after a few tens of consecutive requests executed in an Ansible playbook.
I observed the same using the uri module, as well as the shell and command modules in Ansible (with a curl through the shell or command modules).
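Given that the incomplete reply is detectable, one workaround on the client side is a small retry loop. A sketch (illustrative names; the `post` callable stands in for whatever HTTP layer performs the request, e.g. `lambda body: requests.post(url, json=body, timeout=300).json()`):

```python
from typing import Callable

def generate_with_retry(post: Callable[[dict], dict],
                        prompt: str, model: str = "llama3.2",
                        attempts: int = 3) -> dict:
    """Retry a non-stream /api/generate call while the reply is the
    chunk-shaped object (done: false) instead of the final object.

    `post` performs the HTTP POST and returns the decoded JSON body.
    """
    body = {"model": model, "prompt": prompt, "stream": False}
    last: dict = {}
    for _ in range(attempts):
        last = post(body)
        if last.get("done"):
            return last
    return last  # still incomplete; caller should check `done` again
```

Injecting the HTTP call keeps the loop testable without a running Ollama server.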


@rick-github commented on GitHub (Nov 25, 2025):

 "response": " \n\nThis ...  Overall, \"word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word"

The ollama server has a repeated token detector (https://github.com/ollama/ollama/blob/47e272c35a9d9b5780826a4965f3115908187a7b/llm/server.go#L1590) that stops the generation when the model goes off the rails, as appears to be the case here. Model behaviour like this can result from the context buffer filling up and being shifted. The server log may be helpful in debugging.
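This is not Ollama's actual implementation (that lives in llm/server.go at the link above), but the idea behind such a detector can be sketched as: stop generating once the tail of the output is N identical tokens. The threshold of 30 matches what the reporter later observed:

```python
def hit_repeat_limit(tokens: list[str], limit: int = 30) -> bool:
    """True when the last `limit` tokens are all identical — a simple
    heuristic for a model stuck emitting the same token in a loop."""
    if len(tokens) < limit:
        return False
    tail = tokens[-limit:]
    return all(t == tail[0] for t in tail)

# Mimics the tail of the response in this issue: "... Overall, word word ..."
tokens = ["Overall", ","] + ["word"] * 30
print(hit_repeat_limit(tokens))  # True
```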


@vbour-arm commented on GitHub (Nov 25, 2025):

Thanks for the insight.
Indeed, I reproduced the failure two more times, and in both cases I hit the 30 repeated tokens you flagged.
I did not see any suspicious message in the log; I may have to add some verbosity, though.
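For the extra verbosity, Ollama's troubleshooting docs describe enabling debug logging via the OLLAMA_DEBUG environment variable; a sketch for a systemd-managed Linux install (adjust for how the server is started):

```shell
# Add debug logging to the Ollama server unit (systemd install assumed).
sudo systemctl edit ollama.service
# In the override, add under [Service]:
#   Environment="OLLAMA_DEBUG=1"
sudo systemctl restart ollama
# Then follow the server log while reproducing the failure:
journalctl -u ollama -f
```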


Reference: github-starred/ollama#70812