[GH-ISSUE #13229] Inconsistent Responses and done:false with mistral-small3.1:24b (caused by repeated tokens?) #55259

Open
opened 2026-04-29 08:38:39 -05:00 by GiteaMirror · 5 comments
Owner

Originally created by @ZeyBal on GitHub (Nov 24, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/13229

What is the issue?

Inconsistent Responses and done:false When Analyzing Multiple Files

The model seems to stop predicting after a long run of whitespace.
I want to analyze a list of files using a specific prompt, where I ask the model to return a table with the following columns:

  • The first column corresponds to the criterion.
  • The second column contains the model’s analysis based on the documents. If no information is available, it should return -.
  • The third column contains the associated quotation extracted from the document.

After extracting the text from all files, I send the files' content (in Markdown) along with my prompt to the Ollama endpoint /api/chat.

However, I encounter inconsistent behavior: sometimes I get the full table, and sometimes I don't.
Sometimes the response contains "done": false, and in those cases the model output always stops at the Analysis column.
In the Docker logs of my Ollama container, the request still returns HTTP 200.

What I tested

I tested using a Python script and also with Postman.
I tested both endpoints: /api/chat and /api/generate.

Example request:

{
  "model": "mistral-small3.1:24b",
  "messages": [
    { "role": "user", "content": "File names and file contents with the prompt " }
  ],
  "stream": false,
  "options": {
    "num_ctx": 128000,
    "temperature": 0.2
  }
}

Example incomplete response:

{
    "model": "mistral-small3.1:24b",
    "created_at": "2025-11-24T15:43:38.33241347Z",
    "message": {
        "role": "assistant",
        "content": "Here is the analysis of the provided documents:\n\n| Criterion                                                                 | Analysis                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                "
    },
    "done": false
}

I also tested multiple temperature values: 0.1, 0.2, 0.3, 0.5, 0.8.

When using "stream": true, I waited more than 30 minutes but I did not receive a complete response.

Issue

I don’t understand why the model gets stuck for this specific prompt, or why it sometimes stops generating output partway through the table. Any insight into why this happens or how to fix it would be appreciated.

Relevant log output


OS

Linux, Docker

GPU

Nvidia

CPU

No response

Ollama version

0.9.0

GiteaMirror added the bug label 2026-04-29 08:38:39 -05:00

@Bottlecap202 commented on GitHub (Nov 24, 2025):

Devstral -} IBM model


@Bottlecap202 commented on GitHub (Nov 24, 2025):

English only, also*


@ZeyBal commented on GitHub (Nov 25, 2025):

> Devstral -} IBM model

I don't understand — what do you mean by that?


@ZeyBal commented on GitHub (Nov 25, 2025):

> English only, also*
>
> On Mon, Nov 24, 2025, 10:34 AM charlie getman wrote:

I don't understand — what do you mean by that? I wrote it in English.


@ZeyBal commented on GitHub (Nov 26, 2025):

I suppose it's because of the repeated whitespace.
Ollama limits repeated tokens to 30 via tokenRepeat (https://github.com/ollama/ollama/blob/47e272c35a9d9b5780826a4965f3115908187a7b/llm/server.go#L1590).
How can I modify this value? In my Ollama Docker container? Or can I add an option when calling it with Python?
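
As far as I can tell, that 30-token tokenRepeat cutoff in llm/server.go is hard-coded, so it cannot be changed through a request option or environment variable without rebuilding Ollama. The only per-request knobs I can find are the standard sampling options repeat_last_n and repeat_penalty, which might at least make long whitespace runs less likely in the first place; a sketch of how I'd pass them (the values are just illustrative):

import requests

payload = {
    "model": "mistral-small3.1:24b",
    "messages": [{"role": "user", "content": "File names and file contents with the prompt "}],
    "stream": False,
    "options": {
        "num_ctx": 128000,
        "temperature": 0.2,
        "repeat_last_n": 64,      # how many recent tokens the repeat penalty looks at
        "repeat_penalty": 1.15,   # values > 1.0 discourage repeating recent tokens
    },
}

# This does not change the hard-coded tokenRepeat cutoff; it only penalizes
# repetition during sampling, which may keep the whitespace run from starting.
resp = requests.post("http://localhost:11434/api/chat", json=payload, timeout=1800)
body = resp.json()
print(body.get("done"), body.get("done_reason"))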
