[GH-ISSUE #3411] Anyone else noticed deteriorating quality of response with subsequent/looping generations? #2102

Open
opened 2026-04-12 12:20:39 -05:00 by GiteaMirror · 3 comments

Originally created by @130jd on GitHub (Mar 30, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/3411

I've been running `ollama.generate()` in a `for` loop, feeding it a new prompt each time and triggering a new generation. I have `stream=False`, `raw=False`, `model="mistral"`.

For the first couple of prompts, it follows my instructions perfectly, but after the 4th or 5th prompt the response quality starts to deteriorate: it starts reciting back to me the examples that were in my instructions. I'm not sure why this would be the case, as I thought generations would be independent of the ones that came before (since I'm not in streaming/chat mode).

I've tried to make sure it isn't passing the context parameter from prior generations back into subsequent generations, by passing in the optional parameter `context=[]` at the start of every loop iteration, but it doesn't seem to stop the deterioration in later generations.

Another weird thing I noticed: I've set `temperature=0` because I want the outputs to be reproducible. I know there'll still be some fluctuations with `temperature=0`, but in my case, I'll run a single prompt for several generations in a row, get the same output, all good. Then I'll wait a bit (several minutes), run other prompts / generations, then come back to the original prompt. Now the output is *wildly* different, and absolutely nothing has changed in my script.

This makes me suspect there's some hidden context being passed/not passed from prior generations to influence subsequent generations. Is that a known issue or am I just hallucinating here? Thank you.
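
A minimal sketch of the kind of loop described above, assuming the `ollama` Python client; the prompt list and option values are illustrative, not taken from the original report:

```python
import ollama

# Illustrative prompts; in the report, each loop iteration feeds a new prompt.
prompts = [
    'Extract just the team name from the following text: "Goalkeeper at Manchester United"',
    'Extract just the team name from the following text: "Striker at FC Barcelona"',
]

for prompt in prompts:
    result = ollama.generate(
        model="mistral",
        prompt=prompt,
        stream=False,
        raw=False,
        context=[],                  # attempt to prevent carry-over from earlier generations
        options={"temperature": 0},  # aiming for reproducible outputs
    )
    print(result["response"])
```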


@130jd commented on GitHub (Mar 30, 2024):

Still happening... example below.

Prompt:

> Extract just the team name from the following text: "Goalkeeper at Manchester United"

Response:

> Manchester United

Now wait a bit, re-run the script with the exact same settings (`temperature=0`) and prompt.

Response:

> "I can't say what it's like to be a goalkeeper. But if you want more ways to think about it, etc."

A complete non sequitur...

Then when I shut down Ollama and restart it, it works properly again. Would love to understand what's going on here.


@mrjesseking commented on GitHub (Apr 9, 2024):

Yes, I have been experiencing the same issue. I've tried rebuilding the model in each loop in the hope that it would reset the model, but it has not helped. As the model runs, it seems to get worse and worse.


@VvanGemert commented on GitHub (Apr 21, 2024):

Hi @130jd and @mrjesseking, I've also experienced this issue and found that if I increase the keep-alive timeout it no longer happens. The default keep-alive timeout is 5 minutes, and the output quality somehow seems to change after those 5 minutes. I'm not sure if this is the solution, but it helped for me, and I haven't seen the result quality fade anymore. You can set it with the `OLLAMA_KEEP_ALIVE` env variable.
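
For reference, the server-side default can be raised before starting the server, e.g. `OLLAMA_KEEP_ALIVE=24h ollama serve`. The generate API also accepts a per-request `keep_alive`; a minimal sketch assuming the Python client version in use exposes that parameter:

```python
import ollama

# Per-request override of how long the model stays loaded after this call.
# Assumption: this client version supports keep_alive; otherwise set the
# OLLAMA_KEEP_ALIVE environment variable on the server before `ollama serve`.
result = ollama.generate(
    model="mistral",
    prompt='Extract just the team name from the following text: "Goalkeeper at Manchester United"',
    options={"temperature": 0},
    keep_alive="24h",  # duration string; -1 keeps the model loaded indefinitely
)
print(result["response"])
```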
