[GH-ISSUE #7123] Long responses can corrupt the model until unloaded #4525

Open
opened 2026-04-12 15:27:43 -05:00 by GiteaMirror · 5 comments
Originally created by @ragibson on GitHub (Oct 7, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/7123

What is the issue?

Given a relatively simple prompt, one of the Phi models went off track and ranted for several thousand words. Afterward, all subsequent responses were (mostly) garbage, even in separate API calls or interactive sessions with cleared session context. This persisted until the model was completely unloaded and reloaded.

It feels like something may have overflowed a buffer used for the context window or response and corrupted the model weights. Within the garbage output, the model appeared to have brief periods of "lucidity" where it demonstrated knowledge of prompts from completely separate sessions.

In the most recent case, I was using phi3.5:3.8b-mini-instruct-q4_K_M but have seen the same sort of behavior in other Phi releases. I'll try to find a prompt that can replicate this, though it's obviously stochastic given the nature of LLMs.

OS

Linux

GPU

Nvidia

CPU

Intel

Ollama version

0.3.12

GiteaMirror added the bug and needs more info labels 2026-04-12 15:27:43 -05:00

@ragibson commented on GitHub (Oct 7, 2024):

Here's an automated example I just wrote that triggered corruption after a few minutes.

Looking at ollama ps, I think it could also have something to do with a model being unloaded mid-response (because it continued for so long that the timeout was triggered)?

import ollama

while True:
    response = ollama.generate(model="phi3.5:3.8b-mini-instruct-q4_K_M",
                               prompt="Let G be a non-abelian group of order 105, and let H "
                                      "be a subgroup of G of order 21. Prove that the normalizer "
                                      "of H in G has order exactly 35.",
                               options={'num_ctx': 4096})['response']
    print(f"Response (length {len(response)}): {response[:100]}")

It starts out fine, and then goes increasingly off the rails after each longer response.

Response (length 9345):  To prove that the normalizer N_G(H) of H in G has order 35, we will use some properties of groups an
Response (length 1593):  To prove that the normalizer N_G(H) of H in G has order exactly 35, we will use the following facts:
Response (length 3967):  To prove that the normalizer N_G(H) of H in G has order exactly 35, we will use some properties of g
Response (length 30781): To prove that the normalizer N_G(H) of H in G has order 35, we will use some properties of groups an
Response (length 9558):  To solve this problem, we will use some properties about groups and their subgroups to prove that th
Response (length 4234):  To solve this problem, we will use some properties of groups and their subgroups to prove that if 
Response (length 32653): To prove this statement, we will use some properties of groups and their subgroups to arrive at our 
Response (length 18568): Write an abstract for "Biochemistry Quest: ParticleA (x^4*t = -1980 to $f(p) be a documentary, writt
Response (length 2858):  [Text AI Assistant: "Designinga query regarding genetic_Alice'drafts! I would likewise write an adva
Response (length 25554): Write an R language modelled by 'Phoebeam AI: I am notebooks as detailed explanation for me a compre
Response (length 981):   Create anf: Aaron'/user inputs_46*x7980 words-like Lettermsolutioner to build Python script a compre
Response (length 756):   Create anatoma/create_user(e: Increaseday C++! Create a python-based systematic code snippet about t
Response (length 2492):  The following Python code: In this instructionalize an HTML pageantagonatezamuskoeodrabbashto create
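
To spot the point where the output drifts without reading every response, a crude keyword check on the previews may help. This is only a sketch: `PROMPT_TERMS`, `looks_on_topic`, and `first_drift` are hypothetical names (not part of Ollama), the term list is simply lifted from the prompt in the script above, and the threshold of two matching terms is an arbitrary assumption.

```python
import re

# Assumption: key terms copied by hand from the prompt in the reproduction script.
PROMPT_TERMS = {"normalizer", "group", "groups", "subgroup", "subgroups", "order", "prove"}


def looks_on_topic(preview: str, min_hits: int = 2) -> bool:
    """Return True if the preview contains at least `min_hits` distinct prompt terms."""
    words = set(re.findall(r"[a-z]+", preview.lower()))
    return len(words & PROMPT_TERMS) >= min_hits


def first_drift(previews):
    """Return the index of the first preview that looks off topic, or None if all pass."""
    for index, preview in enumerate(previews):
        if not looks_on_topic(preview):
            return index
    return None
```

Run against the previews logged above, a check like this would flag the `Write an abstract for "Biochemistry Quest...` response as the first off-topic one. It is a heuristic only and says nothing about why the model degraded.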

@Maltz42 commented on GitHub (Oct 12, 2024):

I've seen mistral-large and command-r do this, too.


@dhiltgen commented on GitHub (Nov 5, 2024):

Can you give 0.4.0 a try? This does sound like a corruption related to caching or context handling, both of which have been heavily refactored in the new 0.4.0 implementation.


@Maltz42 commented on GitHub (Nov 5, 2024):

I won't be able to try this for a couple of weeks, but I will as soon as I'm able.


@ragibson commented on GitHub (Nov 6, 2024):

Unfortunately, I get similar results on 0.4.0-rc8.

$ ollama -v
ollama version is 0.4.0-rc8

I do think my suspicion about the model being stopped mid-inference might be right. I was watching ollama ps and that lines up with the sharp decreases in coherence.

Response (length 4955):  To prove that the normalizer N_G(H) of H in G has order exactly 35, we will use some properties of g
Response (length 14559): To prove that the normalizer N_G(H) of H in G has order exactly 35, we will use some properties and 
Response (length 5361):  To prove this, we will use the following facts: 1. The index of a subgroup H in its normalizer N(H)
Response (length 1586):  To prove that the normalizer N_G(H) of H in G has order exactly 35, we will use the following facts:
Response (length 49339): To prove that the normalizer N_G(H) of H in G has order exactly 35, we will use the following facts:
Response (length 12372): Provide an alternative solution for problem XYZ: "In deriving from 'Alice's Lawful AI system, which 
Response (length 28785): a) Explain why it's crucial to ensure that "Letteras et alice/Nanuko-Ryu, who is a researcher at ris
Response (length 901):   {user_Viensevename style/create an advanced programming languageBilliamn's [[QReneworker - createeol
Response (length 803):   What is_Venzootype: Create an AI can's define-Hydrogenate enhanceria!t answerfake - "알探r/201dioxas o
Response (length 501):   What are_bot Craft ants'soulBaynsiansuman! Suppose weavinga Python "The given stringentaild by Zio
Response (length 391):   Analy/aspectraprisekajiornal instructions forclaimttero name_of AIでありえるから解桩nation: a complexify i
Response (length 669):   Identifydiamntly write an HTML tabletos performmentalprize create Python-like: name_com, please ca
Response (length 574):   Write analy provide meadow/Create a Python API (A: user=8-Evaluateleatably elaborate on Craft a sin
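
One way to isolate the "stopped mid-inference" hypothesis might be to rerun the same prompt with the model pinned in memory and the response length capped, then see whether the degradation still appears. The sketch below uses only the standard library and the documented `/api/generate` fields `keep_alive` and `options.num_predict`. The helper names, the default address `http://localhost:11434`, and the cap of 2048 tokens are assumptions for illustration. Whether this actually avoids the corruption is untested.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumption: default local Ollama address


def build_generate_payload(model, prompt, num_ctx=4096, num_predict=2048, keep_alive=-1):
    """Build a non-streaming /api/generate request body.

    keep_alive=-1 asks the server to keep the model loaded indefinitely;
    num_predict caps the number of generated tokens (2048 is an arbitrary choice).
    """
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "keep_alive": keep_alive,
        "options": {"num_ctx": num_ctx, "num_predict": num_predict},
    }


def post_generate(payload, base_url=OLLAMA_URL):
    """POST the payload to a running Ollama server and return the parsed JSON reply."""
    request = urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as reply:
        return json.loads(reply.read())
```

Calling `post_generate(build_generate_payload("phi3.5:3.8b-mini-instruct-q4_K_M", prompt))` in the same loop as before, while watching `ollama ps`, would show whether the garbage output still appears when the model is never unloaded and responses cannot run on indefinitely.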
Reference: github-starred/ollama#4525