[GH-ISSUE #5641] Ollama Puts out Gibberish After a While. #29277

Closed
opened 2026-04-22 08:00:20 -05:00 by GiteaMirror · 3 comments

Originally created by @chigkim on GitHub (Jul 11, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/5641

Originally assigned to: @dhiltgen on GitHub.

### What is the issue?

When I run the MMLU Pro benchmark on phi3 or deepseek-coder-v2 with [this script](https://github.com/chigkim/Ollama-MMLU-Pro/), which uses the OpenAI-compatible API, it runs fine for a while. Then, all of a sudden, it starts to output:

deepseek-coder-v2:16b-lite-instruct-q8_0
`@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@`

Phi3:3.8b-mini-128k-instruct-q8_0
`<unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk>`

The entire response contains nothing but those characters, and once it happens, Ollama returns the same response for every remaining question until the end of the run.
I have the environment variable `export OLLAMA_NUM_PARALLEL=4` set, and I'm running the script with the `--parallel 4` option.
According to the token usage Ollama returns, the prompt for each question never exceeds 2048 tokens.
So far I've seen this happen on my Mac (M3 Max, 64 GB) as well as on RunPod instances with an RTX 3090 and an RTX 4090.
This is going to be a hard bug to track down, because it only happens sometimes, and you have to run for a while before it does.
Does anyone have a suspicion about what might cause this?
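
To make the failure easier to catch when it appears, here is a minimal parallel repro sketch against Ollama's OpenAI-compatible endpoint. This is not the benchmark script itself: the base URL, model name, prompt, and the gibberish heuristic are all illustrative assumptions; adjust them to your setup.

```python
from concurrent.futures import ThreadPoolExecutor

from openai import OpenAI  # pip install openai

# Ollama serves an OpenAI-compatible API under /v1; the key is required
# by the client but ignored by Ollama. Address and model are assumptions.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
MODEL = "deepseek-coder-v2:16b-lite-instruct-q8_0"


def looks_gibberish(text: str) -> bool:
    # Matches the failure signatures shown above: runs of '@' or <unk> tokens.
    stripped = text.strip()
    return bool(stripped) and (set(stripped) <= {"@"} or "<unk>" in stripped)


def ask(i: int) -> None:
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": f"Question {i}: what is 2 + 2?"}],
    )
    text = resp.choices[0].message.content or ""
    if looks_gibberish(text):
        print(f"request {i}: gibberish ({resp.usage.prompt_tokens} prompt tokens)")


# Mirror OLLAMA_NUM_PARALLEL=4 / --parallel 4 with four in-flight requests.
with ThreadPoolExecutor(max_workers=4) as pool:
    list(pool.map(ask, range(200)))
```

The `resp.usage.prompt_tokens` field is the same token-usage figure mentioned above, so a harness like this can also confirm that prompts stay under 2048 tokens at the moment the gibberish starts.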

### OS

Linux, macOS

### GPU

Nvidia, Apple

### CPU

AMD, Apple

### Ollama version

0.1.48, 0.2.1

GiteaMirror added the needs more info and bug labels 2026-04-22 08:00:21 -05:00

@chrisoutwright commented on GitHub (Jul 16, 2024):

Me as well. Before, in v0.2.1:

![image](https://github.com/user-attachments/assets/8a77d310-fdb8-4fff-b931-22710164edf6)

And now, with the latest version 0.2.5, it's all gibberish (also in deepseek and qwen2):

![image](https://github.com/user-attachments/assets/981c4468-f672-48b7-8e6f-2cdd9318921f)


@dhiltgen commented on GitHub (Oct 24, 2024):

This might be resolved in the new Go server. Please give the new 0.4.0 RC a try and let us know how it goes.

https://github.com/ollama/ollama/releases
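
For anyone trying the RC, a quick way to confirm which server build is actually running is Ollama's `/api/version` endpoint. A minimal sketch in Python, assuming the default local address:

```python
import requests  # pip install requests

# Query the running Ollama server for the version it reports; the
# default address is an assumption and may differ on your setup.
resp = requests.get("http://localhost:11434/api/version", timeout=5)
resp.raise_for_status()
print(resp.json()["version"])  # e.g. "0.4.0"
```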


@pdevine commented on GitHub (Oct 3, 2025):

This issue is pretty stale; I just tested out both models and they're working great. There is a slight issue running `deepseek-coder-v2:16b-lite-instruct-q8_0` on the new engine vs. the legacy llama.cpp engine that's being ironed out, but it will run fine on the legacy engine.

I'm going to go ahead and close the issue.

Reference: github-starred/ollama#29277