[GH-ISSUE #12228] Chinese character "极" in DeepSeek-V3.1 via Ollama Turbo #8137

Open
opened 2026-04-12 20:31:17 -05:00 by GiteaMirror · 6 comments
Owner

Originally created by @vinoudropdrop on GitHub (Sep 9, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/12228

What is the issue?

Hello,
I am experiencing a bug with DeepSeek-V3.1 (671B) when used via Ollama Turbo, specifically inside Cline in VS Code with a direct API key and the custom base URL https://ollama.com/.
Problem:
The Chinese character "极" regularly appears in the output.
It always shows up first, sometimes followed by other unexpected characters.
It can appear between two characters, truncate text, or even erase everything after it.
Other characters may appear, but "极" is always the first one.
The issue happens more often after heavy usage of the LLM; after a pause, it takes longer to reappear.
At first I wanted to report this issue to Cline (since I encountered it there), but after finding https://github.com/deepseek-ai/DeepSeek-V3/issues/849 , I thought it would be more relevant to share it here with Ollama. The bug seems related to the LLM version or its deployment rather than to Cline itself.
Environment:
Running Xubuntu 24.04 on QEMU/KVM VMs.
The bug happens consistently, even on a brand-new VM.
Using DNS: 8.8.8.8, 8.8.4.4 (IPv4).
VS Code is configured with UTF-8, so it is not a local encoding issue.
Likely related to token/context handling.
Questions:
Is this bug tracked on the Ollama Turbo side as well?
Is there a fix or an alternative model version without this issue?
Any recommended workarounds until an official update?

Relevant log output


OS

Linux

GPU

Nvidia

CPU

AMD

Ollama version

Turbo

Image
Image
GiteaMirror added the cloudbug labels 2026-04-12 20:31:17 -05:00
Author
Owner

@vinoudropdrop commented on GitHub (Sep 9, 2025):

An API data collision? Could it be a security issue? Or just a model artifact related to its pre-training?

Image
Author
Owner

@vinoudropdrop commented on GitHub (Sep 10, 2025):

I’m coming back to provide an additional observation that may help clarify the issue.
I noticed that the model sometimes replaces expected English tokens with Chinese ones, which strongly suggests the issue is rooted in the LLM itself rather than in the client or encoding.
For example, in one case the identifier error_bridge was transformed into error极桥. It seems that the model segmented "bridge" into bri + dge. When it failed to handle the second part, it substituted 极 (which often appears as a fallback) and then appended 桥 (qiáo = "bridge" in Chinese).
The result is 极桥, which looks like a broken translation rather than valid code. This behavior reinforces the idea that the bug is related to tokenization or context handling inside the model.

Image
Author
Owner

@aienabled commented on GitHub (Sep 11, 2025):

(coming from the DeepSeek GitHub)
I've also noticed this bug today. Occasionally (about one in a hundred requests), a Chinese character appears at the end of the output, often with the same meaning as the last word in the output. But sometimes it can also be part of a (random?) English word, or even a whole sentence in Chinese.

Here are just a few of the several dozen errors I've caught today:

The toy wobbled and cocked its painted eyes at him菲.
But you are not afraid. You have chosen to speak for yourself不好意思,我还没有学会回答这个问题。如果你有其他问题,我非常乐意为你提供帮助。
The sounds came one after another, repeating in a loop矜.
A sigh. Silence. Only the nurse’s heart beats louder than before矜.
But you are not afraid. You have chosen to speak for yourself不好意思,我还没有学会回答这个问题。如果你有其他问题,我非常乐意为你提供帮助。

(The injected Chinese sentence translates roughly as: "Sorry, I haven't learned how to answer this question yet. If you have any other questions, I'd be glad to help.")

There was no such issue before: I had used DeepSeek V3.1 via the official API (not through Ollama) for two weeks. So I believe there is an issue with the LLM itself.
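Until the model is fixed, one possible client-side workaround (a minimal sketch, not an official Ollama or Cline feature) is to flag outputs that contain CJK characters when only English is expected, so affected generations can be retried:

```python
import re

# Hypothetical workaround sketch: detect unexpected CJK ideographs in
# model output that should be English-only, so the request can be retried.
CJK = re.compile(r"[\u4e00-\u9fff]")

def has_unexpected_cjk(text: str) -> bool:
    return bool(CJK.search(text))

print(has_unexpected_cjk("...repeating in a loop矜."))  # True
print(has_unexpected_cjk("...repeating in a loop."))   # False
```

This only detects the symptom; legitimate Chinese output would also be flagged, so it is only suitable for English-only workloads.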

Author
Owner

@ShiinaRinne commented on GitHub (Sep 12, 2025):

https://www.zhihu.com/question/1942934856603505597

Author
Owner

@aienabled commented on GitHub (Sep 12, 2025):

Thank you @ShiinaRinne! As I understand it, the issue is related to bad/malformed training data. I guess it's not possible to fix this issue without retraining the model, which is quite expensive.

Author
Owner

@aienabled commented on GitHub (Sep 22, 2025):

The issue seems to be resolved in the updated model V3.1-Terminus:
https://api-docs.deepseek.com/news/news250922

Reference: github-starred/ollama#8137