[GH-ISSUE #4916] Newer models are having problems #28866

Closed
opened 2026-04-22 07:25:31 -05:00 by GiteaMirror · 11 comments

Originally created by @iplayfast on GitHub (Jun 7, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/4916

What is the issue?

ollama version is 0.1.41
ollama run granite-code
pulling manifest 
pulling 02ab8cd2f514... 100% ▕████████████████████████████████████████████████████████████████████████████████████████████████▏ 2.0 GB                         
pulling 0d7c97d535b6... 100% ▕████████████████████████████████████████████████████████████████████████████████████████████████▏   26 B                         
pulling e50df8490144... 100% ▕████████████████████████████████████████████████████████████████████████████████████████████████▏  123 B                         
pulling 9893bb2c2917... 100% ▕████████████████████████████████████████████████████████████████████████████████████████████████▏  108 B                         
pulling 22b176fd8ef6... 100% ▕████████████████████████████████████████████████████████████████████████████████████████████████▏  485 B                         
verifying sha256 digest 
writing manifest 
removing any unused layers 
success 
>>> hello
Hola, ¿en qué puedo ayudarte? [Spanish: "Hello, how can I help you?"]

>>> please respond in english
Hello! ¿What can I help you with in English?

qwen2 was giving garbage output, then after doing the above it started behaving.

ollama run deepseek-v2
>>> hello
你好!有什么我可以帮助你的吗? [Chinese: "Hello! Is there anything I can help you with?"]

>>> please respond in english
当然,我可以用英语回答您的问题。如果您有任何问题或需要帮助,请随时告诉我。 [Chinese: "Of course, I can answer your questions in English. If you have any questions or need help, please feel free to let me know."]

llama3 behaves correctly through all this.
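Degenerate output like the repeated-character qwen2 responses reported in this thread can be flagged programmatically. A minimal heuristic sketch (the function name and threshold are illustrative, not part of Ollama):

```python
def looks_degenerate(text: str, max_run: int = 10) -> bool:
    """Flag output containing a long run of a single repeated character,
    like the 'GGGG...' responses seen from qwen2. The threshold of 10
    consecutive identical characters is an arbitrary choice."""
    run = 1
    for prev, cur in zip(text, text[1:]):
        if cur == prev:
            run += 1
            if run >= max_run:
                return True
        else:
            run = 1
    return False
```

This only catches single-character repetition, not other failure modes such as answering in the wrong language, but it is a cheap first filter.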

OS

Linux

GPU

Nvidia

CPU

Intel

Ollama version

0.1.41

GiteaMirror added the bug label 2026-04-22 07:25:31 -05:00

@Starlitnightly commented on GitHub (Jun 8, 2024):

I'm seeing the same problem

verifying sha256 digest 
writing manifest 
removing any unused layers 
success 
root@ab-Z370-HD3:/mnt/home/zehuazeng# ollama run qwen2:7b
>>> hello
GGJZ:GDSN

>>> please respond in english
GladGDSGtoNN assistGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG

>>> hello
GGJGGGGGK,GGGGGGGGGG!GGG

@Axenide commented on GitHub (Jun 8, 2024):

> I'm seeing the same problem
>
> verifying sha256 digest
> writing manifest
> removing any unused layers
> success
> root@ab-Z370-HD3:/mnt/home/zehuazeng# ollama run qwen2:7b
> >>> hello
> GGJZ:GDSN
>
> >>> please respond in english
> GladGDSGtoNN assistGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
>
> >>> hello
> GGJGGGGGK,GGGGGGGGGG!GGG

I have this exact same output.


@igorschlum commented on GitHub (Jun 8, 2024):

I tried with version 0.1.42 of Ollama on macOS with 32 GB of RAM.

With granite-code I get answers in English.
For deepseek-v2 I can also reproduce this; it answers in Chinese even when I ask it to answer in English.
For qwen2:7b, the answer is in English.

(base) igor@macigor-2 ~ % ollama run granite-code
pulling manifest
pulling 02ab8cd2f514... 100% ▕███████████████████████████████████████████████▏ 2.0 GB
pulling 0d7c97d535b6... 100% ▕███████████████████████████████████████████████▏ 26 B
pulling e50df8490144... 100% ▕███████████████████████████████████████████████▏ 123 B
pulling 9893bb2c2917... 100% ▕███████████████████████████████████████████████▏ 108 B
pulling 22b176fd8ef6... 100% ▕███████████████████████████████████████████████▏ 485 B
verifying sha256 digest
writing manifest
removing any unused layers
success

hello
Hello! How can I assist you today?

(base) igor@macigor-2 ~ % ollama run qwen2:7b
pulling manifest
pulling 43f7a214e532... 100% ▕██████████████████████████████████████████████ 4.4 GB
pulling 62fbfd9ed093... 100% ▕██████████████████████████████████████████████▏ 182 B
pulling c156170b718e... 100% ▕██████████████████████████████████████████████ 11 KB
pulling f02dd72bb242... 100% ▕██████████████████████████████████████████████▏ 59 B
pulling 648f809ced2b... 100% ▕██████████████████████████████████████████████▏ 485 B
verifying sha256 digest
writing manifest
removing any unused layers
success

hello
Hello! How can I assist you today?

(base) igor@macigor-2 ~ % ollama run deepseek-v2
pulling manifest
pulling d8d69f2a1bfa... 100% ▕███████████████████████████████████████████████▏ 8.9 GB
pulling 732caedf08d1... 100% ▕███████████████████████████████████████████████▏ 112 B
pulling 4bb71764481f... 100% ▕███████████████████████████████████████████████▏ 13 KB
pulling 19f2fb9e8bc6... 100% ▕███████████████████████████████████████████████▏ 32 B
pulling efe8e27d9609... 100% ▕███████████████████████████████████████████████▏ 494 B
verifying sha256 digest
writing manifest
removing any unused layers
success

hello
你好!很高兴见到你。有什么我可以帮你的吗? [Chinese: "Hello! Nice to meet you. Is there anything I can help you with?"]

can you answer in english?
当然可以,有什么问题需要我回答的? [Chinese: "Of course. What questions would you like me to answer?"]


@Axenide commented on GitHub (Jun 8, 2024):

I just tested with 0.1.42 and at least Qwen 2 seems to work fine. Sometimes it mixes languages, but that comes from the LLM itself.

I tried deepseek-v2 several times, but it always gives me this error:
Error: llama runner process has terminated: signal: aborted (core dumped)
I guess this has to do with my system resources (I have only 6 GB of VRAM), but I thought it could offload to CPU and RAM.


@jmorganca commented on GitHub (Jun 9, 2024):

Hi folks, Qwen 2 is fixed now - sorry about that.

@Axenide are you using OLLAMA_FLASH_ATTENTION=1? deepseek-v2 doesn't work with flash attention yet. Let me know if that's not the case and I can re-open this. Sorry you hit an error
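For anyone checking whether flash attention is the culprit, a minimal sketch of launching the server with it explicitly disabled (OLLAMA_FLASH_ATTENTION is a real Ollama environment variable; how you restart the server depends on your setup, e.g. systemd vs. a manual launch):

```shell
# Explicitly disable flash attention, since deepseek-v2
# reportedly does not support it yet.
export OLLAMA_FLASH_ATTENTION=0
# Then restart the server however you normally run it, e.g.:
# ollama serve
```

If Ollama runs under systemd, the variable has to be set in the service's environment instead of your shell.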


@Axenide commented on GitHub (Jun 9, 2024):

> Hi folks, Qwen 2 is fixed now - sorry about that.
>
> @Axenide are you using OLLAMA_FLASH_ATTENTION=1? deepseek-v2 doesn't work with flash attention yet. Let me know if that's not the case and I can re-open this. Sorry you hit an error

Hi, no worries. No, I'm not using flash attention. Just the defaults.


@igorschlum commented on GitHub (Jun 9, 2024):

@jmorganca Qwen2 is fixed but deepseek-v2 is not. It always answers in Chinese.


@jmorganca commented on GitHub (Jun 9, 2024):

@igorschlum I think that may be how the model was trained!

@Axenide Shoot, sorry then – may I ask what hardware you are running it on?


@Axenide commented on GitHub (Jun 9, 2024):

> @Axenide Shoot, sorry then – may I ask what hardware you are running it on?

Sure, here are my specs:

CPU: AMD Ryzen 5 2600 (12) @ 3.40 GHz
GPU: NVIDIA GeForce RTX 2060 (6 GB VRAM)
RAM: 16 GB

I'm using Arch Linux with kernel 6.9.3-arch1-1


@igorschlum commented on GitHub (Jun 10, 2024):

@jmorganca In Ollama, deepseek-v2 answers in Chinese:
[Screenshot: Deepseekv2 in Ollama](https://github.com/ollama/ollama/assets/2884312/464cd03c-a353-45ec-9d52-0e2c054170bd)
In chat.deepseek.com it answers in English:
[Screenshot: Deepseekv2 in chat.deepseek.com](https://github.com/ollama/ollama/assets/2884312/9ae5e5fb-015a-4a5f-80e2-16749d03b0c8)
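Not a fix for the model itself, but a workaround sketch: Ollama's interactive `/set system` command can pin a system prompt for the session (the prompt wording here is only illustrative):

```
ollama run deepseek-v2
>>> /set system "Always respond in English."
>>> hello
```

Whether the model actually honors the instruction depends on its training, as noted above.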


@rick-github commented on GitHub (Jun 15, 2024):

@Axenide Your problem may be that ollama is trying to load too many layers for deepseek-v2, see https://github.com/ollama/ollama/issues/4799#issuecomment-2148573008
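If that diagnosis applies, one workaround sketch is to cap how many layers are offloaded to the GPU so the remainder stays in system RAM. `/set parameter num_gpu` is a real Ollama REPL command; the value 10 is only illustrative and would need tuning for a 6 GB card:

```
ollama run deepseek-v2
>>> /set parameter num_gpu 10
```

Fewer offloaded layers means slower inference but less VRAM pressure.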


Reference: github-starred/ollama#28866