[GH-ISSUE #6250] Running the glm4-9b model, long conversations occasionally get replies of GGGGGGG #50421

Closed
opened 2026-04-28 15:45:25 -05:00 by GiteaMirror · 7 comments
Owner

Originally created by @MdcGIt on GitHub (Aug 8, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/6250

Originally assigned to: @jmorganca on GitHub.

What is the issue?

Running the glm4-9b model, after chatting for a while it occasionally replies with GGGGGGG.
GPU information is shown below:
![image](https://github.com/user-attachments/assets/5ae71376-3667-4d65-8603-549768fccb40)

OS

Linux

GPU

Intel

CPU

Intel

Ollama version

0.3.0

GiteaMirror added the needs more info and bug labels 2026-04-28 15:45:27 -05:00

@wszgrcy commented on GitHub (Aug 8, 2024):

It does not necessarily take a long session; sometimes the first request works fine and the next one fails.


@AeneasZhu commented on GitHub (Aug 8, 2024):

This might be llama.cpp's problem that causes the bug. I suggest you post your issue in llama.cpp.


@jmorganca commented on GitHub (Sep 2, 2024):

Having trouble reproducing this one. May I ask what prompt you used and which glm4 model? Was it `ollama run glm4`?


@wszgrcy commented on GitHub (Sep 2, 2024):

> Having trouble reproducing this one. May I ask what prompt you used and which glm4 model? Was it `ollama run glm4`?

llama.cpp seems to have fixed this issue:
https://github.com/ggerganov/llama.cpp/pull/9130
Has that fix not been synced into Ollama yet?


@shuye-cheung commented on GitHub (Sep 24, 2024):

I use

```
ollama run glm4
```

namely **9b-chat-q4_0**.

Then, the model will return a "GGGGGGGGGGGGGGGGGGG" error.
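Until a fixed build is in place, this failure mode can at least be caught client-side so the request can be retried. A minimal sketch; the `looks_degenerate` helper and its 8-character threshold are illustrative, not part of Ollama:

```python
def looks_degenerate(text: str, min_run: int = 8) -> bool:
    """Return True if the reply ends in a long run of one repeated
    character, like the "GGGG..." output reported in this issue."""
    if len(text) < min_run:
        return False
    # Check whether the last `min_run` characters are all identical.
    return len(set(text[-min_run:])) == 1

print(looks_degenerate("G" * 19))          # True: a run of a single character
print(looks_degenerate("a normal reply"))  # False
```

A caller could wrap its chat request in a short retry loop and regenerate whenever this check fires.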


@dhiltgen commented on GitHub (Sep 30, 2024):

Please upgrade to the latest version of Ollama, which includes upstream fixes for glm4 in llama.cpp. If you're still having trouble, share an updated server log and we'll reopen the issue.


@1003663050 commented on GitHub (Apr 15, 2025):

It depends on which instruction sets the GPU supports. For example, the NVIDIA GeForce RTX 3080 does not support INT4 instructions: it uses the Ampere architecture, which supports INT8 and FP16 operations but not INT4 directly, whereas Q4 models need the INT4 instruction set.
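For context, llama.cpp's Q4 formats do not strictly require hardware INT4 instructions: weights are stored as 4-bit integers with a per-block scale and are dequantized to a wider type for the matmul. A simplified Python sketch in the spirit of Q4_0 (the real llama.cpp layout differs in details such as nibble packing and sign convention):

```python
import numpy as np

QK = 32  # Q4_0 block size in llama.cpp

def quantize_q4(x: np.ndarray):
    """Quantize a float vector (length a multiple of 32) to 4-bit blocks."""
    blocks = x.reshape(-1, QK)
    amax = np.max(np.abs(blocks), axis=1, keepdims=True)
    d = amax / 7.0                       # per-block scale, stored as FP16
    d[d == 0] = 1.0                      # avoid division by zero
    q = np.clip(np.round(blocks / d), -8, 7).astype(np.int8)
    return d.astype(np.float16), q

def dequantize_q4(d: np.ndarray, q: np.ndarray) -> np.ndarray:
    """Reconstruct approximate floats; no INT4 hardware needed."""
    return (d.astype(np.float32) * q).reshape(-1)

x = np.linspace(-1.0, 1.0, 64, dtype=np.float32)
d, q = quantize_q4(x)
xr = dequantize_q4(d, q)
print(float(np.max(np.abs(x - xr))))     # small per-block rounding error
```

The reconstruction error stays within about half a scale step per block, which is why Q4 models run on GPUs (and CPUs) without an INT4 instruction set.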


Reference: github-starred/ollama#50421