[GH-ISSUE #3698] Command-R returns gibberish since update to 0.1.32, logs: ggml_metal_graph_compute: command buffer 6 failed with status 5 #2280

Closed
opened 2026-04-12 12:32:49 -05:00 by GiteaMirror · 11 comments

Originally created by @phischde on GitHub (Apr 17, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/3698

What is the issue?

Since the update, Command-R is no longer producing text, though other models (e.g. openchat) still do. Running Command-R from the terminal:

```
$ ollama run command-r
>>> Hey, how are you?
3O>FCMID7BBBM<=>PJT@@FNURWKL=8@N;GWHP6:GJ>F76N86EL5DKLFJFADJ;ESQAV7OBDJTK8HT@Q>Q8@BCJ:I9NJEW=?C>BHIJ3U@87L^C
```

Looking at the .ollama/logs/server.log file, I see many lines of

`ggml_metal_graph_compute: command buffer 6 failed with status 5`

What did you expect to see?

Any kind of coherent English text answer.

Steps to reproduce

Start Ollama on the console:

`ollama run command-r`

Ask any question.

Are there any recent changes that introduced the issue?

Upgrading from Ollama 0.1.31 to 0.1.32.

OS

macOS

Architecture

arm64

Platform

No response

Ollama version

0.1.32

GPU

Apple

GPU info

No response

CPU

Apple

Other software

No response

GiteaMirror added the bug and memory labels 2026-04-12 12:32:49 -05:00

@jmorganca commented on GitHub (Apr 17, 2024):

Hi there, sorry about this. May I ask how much memory this Apple Silicon Mac has?


@dims commented on GitHub (Apr 17, 2024):

@jmorganca I can recreate this problem in my Apple M1 Pro with 32GB


@dims commented on GitHub (Apr 17, 2024):

@jmorganca here are debug logs - https://pastebin.com/raw/GiC2mFPz


@phischde commented on GitHub (Apr 17, 2024):

[server.log](https://github.com/ollama/ollama/files/15011494/server.log)

Sorry for not including all the specs.
It is an Apple M1 Pro, 32 GB RAM, macOS Sonoma 14.4.1 (German).
The Command-R model used to run fine with the previous 0.1.31 version of Ollama.


@AndreRatzenberger commented on GitHub (Apr 17, 2024):

> It is an Apple M1 Pro, 32 GB RAM, macOS Sonoma 14.4.1 (German).

Same configuration and same problem as OP. Funnily enough, the performance of Command-R is improved by a lot now, haha.

**Workaround**: use `/set parameter num_gpu 0` (an equivalent non-interactive API call is sketched after the screenshot below).

Then it seems to work again.

![Screenshot 2024-04-17 at 15 58 53](https://github.com/ollama/ollama/assets/44863088/36d2a804-ed6a-4bfd-a0f9-a239b143e0c7)
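
For applying the same workaround outside the interactive session (e.g. against a running `ollama serve`), here is a minimal sketch using the REST API, assuming the default port 11434; `num_gpu` can be passed per request through the `options` field:

```
# Ask command-r a question with GPU offload disabled for this request only
curl http://localhost:11434/api/generate -d '{
  "model": "command-r",
  "prompt": "Hey, how are you?",
  "stream": false,
  "options": { "num_gpu": 0 }
}'
```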

@dims commented on GitHub (Apr 17, 2024):

Thanks @AndreRatzenberger, I can confirm that `/set parameter num_gpu 0` works for me.


@githop commented on GitHub (Apr 17, 2024):

How can this workaround be applied when running `ollama serve`, or when calling `ollama run command-r "prompt text etc.."`?

Reading some of the docs here, one idea that came to mind was to create a custom model with a Modelfile, e.g.

```
FROM command-r
PARAMETER num_gpu 0
```

and run that model (see the sketch below). I'm not sure if my intuition is right - would this config be merged with / overwrite the existing Modelfile, or would it blow away other things already set like SYSTEM or TEMPLATE?
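
A minimal sketch of that approach with the standard CLI; the derived model name `command-r-cpu` is just an illustrative choice:

```
# Write the Modelfile, then build a derived model from it
cat > Modelfile <<'EOF'
FROM command-r
PARAMETER num_gpu 0
EOF

ollama create command-r-cpu -f Modelfile

# The derived model can then be used non-interactively
ollama run command-r-cpu "prompt text etc.."
```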


@keskinonur commented on GitHub (Apr 17, 2024):

I can also confirm that setting the GPU number works, but I have some doubts about its tokens/s results.


@dhiltgen commented on GitHub (Jun 1, 2024):

I believe this is fixed in the latest Ollama version. If you're still having problems with the latest version, let me know and I'll re-open the issue:

```
> ollama run command-r --verbose why is the sky blue
The sky appears blue because molecules in the air scatter blue light from the sun more than other colors. This
...

total duration:       29.6602217s
load duration:        16.2283007s
prompt eval count:    11 token(s)
prompt eval duration: 142.614ms
prompt eval rate:     77.13 tokens/s
eval count:           182 token(s)
eval duration:        13.284409s
eval rate:            13.70 tokens/s
> ollama ps
NAME                    ID              SIZE    PROCESSOR       UNTIL
command-r:latest        b8cdfff0263c    24 GB   2%/98% CPU/GPU  4 minutes from now
```

@Complexity commented on GitHub (Oct 16, 2024):

I want to reopen this (although it may be better as a new issue, as the version is greatly different).

However, I am seeing similar behaviour.

MacBook Pro, Apple M1 Max, 32 GB. Ollama 0.3.13.

Newly pulling command-r:latest (an 18 GB model) produces:

```
ollama run command-r:latest --verbose why is the sky blue
 general firebase firebase firebase primarioası firebase firebase firebase firebase firebase firebase firebase firebase firebase firebase firebase firebase
firebase firebase firebase firebase firebase firebase firebase firebase firebase firebase firebase firebase firebase firebase firebase firebase firebase firebase
firebase
```

and

```
ollama run command-r:latest
>>> Hello.
 generalası obligatoireasıası firebaseasıasıası
obligatoireasıasıasıasıasıasıasıasıasıasıasıasıasıasıasıasıasıasıasıasıasıasıasıasıasıasıasıasıasıasıası
```

Using `/set parameter num_gpu 0` (or any figure) appears to fix the issue.

```
ollama run command-r:latest --verbose
>>> /set parameter num_gpu 0
Set parameter 'num_gpu' to '0'
>>> why is the sky blue?
The sky appears blue to human observers due to a phenomenon called Rayleigh scattering. When sunlight enters Earth's atmosphere, it encounters gases and particles
that cause its light to scatter in different directions. Blue light has shorter wavelengths compared to other colors in the visible spectrum, which makes it more
prone to scattering by the tiny molecules of air (mostly nitrogen and oxygen). As a result, when you look towards the sky, you see more blue light reaching your
eyes because it has been scattered in all directions by these atmospheric particles.

This phenomenon is named after Lord Rayleigh, the British physicist who first described it in the 19th century. However, keep in mind that the sky can appear
differently at various times of day or under different weather conditions due to other factors like humidity, pollution, and the presence of dust or haze.

total duration:       54.565436458s
load duration:        12.696016792s
prompt eval count:    32 token(s)
prompt eval duration: 9.387385s
prompt eval rate:     3.41 tokens/s
eval count:           166 token(s)
eval duration:        32.472529s
eval rate:            5.11 tokens/s
>>>
```

@dhiltgen commented on GitHub (Oct 17, 2024):

I think our memory prediction logic is off on this model. On Metal, if we over-allocate, it can lead to gibberish responses like this. As a workaround until we get this fixed, you can use `OLLAMA_GPU_OVERHEAD` to reserve a portion of VRAM to cause fewer layers to load.
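
A minimal sketch of that workaround; `OLLAMA_GPU_OVERHEAD` takes a value in bytes, and the 2 GiB figure here is just an illustrative starting point:

```
# Reserve ~2 GiB of VRAM so the scheduler offloads fewer layers to the GPU
OLLAMA_GPU_OVERHEAD=2147483648 ollama serve
```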

Reference: github-starred/ollama#2280