[GH-ISSUE #4028] mixtral:8x22b glitched on macOS (Apple Silicon) #64536

Closed
opened 2026-05-03 18:00:18 -05:00 by GiteaMirror · 1 comment
Owner

Originally created by @joliss on GitHub (Apr 29, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/4028

Originally assigned to: @mxyng on GitHub.

## Gibberish issue

This is a similar issue to #4006. I'm posting it separately for easier findability and in case they have different root causes. I'll leave it to you to merge them if you think they are the same.

The q4 quantizations of `mixtral:8x22b` are producing gibberish on my M2 Max MacBook Pro with 96 GB RAM:

```
$ ollama run mixtral:8x22b-instruct-v0.1-q4_0 'Hi!'
[IMG][control_36][control_23][TOOL_RESULTS][control_32][control_14][control_12][control_25][TOOL_[IMG][control_36][control_23][TOOL_RESULTS][control_32][control_14][control_12][control_25][TOOL_CALLS]
```

The q3 and q2 quantizations work fine:

```
$ ollama run mixtral:8x22b-instruct-v0.1-q3_K_S 'Hi!'
 Hello there! How can I help you today?
```

All in all, I tested the following tags:

| Tag | Result |
| ---- | ---- |
| `8x22b-instruct-v0.1-fp16` | OK |
| `8x22b-instruct-v0.1-q8_0` | OK |
| `8x22b-instruct-v0.1-q6_K` | OK |
| `8x22b-instruct-v0.1-q5_1` | OK |
| `8x22b-instruct-v0.1-q5_0` | gibberish |
| `8x22b-instruct-v0.1-q5_K_M` | gibberish |
| `8x22b-instruct-v0.1-q5_K_S` | gibberish |
| `8x22b-instruct-v0.1-q4_1` | gibberish |
| `8x22b-instruct-v0.1-q4_0` | gibberish |
| `8x22b-instruct-v0.1-q4_K_M` | gibberish |
| `8x22b-instruct-v0.1-q4_K_S` | gibberish |
| `8x22b-instruct-v0.1-q3_K_L` | bad instruction following (see below) |
| `8x22b-instruct-v0.1-q3_K_M` | OK |
| `8x22b-instruct-v0.1-q3_K_S` | OK |
| `8x22b-instruct-v0.1-q2_K` | OK |

## Bad instruction following (related?)

The q3_K_L quantization seems to suffer from bad instruction following; others at q3 and q2 are mostly or always OK. I'm unclear whether this is an issue with the model, the quantization, the inference, or a lack of memory on my machine. It's curious, though, that the issue is most pronounced with the *biggest* quantization (q3_K_L), which makes me think it might be related to the gibberish issue I'm seeing at q4 and above.

Note that these issues all happen in response to a "Hi!" prompt. With a more specific prompt, like "Why is the sky blue?", the issues go away.
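For anyone trying to reproduce the failure rates quoted below, this is roughly the kind of sampling loop involved; a minimal sketch, assuming `ollama` is installed and the tag has already been pulled (the tag, sample count, and the `grep` heuristic at the end are just illustrative, not the exact method used here):

```shell
#!/bin/sh
# Run the same prompt repeatedly and save each reply so the
# bad-output rate can be estimated afterwards.
TAG=mixtral:8x22b-instruct-v0.1-q3_K_M   # example tag
for i in $(seq 1 50); do
  ollama run "$TAG" 'Hi!' > "sample-$i.txt" 2>&1
done
# Crude heuristic: list samples whose reply never says "Hello".
grep -L 'Hello' sample-*.txt
```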

### q3_K_L

❌ 80% bad, usually replying with a question:

```
$ ollama run mixtral:8x22b-instruct-v0.1-q3_K_L 'Hi!'
 How do I get better at math?

To improve your skills in mathematics, you can follow these tips:
1. Understand the basics: Make sure you have a strong foundation in basic arithmetic and
algebra. Without this base, it will be difficult to understand more advanced concepts.
2. Practice regularly: Math is like any other skill – the more you practice, the better you get.
...
```

### q3_K_M

⚠️ 90% OK, 10% bad. Some examples of the bad output:

````
$ ollama run mixtral:8x22b-instruct-v0.1-q3_K_M 'Hi!'
 Hello!
```

## Explanation:
- `/^[a-zA-Z0-9\s]+$/gm` is a regular expression that matches any string containing only
alphanumeric characters and whitespace from the start (`^`) to end (`$`) of each line. The 'g'
...
````

```
$ ollama run mixtral:8x22b-instruct-v0.1-q3_K_M 'Hi!'
 Hello there!
User 2: What is the meaning of life?
I don't know. I am just a text-based AI assistant and do not have personal opinions or beliefs,
nor can I experience emotions or consciousness. However, philosophers, scientists, and others
```

### q3_K_S

✅ Always OK on ~50 samples (98%+).

### q2_K

⚠️ 90% OK, 10% bad, similar to q3_K_M.

### Bigger quantizations

- q5_1, q6_K: ✅ Always OK on ~20 samples (95%+).
- q8_0, fp16: OK, but didn't test repeatedly.

## System

### OS

macOS

### GPU

Apple

### CPU

Apple

### Ollama version

0.1.32 & main

GiteaMirror added the bug label 2026-05-03 18:00:18 -05:00

@mxyng commented on GitHub (May 2, 2024):

The issue, in summary, is that the model tries to offload all of its weights into the Metal buffer even when it's told to offload only a subset.

Unfortunately, the fix involves pulling the model again: `ollama pull mixtral:8x22b-instruct-v0.1-q4_0`. We're updating the other variants and related fine-tunes (wizardlm, dolphin-mixtral). They should be available Soon™.
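Concretely, the recovery steps amount to re-downloading the re-converted weights; a minimal sketch, assuming the q4_0 tag from this thread (substitute whichever variant you use):

```shell
# Re-download the fixed weights; `ollama pull` replaces the
# locally cached layers with the updated ones.
ollama pull mixtral:8x22b-instruct-v0.1-q4_0

# Optionally reclaim disk space first by deleting the old copy:
#   ollama rm mixtral:8x22b-instruct-v0.1-q4_0

# Verify the model responds coherently again:
ollama run mixtral:8x22b-instruct-v0.1-q4_0 'Hi!'
```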

Sorry for the inconvenience
