[GH-ISSUE #3720] wrong inference for mixtral 8*22b #64325

Closed
opened 2026-05-03 17:05:57 -05:00 by GiteaMirror · 8 comments
Owner

Originally created by @taozhiyuai on GitHub (Apr 18, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/3720

What is the issue?

Is there anything wrong with the model file, or something else?

```
taozhiyu@603e5f4a42f1 ~ % ollama run mixtral:8x22b-instruct-v0.1-q6_K
>>> who is bill gates?
[control_12][control_37][AVAILABLE_TOOLS][control_16][control_30][control_2[control_12][control_37][AVAILABLE_TOOLS][control_16][control_30][control_22][AVAILABLE_TOOLS][TOOL_RESULTS][control_15][control_17][control_12][TOOL_R][AVAILABLE_TOOLS][TOOL_RESULTS][control_15][control_17][control_12][TOOL_RESULTS][control_23][control_34][control_15]
```
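Output like the above is raw special-token leakage rather than text. As a rough illustration (not part of Ollama; `looks_garbled` is a hypothetical helper), one could flag such responses with a simple pattern check:

```python
import re

# Matches bracketed special tokens such as [control_12] or [AVAILABLE_TOOLS],
# which should never appear verbatim in a healthy completion.
SPECIAL_TOKEN_RE = re.compile(
    r"\[(?:control_\d+|AVAILABLE_TOOLS|TOOL_RESULTS|TOOL_CALLS|PREFIX|SUFFIX|MIDDLE|IMG)\]"
)

def looks_garbled(text: str, threshold: int = 3) -> bool:
    """Flag output that leaks several raw special tokens."""
    return len(SPECIAL_TOKEN_RE.findall(text)) >= threshold

print(looks_garbled("[control_12][control_37][AVAILABLE_TOOLS]"))  # True
print(looks_garbled("Bill Gates co-founded Microsoft."))           # False
```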

```
taozhiyu@603e5f4a42f1 ~ % ollama show --modelfile mixtral:8x22b-instruct-v0.1-q6_K
# Modelfile generated by "ollama show"
# To build a new Modelfile based on this one, replace the FROM line with:
# FROM mixtral:8x22b-instruct-v0.1-q6_K

FROM /Users/taozhiyu/.ollama/models/blobs/sha256-8cf07cdca12d6a856f9c9e42be08c088c4e902aaf57805a34cabe2f8fbae5400
TEMPLATE """[INST] {{ if .System }}{{ .System }} {{ end }}{{ .Prompt }} [/INST]"""
PARAMETER stop "[INST]"
PARAMETER stop "[/INST]"
```

OS

macOS

GPU

Apple

CPU

Apple

Ollama version

0.1.32

GiteaMirror added the bug label 2026-05-03 17:05:57 -05:00

@taozhiyuai commented on GitHub (Apr 18, 2024):

q6_K is weird. It works fine for q4 or q5 with the same Modelfile. So strange.


@asmeurer commented on GitHub (Apr 18, 2024):

I'm also getting this from mixtral:8x22b-instruct-v0.1-q3_K_M and mixtral:8x22b-instruct-v0.1-q3_K_S on my 64GB M3 Max. The normal mixtral:8x22b-instruct-v0.1-q4_0 works just fine (albeit very slowly).


@asmeurer commented on GitHub (Apr 18, 2024):

By the way, for the same prompt, the working model gives ~1000 tokens vs. about 50 garbage tokens for the broken ones. So I don't think the issue is in the tokenization. The model itself is just broken somehow.


@taozhiyuai commented on GitHub (Apr 18, 2024):

> I'm also getting this from mixtral:8x22b-instruct-v0.1-q3_K_M and mixtral:8x22b-instruct-v0.1-q3_K_S on my 64GB M3 Max. The normal mixtral:8x22b-instruct-v0.1-q4_0 works just fine (albeit very slowly).

Q3KM is OK for me. @asmeurer


@Kavan72 commented on GitHub (Apr 18, 2024):

I'm trying to load a model into RAM, and I'm getting the same trash tokens. It's weird.


@asmeurer commented on GitHub (Apr 18, 2024):

> > I'm also getting this from mixtral:8x22b-instruct-v0.1-q3_K_M and mixtral:8x22b-instruct-v0.1-q3_K_S on my 64GB M3 Max. The normal mixtral:8x22b-instruct-v0.1-q4_0 works just fine (albeit very slowly).
>
> Q3KM is OK for me. @asmeurer

```
$ ollama run --verbose mixtral:8x22b-instruct-v0.1-q3_K_M
pulling manifest
pulling 0303cc3e524f... 100% ▕███████████████████████████████████████████████████████████████████████████████████████████████████████████▏  67 GB
pulling 43070e2d4e53... 100% ▕███████████████████████████████████████████████████████████████████████████████████████████████████████████▏  11 KB
pulling c43332387573... 100% ▕███████████████████████████████████████████████████████████████████████████████████████████████████████████▏   67 B
pulling ed11eda7790d... 100% ▕███████████████████████████████████████████████████████████████████████████████████████████████████████████▏   30 B
pulling 8527655d54bc... 100% ▕███████████████████████████████████████████████████████████████████████████████████████████████████████████▏  487 B
verifying sha256 digest
writing manifest
removing any unused layers
success
>>> Write a Python function to compute Fibonacci numbers.
[/TOOL_RESULTS][control_17][PREFIX][control_36][SUFFIX][control_26][control_24][TOOL_CALLS][control_37][control_23][TOOL_RESULTS][SUFFIX][control_17][control_26][cont[/TOOL_RESULTS][control_17][PREFIX][control_36][SUFFIX][control_26][control_24][TOOL_CALLS][control_37][control_23][TOOL_RESULTS][SUFFIX][control_17][control_26][control_30][control_21][control_33][MIDDLE][control_17][control_14][control_14][control_33][control_25][control_16][MIDDLE][control_23][SUFFIX][MIDDLE][IMG][TOOL_RESULTS][ol_30][control_21][control_33][MIDDLE][control_17][control_14][control_14][control_33][control_25][control_16][MIDDLE][control_23][SUFFIX][MIDDLE][IMG][TOOL_RESULTS][control_32][TOOL_CALLS][TOOL_RESULTS][control_24][control_37][control_23][control_12][control_22][control_37][control_17][control_20][control_19][control_34][control_3ontrol_32][TOOL_CALLS][TOOL_RESULTS][control_24][control_37][control_23][control_12][control_22][control_37][control_17][control_20][control_19][control_34][control_30][AVAILABLE_TOOLS]

total duration:       11.591028125s
load duration:        6.261704584s
prompt eval count:    16 token(s)
prompt eval duration: 1.136044s
prompt eval rate:     14.08 tokens/s
eval count:           48 token(s)
eval duration:        4.191977s
eval rate:            11.45 tokens/s
```
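The reported rates are simply token count divided by duration, so the stats are internally consistent; a quick sanity check (illustrative):

```python
# rate (tokens/s) = token count / duration (s), per the verbose stats above
prompt_rate = 16 / 1.136044   # prompt eval
eval_rate = 48 / 4.191977     # generation of the ~48 garbage tokens
print(round(prompt_rate, 2), round(eval_rate, 2))  # 14.08 11.45
```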

@Hao-Wu commented on GitHub (Apr 27, 2024):

On an Apple M2 Max I tried [8x22b-instruct-v0.1-q4_K_M](https://ollama.com/library/mixtral:8x22b-instruct-v0.1-q4_K_M) and [8x22b-instruct-v0.1-q4_1](https://ollama.com/library/mixtral:8x22b-instruct-v0.1-q4_1); both failed.


@mxyng commented on GitHub (May 2, 2024):

Related: #4028


Reference: github-starred/ollama#64325