[GH-ISSUE #3695] Splitting layers on macOS gives incorrect output #28037

Closed
opened 2026-04-22 05:46:08 -05:00 by GiteaMirror · 5 comments
Owner

Originally created by @sebastiandeutsch on GitHub (Apr 17, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/3695

Originally assigned to: @jmorganca on GitHub.

What is the issue?

When running mixtral:8x22b it won't give any meaningful results:

```
>>> What is the capital of france? ANSWER:


>>> hi
	#" #$

>>>
```

Just gibberish.

What did you expect to see?

When I run

```
ollama run dolphin-mixtral
```

the output is

```
>>> What is the capital of france? ANSWER:
 The capital of France is Paris.
```

Steps to reproduce

I'm running Ollama 0.1.32 on macOS 14.1.1 (M2 Max / 96 GB RAM).

Are there any recent changes that introduced the issue?

No response

OS

macOS

Architecture

arm64

Platform

No response

Ollama version

0.1.32

GPU

Apple

GPU info

M2 MAX / 96GB RAM

CPU

Apple

Other software

No response

GiteaMirror added the bug label 2026-04-22 05:46:08 -05:00

@tosh commented on GitHub (Apr 17, 2024):

Getting the same result w/ 0.1.32: garbled response w/ hashtags on macOS with Wizard

maybe something about the template? 👋 @sebastiandeutsch


@jmorganca commented on GitHub (Apr 17, 2024):

Thanks for the issue, and sorry about this. I think it might be from splitting the model over CPU/GPU layers. In the meantime, if you try `/set parameter num_gpu 0`, does the answer become comprehensible? (Noting that Mixtral 8x22b is a text-completion model.)

Also, do you see similar issues with `wizardlm2:8x22b`?
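
For anyone who wants to persist the workaround instead of re-entering it each session, the same parameter can be baked into a derived model via a Modelfile. This is a sketch, not from the thread: the `mixtral-cpu` name is illustrative, and it assumes `num_gpu` is accepted as a Modelfile `PARAMETER` the same way it is in the REPL's `/set parameter`:

```
# Modelfile sketch (assumption: num_gpu is honored here as in /set parameter)
FROM mixtral:8x22b
PARAMETER num_gpu 0
```

Build and run it with `ollama create mixtral-cpu -f Modelfile` followed by `ollama run mixtral-cpu`.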


@sebastiandeutsch commented on GitHub (Apr 17, 2024):

`/set parameter num_gpu 0` did the trick to get output ❤️


@antonme commented on GitHub (Apr 17, 2024):

I have the same problem with all 8x22b models. `num_gpu 0` helps, but makes them very slow (since the GPU does not seem to run at full throttle in that case).
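
As a possible middle ground between full offload (garbled output) and `num_gpu 0` (slow, CPU-only), Ollama's REST API accepts model parameters per request via the `options` field, so `num_gpu` can be set to offload only some layers. A minimal Python sketch that just builds such a request body; the layer count `20` and the prompt are illustrative, not recommendations from the thread:

```python
import json

# Sketch: per-request num_gpu override for Ollama's /api/generate endpoint.
# "options" carries model parameters; num_gpu is the number of layers
# to offload to the GPU (20 is an arbitrary illustrative value).
payload = {
    "model": "mixtral:8x22b",
    "prompt": "What is the capital of France?",
    "stream": False,
    "options": {"num_gpu": 20},
}

body = json.dumps(payload)
print(body)
```

POST the resulting body to `http://localhost:11434/api/generate` (Ollama's default endpoint) to experiment with different offload levels.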


@jmorganca commented on GitHub (Nov 17, 2024):

Hi there, this was from an issue with Ollama's memory estimation. It should largely be fixed now, but let me know if you're continuing to see the issue.

Reference: github-starred/ollama#28037