[GH-ISSUE #15307] mlx runner failed: Error: unsupported architecture: Gemma4ForConditionalGeneration #35553

Closed
opened 2026-04-22 20:07:30 -05:00 by GiteaMirror · 1 comment
Owner

Originally created by @chigkim on GitHub (Apr 3, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/15307

What is the issue?

When I import a finetuned Gemma4 model and quantize it to mxfp8, the import reports `successfully imported gemma4:26b-a4b-heretic-mxfp8 with 1018 layers`.
However, I get this error when I try to run the model:
Error: 500 Internal Server Error: mlx runner failed: Error: unsupported architecture: Gemma4ForConditionalGeneration (exit: exit status 1)
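
For reference, a Modelfile for this kind of safetensors import is usually just a pointer to the local checkout; a minimal sketch of what gemma4-26b.modelfile likely contains (the directory path is an assumption, not taken from the report):

```
# gemma4-26b.modelfile -- minimal sketch; the local path below is assumed for illustration
FROM /path/to/gemma-4-26b-a4b-heretic
```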

Relevant log output

```shell
GOMAXPROCS=1 ollama serve
ollama create gemma4:26b-a4b-heretic-mxfp8 --experimental -f gemma4-26b.modelfile -q mxfp8
importing safetensors model
importing safetensors model
importing model-00001-of-00006.safetensors (112 tensors, quantizing to mxfp8)
importing model-00002-of-00006.safetensors (131 tensors, quantizing to mxfp8)
importing model-00003-of-00006.safetensors (131 tensors, quantizing to mxfp8)
importing model-00004-of-00006.safetensors (131 tensors, quantizing to mxfp8)
importing model-00005-of-00006.safetensors (131 tensors, quantizing to mxfp8)
importing model-00006-of-00006.safetensors (377 tensors, quantizing to mxfp8)
importing config config.json
importing config generation_config.json
importing config processor_config.json
importing config tokenizer.json
importing config tokenizer_config.json
writing manifest for gemma4:26b-a4b-heretic-mxfp8
successfully imported gemma4:26b-a4b-heretic-mxfp8 with 1018 layers

ollama run gemma4:26b-a4b-heretic-mxfp8
Error: 500 Internal Server Error: mlx runner failed: Error: unsupported architecture: Gemma4ForConditionalGeneration (exit: exit status 1)
```
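
If it helps triage, the architecture string that the runner rejects should also be visible in the imported model's metadata; a quick check with the standard ollama CLI (exact output fields may vary by version):

```shell
# Print metadata for the imported model; the architecture line reflects
# what the MLX runner sees at load time.
ollama show gemma4:26b-a4b-heretic-mxfp8
```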

OS

macOS

GPU

Apple

CPU

Apple

Ollama version

0.20.0

GiteaMirror added the bug label 2026-04-22 20:07:30 -05:00

@rick-github commented on GitHub (Apr 4, 2026):

#15244


Reference: github-starred/ollama#35553