[GH-ISSUE #11796] Type code for MXFP4 Quantization different from llama.cpp #33584

Closed
opened 2026-04-22 16:26:35 -05:00 by GiteaMirror · 1 comment
Owner

Originally created by @chaserhkj on GitHub (Aug 7, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/11796

What is the issue?

I just discovered that the type code used for MXFP4 quantization in ollama differs from the one defined in llama.cpp:

https://github.com/ollama/ollama/blob/f2e9c9aff5f59b21a5d9a9668408732b3de01e20/fs/ggml/ggml.go#L303

[Defined](https://github.com/ggml-org/llama.cpp/blob/50aa9389014bba2dd12234132aa6b8ca3601a17f/gguf-py/gguf/constants.py#L2730C1-L2730C17) in llama.cpp (`gguf-py/gguf/constants.py`):

```python
MXFP4 = 39
```

This causes fine-tuned gpt-oss models converted with llama.cpp to fail to load in ollama; see the logs below.
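The failure mode can be sketched as follows. This is a minimal, hypothetical Go illustration, not ollama's actual code: only `MXFP4 = 39` (llama.cpp's value) is taken from this issue, and the reader-side lookup table below is an invented subset for demonstration.

```go
package main

import "fmt"

// llama.cpp writes GGUF tensors with GGMLQuantizationType.MXFP4 = 39
// (value taken from this issue; everything else here is illustrative).
const writerMXFP4 uint32 = 39

// typeName is a hypothetical reader-side lookup that does not know
// code 39. Unknown codes surface as "NONE", matching the
// "invalid ggml type 39 (NONE)" error in the log below.
func typeName(code uint32) string {
	known := map[uint32]string{
		0: "F32", // illustrative subset of ggml type codes
		1: "F16",
	}
	if name, ok := known[code]; ok {
		return name
	}
	return "NONE"
}

func main() {
	// The writer and reader must agree on the numeric enum value;
	// otherwise tensors written as MXFP4 are rejected at load time.
	fmt.Printf("tensor type %d resolves to %q\n", writerMXFP4, typeName(writerMXFP4))
}
```

The point of the sketch: a GGUF file stores only the numeric type code, so any disagreement between the writer's and reader's enum tables makes valid tensors unreadable.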

Relevant log output

time=2025-08-07T21:42:35.398Z level=INFO source=server.go:175 msg=offload library=rocm layers.requested=-1 layers.model=25 layers.offload=25 layers.split="" memory.available="[16.0 GiB]" memory.gpu_overhead="0 B" memory.required.full="2.2 GiB" memory.required.partial="2.2 GiB" memory.required.kv="192.0 MiB" memory.required.allocations="[2.2 GiB]" memory.weights.total="1.2 GiB" memory.weights.repeating="680.5 MiB" memory.weights.nonrepeating="586.8 MiB" memory.graph.full="256.0 MiB" memory.graph.partial="256.0 MiB"
gguf_init_from_file_impl: tensor 'blk.0.ffn_down_exps.weight' has invalid ggml type 39 (NONE)
gguf_init_from_file_impl: failed to read tensor info
[GIN] 2025/08/07 - 21:42:35 | 500 |  148.708508ms |   192.168.110.1 | POST     "/api/chat"
[GIN] 2025/08/07 - 21:43:38 | 200 |    5.979947ms |   192.168.110.1 | GET      "/api/tags"
llama_model_load: error loading model: llama_model_loader: failed to load model from /root/.ollama/models/blobs/sha256-aa130697e8a9d565bf6d4eb72efa65005269e69705232aab42e39c1fd10276ea

llama_model_load_from_file_impl: failed to load model

OS

Docker

GPU

AMD

CPU

AMD

Ollama version

0.11.3

GiteaMirror added the bug label 2026-04-22 16:26:35 -05:00
Author
Owner

@rick-github commented on GitHub (Aug 7, 2025):

#11714, #11710


Reference: github-starred/ollama#33584