[GH-ISSUE #15175] Add MLX prequantized import support for Nemotron-H architecture #9714

Open
opened 2026-04-12 22:35:44 -05:00 by GiteaMirror · 2 comments
Owner

Originally created by @FaisalFehad on GitHub (Mar 31, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/15175

Summary

ollama create fails when importing MLX-quantized SafeTensors for Nemotron-H models with Error: unknown data type: U32.

PR #14878 added the tensorImportTransform framework with Qwen3.5 support. Requesting the same for NemotronHForCausalLM (model_type: nemotron_h). The architecture class for the registry would be NemotronHForCausalLM.

This also highlights that any MLX-quantized model outside of Qwen3.5 currently hits this same U32 error, since MLX quantization universally packs weights into U32 containers.
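As background on why U32 appears at all: MLX affine quantization stores each group of weights as low-bit integers packed into 32-bit words, alongside per-group scales and biases. A minimal numpy sketch of that packing and the affine dequantization (illustrative only — not MLX's actual code or tensor layout):

```python
import numpy as np

# With 4-bit quantization, 32 // 4 = 8 quantized values fit in one uint32,
# which is why MLX shards expose U32 tensors to the importer.
bits = 4
per_word = 32 // bits
vals = np.arange(per_word, dtype=np.uint32)  # example q values 0..7

# Pack: value i occupies bit range [i*bits, (i+1)*bits)
packed = np.uint32(0)
for i, q in enumerate(vals):
    packed |= q << np.uint32(i * bits)

# Unpack, then apply the affine dequantization w ≈ scale * q + bias
# (scale and bias here are made-up per-group values).
scale, bias = 0.05, -0.2
mask = np.uint32((1 << bits) - 1)
unpacked = np.array(
    [(packed >> np.uint32(i * bits)) & mask for i in range(per_word)],
    dtype=np.uint32,
)
weights = scale * unpacked.astype(np.float32) + bias
```

An importer that doesn't know this layout sees only an unfamiliar U32 tensor, hence the error.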

Model

  • FF-01/Nemotron-3-Super-120B-A12B-MLX-6bit (https://huggingface.co/FF-01/Nemotron-3-Super-120B-A12B-MLX-6bit)
  • Base: nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16 (https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16)
  • 6-bit affine quantization, ~92GB, 20 SafeTensors shards
  • Runs at ~43.6 tok/s on M5 Pro Max via mlx-lm

Architecture

  • Hybrid Mamba-2 + Transformer Attention + Latent MoE
  • 120B total params, 12B active per token
  • 512 routed experts, 22 active, 1 shared
  • 88 layers alternating Mamba (M) and Attention+MoE (E)
  • Tensor types in SafeTensors: BF16 (weights), F32 (norms), U32 (quantized packed weights)

Steps to reproduce

# Ollama v0.19.0, macOS Apple Silicon

cat > Modelfile <<EOF2
FROM /path/to/Nemotron-3-Super-120B-A12B-MLX-6bit

TEMPLATE """{{- if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""

PARAMETER stop "<|im_end|>"
PARAMETER stop "</s>"
PARAMETER num_ctx 8192
EOF2

ollama create nemotron-120b -f Modelfile
# Error: unknown data type: U32
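To confirm a checkpoint will hit this before running `ollama create`, the dtype table can be read straight from a shard's safetensors header: the format is an 8-byte little-endian header length followed by a JSON header mapping tensor names to `{"dtype", "shape", "data_offsets"}`. A small sketch (the tensor name below is illustrative, not Nemotron-H's real naming):

```python
import io
import json
import struct

def tensor_dtypes(stream):
    """Return {tensor_name: dtype} from a .safetensors stream."""
    (hdr_len,) = struct.unpack("<Q", stream.read(8))
    header = json.loads(stream.read(hdr_len))
    return {name: meta["dtype"]
            for name, meta in header.items()
            if name != "__metadata__"}

# Tiny in-memory stand-in for one MLX shard.
payload = b"\x00" * 16
header = {
    "model.layers.0.mlp.weight": {"dtype": "U32", "shape": [2, 2],
                                  "data_offsets": [0, 16]},
}
raw = json.dumps(header).encode()
blob = struct.pack("<Q", len(raw)) + raw + payload

print(tensor_dtypes(io.BytesIO(blob)))
# {'model.layers.0.mlp.weight': 'U32'}
```

Any `U32` entries in the real shards are the quantized packed weights the importer currently rejects.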

Notes

  • The GGUF path works — llama.cpp added Nemotron 3 Super support in ggml-org/llama.cpp#20411
  • Native MLX import would let Apple Silicon users skip GGUF conversion
  • The tensorImportTransform framework from #14878 should make this straightforward to add
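The per-architecture dispatch this issue asks for could look roughly like the following. This is a hypothetical Python sketch of the idea only — ollama's actual tensorImportTransform framework is written in Go and none of these names are its real identifiers:

```python
# Hypothetical architecture-keyed import transforms, loosely modeled on
# the idea behind #14878. Nothing here is ollama's real API.
TRANSFORMS = {}

def register(arch):
    def wrap(fn):
        TRANSFORMS[arch] = fn
        return fn
    return wrap

@register("Qwen3_5ForCausalLM")
@register("NemotronHForCausalLM")  # the registration this issue requests
def unpack_mlx_u32(name, dtype, data):
    # Dequantize U32-packed MLX weights; pass other tensors through.
    if dtype != "U32":
        return data
    return f"dequantized({name})"  # placeholder for the real unpacking

def import_tensor(arch, name, dtype, data):
    fn = TRANSFORMS.get(arch)
    if fn is None and dtype == "U32":
        # Unregistered architectures still hit the reported failure.
        raise ValueError("unknown data type: U32")
    return fn(name, dtype, data) if fn else data
```

Under this shape, supporting a new MLX-quantized architecture is one registry entry plus any architecture-specific tensor renaming, which is why the issue frames the request as "the same for NemotronHForCausalLM".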
GiteaMirror added the mlx label 2026-04-12 22:35:44 -05:00
Author
Owner

@CJSen commented on GitHub (Apr 2, 2026):

Same error: "Error: unknown data type: U32". It seems nobody has noticed.

Ollama v0.19.0/v0.20.0 preview, macOS Apple Silicon, M2

FROM /path/.omlx/models/MLX-Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled-v2-4bit
TEMPLATE """{{ .Prompt }}"""

The model URL: https://huggingface.co/Jackrong/MLX-Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled-v2-4bit

ollama create qwen3_5-mlx-opus4_6:9b -f Modelfile
gathering model components
copying file sha256:87a7830d63fcfxxxd8e2de4 100%
copying file sha256:215f9abadxxxxxxxxxx6ed0b 100%
copying file sha256:f88842172xxxxx8bf4c0cef9 100%
copying file sha256:a05ffa41xxxx8cb847v8ae96 100%
copying file sha256:6a0bd42f49xxxxb0f6319190 100%
copying file sha256:ba8536f5386a3xxxx42f6f58 100%
converting model
Error: unknown data type: U32
Author
Owner

@toughcoding commented on GitHub (Apr 8, 2026):

Indeed, some models do not work. This one works: mlx-community/Qwen3.5-27B-Claude-4.6-Opus-Distilled-MLX-4bit, but this one does not: mlx-community/gemma-4-26b-a4b-it-mxfp4.

ollama list

NAME                                                     ID              SIZE     MODIFIED           

gemma-4-26b-a4b-it-mxfp4:latest                          2d1b607d3ef8    14 GB    About a minute ago

ollama --verbose run gemma-4-26b-a4b-it-mxfp4:latest 

Error: 500 Internal Server Error: mlx runner failed: Error: unsupported architecture: Gemma4ForConditionalGeneration (exit: exit status 1)
Reference: github-starred/ollama#9714