[GH-ISSUE #15172] Can't pull nvfp4 #9711

Open
opened 2026-04-12 22:35:24 -05:00 by GiteaMirror · 10 comments

Originally created by @MarkWard0110 on GitHub (Mar 31, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/15172

What is the issue?

```shell
ollama pull qwen3.5:27b-nvfp4
pulling manifest
Error: pull model manifest: 412: this model requires macOS
```

Relevant log output


OS

Windows

GPU

Nvidia

CPU

Intel

Ollama version

No response

GiteaMirror added the bug label 2026-04-12 22:35:24 -05:00

@rick-github commented on GitHub (Mar 31, 2026):

nvfp4 is an [MLX](https://ollama.com/blog/mlx#:~:text=Please%20make%20sure%20you%20have%20a%20Mac%20with%20more%20than%2032GB%20of%20unified%20memory) model.


@mmartial commented on GitHub (Mar 31, 2026):

This might explain why I cannot find the `nvfp4` version of `nemotron-3-super` (which is published on HF). Do the MLX models rely on nvfp4 encoding? `nvfp4` is also the recommended format for Nvidia's Blackwell-series GPUs; I had the same question as the OP.


@GO1984 commented on GitHub (Mar 31, 2026):

NVFP4 is a 4-bit floating-point format introduced with NVIDIA Blackwell GPUs to maintain model accuracy while reducing memory bandwidth and storage requirements for inference workloads.

https://build.nvidia.com/spark/nvfp4-quantization

It should therefore run fine on an NVIDIA DGX Spark with a GB10 (GB stands for Grace Blackwell).
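
To make the format concrete, here is a minimal sketch of block-scaled 4-bit quantization in the spirit of NVFP4. It assumes the commonly documented layout: 4-bit E2M1 values (non-negative magnitudes 0, 0.5, 1, 1.5, 2, 3, 4, 6) sharing one scale per small block. The real format stores that scale in FP8 (E4M3) over 16-element blocks; this sketch keeps it as a float64 for readability and is an illustration, not Ollama's or NVIDIA's implementation.

```go
package main

import (
	"fmt"
	"math"
)

// e2m1Grid lists the non-negative values representable in the 4-bit
// E2M1 format that NVFP4 builds on (the sign bit is handled separately).
var e2m1Grid = []float64{0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0}

// quantizeBlock scales a block so its largest magnitude maps to 6.0
// (the E2M1 maximum) and snaps every element to the nearest grid value.
// Real NVFP4 stores the per-block scale in FP8 (E4M3); this sketch
// keeps it as float64 for readability.
func quantizeBlock(block []float64) (scale float64, codes []float64) {
	maxAbs := 0.0
	for _, v := range block {
		if a := math.Abs(v); a > maxAbs {
			maxAbs = a
		}
	}
	if maxAbs == 0 {
		return 1, make([]float64, len(block))
	}
	scale = maxAbs / 6.0
	codes = make([]float64, len(block))
	for i, v := range block {
		target := math.Abs(v) / scale
		best := e2m1Grid[0]
		for _, g := range e2m1Grid[1:] {
			if math.Abs(g-target) < math.Abs(best-target) {
				best = g
			}
		}
		codes[i] = math.Copysign(best, v)
	}
	return scale, codes
}

func main() {
	block := []float64{0.01, -0.3, 0.7, 1.2, -2.4, 0.05, 0.9, -0.6}
	scale, codes := quantizeBlock(block)
	for i, c := range codes {
		fmt.Printf("%+.2f -> %+.1f * %.4f = %+.3f\n", block[i], c, scale, c*scale)
	}
}
```

The trade-off the sketch shows: each weight costs 4 bits plus a small amortized share of the block scale, and accuracy hinges on the per-block scale tracking the local dynamic range.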


@rick-github commented on GitHub (Mar 31, 2026):

The MLX backend is the only backend that currently supports NVFP4. The MLX backend currently only runs on macOS.


@GO1984 commented on GitHub (Mar 31, 2026):

Since vLLM on the DGX Spark already accepts NVFP4 images, there shouldn't be a problem supporting NVFP4 in future Ollama versions.


@MarkWard0110 commented on GitHub (Apr 1, 2026):

> nvfp4 is an [MLX](https://ollama.com/blog/mlx#:~:text=Please%20make%20sure%20you%20have%20a%20Mac%20with%20more%20than%2032GB%20of%20unified%20memory) model.

The problem is claiming nvfp4 is an MLX model. I think what you mean is that the nvfp4 variant was created because the MLX engine in Ollama will support it. Or do you mean the model itself is a variant tightly coupled to MLX and macOS? If that is the case, why not use a more expressive tag?

nvfp4 does not imply MLX, given that NVFP4 is a 4-bit floating-point format introduced with NVIDIA Blackwell GPUs.

The confusion comes from releasing model tags that will not run unless Ollama is on a specific platform (macOS), under tags that identify with a specific hardware vendor, Nvidia.


@rick-github commented on GitHub (Apr 1, 2026):

The MLX backend is the only backend that currently supports NVFP4. The MLX backend currently only runs on macOS.


@MarkWard0110 commented on GitHub (Apr 1, 2026):

Even with MLX built locally and the runner available, you still can't pull that variant, because the registry enforces platform gating at pull time and rejects Linux for that tag.

I built MLX on Linux; why can't I pull the model when I have the MLX runner built?
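
For illustration, here is a hypothetical sketch of what User-Agent-based platform gating could look like on the registry side. None of this is Ollama's actual registry code; the header shape is inferred from the probe output in the next comment, and the handler path is made up for the example.

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"regexp"
)

// Assumed UA shape, inferred from the probe output below:
// "ollama/v0.0.0 (amd64 linux) Go/go1.24.5" -> captures arch and OS.
var uaPattern = regexp.MustCompile(`ollama/\S+ \((\S+) (\S+)\)`)

// macOnlyManifest mimics the observed behaviour: Ollama clients whose
// reported OS is not darwin get HTTP 412 for a macOS-only (MLX) tag.
func macOnlyManifest(w http.ResponseWriter, r *http.Request) {
	if m := uaPattern.FindStringSubmatch(r.UserAgent()); m != nil && m[2] != "darwin" {
		http.Error(w, "this model requires macOS", http.StatusPreconditionFailed)
		return
	}
	fmt.Fprintln(w, `{"schemaVersion": 2}`) // placeholder manifest body
}

func main() {
	http.HandleFunc("/v2/library/qwen3.5/manifests/", macOnlyManifest)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```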


@MarkWard0110 commented on GitHub (Apr 1, 2026):

Probe command:

```shell
go run ./server/tools/registry_ua_probe
```

Observed output:

```
UA="ollama/v0.0.0 (amd64 linux) Go/go1.24.5" resolve error: registry responded with status 412: this model requires macOS
UA="ollama/v0.0.0 (arm64 darwin) Go/go1.24.5" resolve success
```


@anirbanbasu commented on GitHub (Apr 1, 2026):

I created a tool called ODIR to download models for Ollama, because Ollama fails to download models when using a man-in-the-middle authenticated proxy with a custom certificate (despite configuring that correctly, and despite `curl` working fine with the same configuration). However, ODIR also fails with the HTTP 412 error, even when I run it on macOS on Apple Silicon (M3 Ultra). See: https://github.com/anirbanbasu/odir/issues/27.

@MarkWard0110 passing `ollama/v0.19.0 (arm64 darwin) Go/go1.24.5` to `curl` for the URL https://registry.ollama.ai/v2/library/qwen3.5/manifests/35b-a3b-coding-nvfp4 now leads to a 401 error. Have you encountered this error?

Adding an Ollama API key to the Authorization header does not work either. Running `curl https://registry.ollama.ai/v2/library/qwen3.5/manifests/35b-a3b-coding-nvfp4 -H "Authorization: Bearer <valid-api-key>" -H "User-Agent: ollama/v0.19.0 (arm64 darwin) Go/go1.24.5"` returns `{"errors":[{"code":"UNAUTHORIZED","message":"unauthorized"}]}`.

Further update: the unauthorised error goes away if the user agent is not pretending to be Ollama. Running `curl https://registry.ollama.ai/v2/library/qwen3.5/manifests/35b-a3b-coding-nvfp4 -H "User-Agent: abcd/v0.19.0 (arm64 darwin)"` returns the expected list of artefacts that can be downloaded.
