[GH-ISSUE #15399] Add the ability to quantize a model with nvfp4 #56362

Closed
opened 2026-04-29 10:42:46 -05:00 by GiteaMirror · 2 comments
Owner

Originally created by @ErwanMAS on GitHub (Apr 7, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/15399

We cannot `--quantize` a model to `nvfp4`.

```
$ uname
Darwin
$ ollama --version
ollama version is 0.20.0
$ ollama create my-modele-nvfp4 --quantize nvfp4 -f my-modele
gathering model components
Error: 500 Internal Server Error: unsupported quantization type NVFP4 - supported types are F32, F16, Q4_K_S, Q4_K_M, Q8_0
```

GiteaMirror added the feature request label 2026-04-29 10:42:46 -05:00

@rick-github commented on GitHub (Apr 7, 2026):

```
ollama create --experimental my-modele-nvfp4 --quantize nvfp4 -f my-modele
```

@pdevine commented on GitHub (Apr 7, 2026):

@rick-github has the correct answer. We'll remove the `--experimental` flag as the mlxrunner matures.
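Putting the workaround together end to end, a session might look like the sketch below. The Modelfile contents and weight path are illustrative assumptions, not from this thread; only the `--experimental` and `--quantize nvfp4` flags come from the comments above.

```shell
# Hypothetical Modelfile named "my-modele" pointing at local weights:
#   FROM ./my-model-weights

# Quantize to nvfp4 via the experimental path suggested above:
ollama create --experimental my-modele-nvfp4 --quantize nvfp4 -f my-modele

# Verify the new model is registered:
ollama list
```

Without `--experimental`, the server rejects `nvfp4` with the 500 error shown in the issue, since only F32, F16, Q4_K_S, Q4_K_M, and Q8_0 are accepted on the standard path.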


Reference: github-starred/ollama#56362