[GH-ISSUE #14155] FP16/BF16 Qwen3-next-code on main Ollama repository #9229

Closed
opened 2026-04-12 22:05:48 -05:00 by GiteaMirror · 7 comments

Originally created by @rjmalagon on GitHub (Feb 8, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/14155

Hi, I kindly ask if you can upload `qwen3-coder-next:fp16` or `qwen3-coder-next:bf16` to the official repo.

I would like to test the precision of the active parameters on the unquantized model, just like the already-available `qwen3-next:80b-a3b-instruct-fp16` and `qwen3-next:80b-a3b-thinking-fp16` models.
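For reference, the existing unquantized tags mentioned above pull like any other model; the request is for an equivalent tag for the coder variant:

```console
$ ollama pull qwen3-next:80b-a3b-instruct-fp16
$ ollama pull qwen3-next:80b-a3b-thinking-fp16
```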

@rick-github commented on GitHub (Feb 8, 2026):

Imported from [Qwen/Qwen3-Coder-Next-GGUF](https://huggingface.co/Qwen/Qwen3-Coder-Next-GGUF): [frob/qwen3-coder-next:80b-a3b-f16](https://ollama.com/frob/qwen3-coder-next:80b-a3b-f16)
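The community import pulls like any other tag:

```console
$ ollama pull frob/qwen3-coder-next:80b-a3b-f16
```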

@Eb7CAPJi commented on GitHub (Feb 8, 2026):

Please add the ability to download the model unsloth/Qwen3-Coder-Next-GGUF. Currently, when attempting to use it, the following error appears:

```
API Error: Error from Ollama: AI_RetryError: Failed after 5 attempts. Last error: API Error: Status Code 500
```
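For reference, pulling an unsloth GGUF directly from Hugging Face goes through Ollama's `hf.co` syntax, presumably the path being attempted here (a sketch; the quant tag is an assumption):

```console
$ ollama run hf.co/unsloth/Qwen3-Coder-Next-GGUF:Q4_K_M
```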

@rick-github commented on GitHub (Feb 8, 2026):

@Eb7CAPJi https://github.com/ollama/ollama/issues/14086

@rjmalagon commented on GitHub (Feb 8, 2026):

> Imported from [Qwen/Qwen3-Coder-Next-GGUF](https://huggingface.co/Qwen/Qwen3-Coder-Next-GGUF): [frob/qwen3-coder-next:80b-a3b-f16](https://ollama.com/frob/qwen3-coder-next:80b-a3b-f16)

Thanks, @rick-github, that's enough for my use case.

Just so I can learn: what are your steps for importing the official GGUF releases, if they differ at all from the available docs?

And, if I may ask one more thing: do you know how to import vision/text models from GGUF? I am not sure how to include the projection files in the Ollama model files.

@rick-github commented on GitHub (Feb 8, 2026):

The available docs are correct, but in the case of large models (like qwen3-coder-next) an additional step is required. Large models on HuggingFace are typically sharded into smaller pieces, so the final GGUF has to be assembled from the shards with the [llama.cpp](https://github.com/ggml-org/llama.cpp/releases) tool `llama-gguf-split`:

```console
$ hf download --local-dir . Qwen/Qwen3-Coder-Next-GGUF --include "Qwen3-Coder-Next-F16/*"
$ llama-gguf-split --merge Qwen3-Coder-Next-F16/*00001-of-*.gguf Qwen3-Coder-Next-F16.gguf
$ ollama show --modelfile qwen3-next | grep -v FROM > Modelfile
$ echo FROM Qwen3-Coder-Next-F16.gguf >> Modelfile
$ ollama create qwen/qwen3-coder-next:f16
```
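As a quick sanity check after the import (using the tag created above), `ollama show` should report F16 as the quantization:

```console
$ ollama show qwen/qwen3-coder-next:f16
```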

If you are importing a multi-modal model, you just need to add a `FROM` line for each of the GGUFs:

```console
$ ollama show --modelfile qwen3-vl:30b-a3b-instruct-q4_K_M | grep -v FROM > Modelfile
$ echo FROM Qwen3-VL-30B-A3B-Instruct-Q4_K_M.gguf >> Modelfile
$ echo FROM mmproj-F16.gguf >> Modelfile
$ ollama create unsloth/qwen3-vl:30b-a3b-instruct-q4_K_M
```
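The two GGUFs referenced above would be downloaded from the quantizer's repo first; a sketch, assuming unsloth's usual repo naming (the repo name `unsloth/Qwen3-VL-30B-A3B-Instruct-GGUF` is an assumption):

```console
$ hf download --local-dir . unsloth/Qwen3-VL-30B-A3B-Instruct-GGUF \
    --include "Qwen3-VL-30B-A3B-Instruct-Q4_K_M.gguf" "mmproj-F16.gguf"
```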

However, split models (text and vision weights in different files) must run with the llama.cpp backend, and if that doesn't support the model, it will not be loaded.

@rjmalagon commented on GitHub (Feb 10, 2026):

Thanks for the info, @rick-github. It confirms my current understanding and adds a detail I didn't know: text/vision models require import from source weights in order to use the Ollama engine.

@rick-github commented on GitHub (Feb 10, 2026):

> Text/vision models require import from source weights in order to use the Ollama engine.

Yes. Unfortunately not all model architectures are supported by the safetensors conversion code in ollama, so there are still models that need to be converted to GGUF by llama.cpp and hence cannot run on the ollama engine.
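For completeness, importing from source weights means pointing the Modelfile `FROM` at the safetensors directory instead of at a GGUF (a minimal sketch; the local path and target name are hypothetical):

```console
$ echo "FROM ./Qwen3-VL-30B-A3B-Instruct" > Modelfile
$ ollama create my-qwen3-vl:30b-a3b-instruct
```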
