[GH-ISSUE #10459] Support Issue of some Qwen3 Series #6877

Closed
opened 2026-04-12 18:43:50 -05:00 by GiteaMirror · 27 comments

Originally created by @xuanzhec on GitHub (Apr 29, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/10459

Hi all,

Recently I tested the Qwen3 series on Mac Ollama, and I believe some models are not well supported. For example, running `qwen3:14b-q8_0` fails with `Error: POST predict: Post "http://127.0.0.1:50386/completion": EOF`, while the default 8b runs fine.

BTW, the version should be fine, as `ollama version is 0.6.6`.


@BradKML commented on GitHub (Apr 29, 2025):

Also referenced here I think? https://github.com/ollama/ollama/issues/10454#issuecomment-2837175276


@iSky172 commented on GitHub (Apr 29, 2025):

Agree, same error on my setup: 3090 Ti 24GB, Win 10 Pro, WSL2, Ubuntu, Ollama. All drivers and scripts are up to date. None of the Qwen3 sizes reasonable for home use on an NVIDIA GPU (0.6b up to 14b) work; all produce the same error. Other models work just fine as always, switching models in Ollama without a restart or reload.


@yjwu-leadstec commented on GitHub (Apr 29, 2025):

Unable to load model


@MrWangChong commented on GitHub (Apr 29, 2025):

I can't run Qwen3

![Image](https://github.com/user-attachments/assets/df359742-f432-4b39-8a0a-9bf25a912f20)

![Image](https://github.com/user-attachments/assets/5a099d42-a44a-4f51-91d3-fe00e41b9649)


@Meshwa428 commented on GitHub (Apr 29, 2025):

One more thing to note: in transformers, Qwen3 supports a toggle option to enable or disable thinking.

But Ollama doesn't have it?!

```py
from transformers import AutoTokenizer

# Illustrative checkpoint; any Qwen3 tokenizer exposes the same toggle.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B")
messages = [{"role": "user", "content": "Hello"}]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,  # True is the default value for enable_thinking
)
```

@xuanzhec commented on GitHub (Apr 29, 2025):

Also, the Qwen3:30b-a3b MoE model can run on Mac. I am testing the GGUF Q8 one across my Macs and CUDA devices to double-check the performance.


@yjwu-leadstec commented on GitHub (Apr 29, 2025):

> Also, the Qwen3:30b-a3b MoE model can run on Mac. I am testing the GGUF Q8 one across my Macs and CUDA devices to double-check the performance.

Tried this model but still unable to run.

![Image](https://github.com/user-attachments/assets/93ea491a-6a08-4e6d-8805-fbf28dc872bc)


@junjundaidai commented on GitHub (Apr 29, 2025):

> I can't run Qwen3
>
> ![Image](https://github.com/user-attachments/assets/df359742-f432-4b39-8a0a-9bf25a912f20)
>
> ![Image](https://github.com/user-attachments/assets/5a099d42-a44a-4f51-91d3-fe00e41b9649)

Me too.

`Error: unable to load model: /root/.ollama/models/blobs/sha256-3291abe70f16ee9682de7bfae08db5373ea9d6497e614aaad63340ad421d6312`


@Meshwa428 commented on GitHub (Apr 29, 2025):

> I can't run Qwen3
>
> ![Image](https://github.com/user-attachments/assets/df359742-f432-4b39-8a0a-9bf25a912f20)

Update Ollama to v0.6.6.


@jeffrey-cwj commented on GitHub (Apr 29, 2025):

https://ollama.com/library/qwen3

qwen3 requires Ollama v0.6.6 or later


@jeffrey-cwj commented on GitHub (Apr 29, 2025):

> One more thing to note: in transformers, Qwen3 supports a toggle option to enable or disable thinking.
>
> But Ollama doesn't have it?!
>
> ```py
> text = tokenizer.apply_chat_template(
>     messages,
>     tokenize=False,
>     add_generation_prompt=True,
>     enable_thinking=True  # True is the default value for enable_thinking
> )
> ```

So far, you may try this: https://github.com/ollama/ollama/issues/10456#issuecomment-2837486127

Add `/nothink` before your prompt to disable thinking.

It works via the API or OpenWebUI.
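For anyone calling the server directly, here is a minimal sketch of that workaround against Ollama's `/api/generate` REST endpoint, assuming a default local install on port 11434 and a Qwen3 tag you have already pulled:

```py
import requests

# Prefixing the prompt with "/nothink" asks Qwen3 to skip its thinking block
# (workaround from the linked issue; behavior may change upstream).
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "qwen3:8b",  # assumption: any pulled Qwen3 tag works the same way
        "prompt": "/nothink Summarize the Qwen3 release in one sentence.",
        "stream": False,
    },
)
print(resp.json()["response"])
```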


@yjwu-leadstec commented on GitHub (Apr 29, 2025):

> https://ollama.com/library/qwen3
>
> qwen3 requires Ollama v0.6.6 or later

The Ollama update was still not visible this morning, but now I see an update button. Let me try.


@zhou668899 commented on GitHub (Apr 29, 2025):

Update Ollama.


@yjwu-leadstec commented on GitHub (Apr 29, 2025):

Updated to 0.6.6. Great! It works.

![Image](https://github.com/user-attachments/assets/73c5d9f9-74fb-4aa2-9f5d-3080d82e19b5)


@waqarahmed6095 commented on GitHub (Apr 29, 2025):

Update Ollama.

> I can't run Qwen3
>
> ![Image](https://github.com/user-attachments/assets/df359742-f432-4b39-8a0a-9bf25a912f20)
>
> ![Image](https://github.com/user-attachments/assets/5a099d42-a44a-4f51-91d3-fe00e41b9649)


@kingproud commented on GitHub (Apr 29, 2025):

![Image](https://github.com/user-attachments/assets/339d226d-69a3-40f2-9277-a5735b1b8b26)

@FawadAbbas12 commented on GitHub (Apr 29, 2025):

> https://ollama.com/library/qwen3
>
> qwen3 requires Ollama v0.6.6 or later

Thanks, working now.


@nix18 commented on GitHub (Apr 29, 2025):

> ![Image](https://github.com/user-attachments/assets/339d226d-69a3-40f2-9277-a5735b1b8b26)

Me too. This bug occurred shortly after running a Qwen3 model; after that, no model could run, and all failed with the same error message.


@joeldrapper commented on GitHub (Apr 29, 2025):

I had this issue and it was fixed for me by upgrading Ollama itself and then running the model again.


@kingproud commented on GitHub (Apr 29, 2025):

> I had this issue and it was fixed for me by upgrading Ollama itself and then running the model again.

How do I upgrade? My Ollama is already the latest version. Can you show a picture? Thank you.


@kwilkins-82 commented on GitHub (Apr 29, 2025):

Isn't Qwen3 supposed to be multimodal? The "describe this picture" command-line example isn't working for Qwen3, though it works fine for gemma3.


@Meshwa428 commented on GitHub (Apr 29, 2025):

It's a mixture-of-experts model, not a multimodal model.

Qwen3 doesn't support vision; it's an MoE architecture.


@kwilkins-82 commented on GitHub (Apr 29, 2025):

OK, thanks. Some of the marketing materials and news stories stated it was multimodal, but it seems either that was a mistake or they didn't release that functionality. https://huggingface.co/Qwen/Qwen3-32B/discussions/2


@BradKML commented on GitHub (Apr 30, 2025):

@kwilkins-82 they likely pointed to the wrong model (maybe the next Omni will be coming?) https://qwenlm.github.io/blog/qwen2.5-omni/


@xuanzhec commented on GitHub (Apr 30, 2025):

> > Also, the Qwen3:30b-a3b MoE model can run on Mac. I am testing the GGUF Q8 one across my Macs and CUDA devices to double-check the performance.
>
> Tried this model but still unable to run.
>
> ![Image](https://github.com/user-attachments/assets/93ea491a-6a08-4e6d-8805-fbf28dc872bc)

Hi, have you solved your issue now? Perhaps you can try the Unsloth-based Qwen3 GGUF files on llama.cpp if available, as sketched below.
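For reference, a minimal sketch of that route using the llama-cpp-python bindings; the model path is a placeholder for whichever Unsloth Qwen3 GGUF quant you download from Hugging Face:

```py
from llama_cpp import Llama

# Placeholder file name; point this at your downloaded Unsloth Qwen3 GGUF.
llm = Llama(model_path="Qwen3-14B-Q8_0.gguf", n_ctx=4096)

# llama.cpp applies the model's chat template for chat-style completion.
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hello, who are you?"}]
)
print(out["choices"][0]["message"]["content"])
```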


@yjwu-leadstec commented on GitHub (Apr 30, 2025):

> > > Also, the Qwen3:30b-a3b MoE model can run on Mac. I am testing the GGUF Q8 one across my Macs and CUDA devices to double-check the performance.
> >
> > Tried this model but still unable to run.
> >
> > ![Image](https://github.com/user-attachments/assets/93ea491a-6a08-4e6d-8805-fbf28dc872bc)
>
> Hi, have you solved your issue now? Perhaps you can try the Unsloth-based Qwen3 GGUF files on llama.cpp if available.

After upgrading Ollama to version 0.6.6, everything works fine, and all models are functioning properly.


@jmorganca commented on GitHub (Apr 30, 2025):

Hey folks, sorry about that. Qwen 3 requires Ollama 0.6.6 or later. Running `ollama pull` should now check for this.
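For anyone scripting against a local server, a minimal sketch that checks the running version via Ollama's `/api/version` endpoint before pulling (assuming a default install on port 11434):

```py
import requests

# Qwen3 models require Ollama >= 0.6.6; check the running server first.
version = requests.get("http://localhost:11434/api/version").json()["version"]
print("Ollama version:", version)  # upgrade before pulling Qwen3 if older than 0.6.6
```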

Reference: github-starred/ollama#6877