[GH-ISSUE #9575] Endless generations in QwQ quants #6245

Open
opened 2026-04-12 17:40:43 -05:00 by GiteaMirror · 6 comments
Owner

Originally created by @wsbagnsv1 on GitHub (Mar 7, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/9575

What is the issue?

Unsloth discovered that the ordering of the samplers somehow breaks the model; you can read up on it here:
https://docs.unsloth.ai/basics/tutorial-how-to-run-qwq-32b-without-bugs

This causes endless generations, and since you can't change the order of the samplers (to my knowledge, at least), QwQ is broken. I would think that a rather simple line in the Modelfile that allows passing `--samplers "top_k;top_p;min_p;temperature;dry;typ_p;xtc"` to llama.cpp should fix the issue.
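For reference, llama.cpp itself exposes a `--samplers` flag on its CLI, so the suggested ordering can be sketched as a direct llama.cpp invocation (the binary name, model path, and parameter values here are placeholders, not a verified recipe):

```shell
# Run the QwQ GGUF directly with llama.cpp, forcing the sampler order
# recommended by Unsloth (top_k -> top_p -> min_p -> temperature -> ...).
# Adjust the binary name and model path for your build.
./llama-cli -m ./QwQ-32B-Q5_K_M.gguf \
    --samplers "top_k;top_p;min_p;temperature;dry;typ_p;xtc" \
    --temp 0.6 --top-k 40 --top-p 0.95 --min-p 0.0 --repeat-penalty 1.0
```

Ollama does not currently expose an equivalent of this flag, which is what this issue is asking for.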

Relevant log output


OS

No response

GPU

No response

CPU

No response

Ollama version

No response

GiteaMirror added the bug label 2026-04-12 17:40:43 -05:00
Author
Owner

@crazyi commented on GitHub (Mar 7, 2025):

I find the same issue, how can I fix it in Ollama?

Author
Owner

@sultanqasim commented on GitHub (Mar 7, 2025):

You can run it as suggested by Qwen developers with a repetition penalty of 1.0 which avoids the sampler ordering issue. To use a higher repetition penalty, you will need the ordering modification they described.
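As a sketch, the workaround described above (avoiding the sampler ordering issue by disabling the repetition penalty) could look like this in an Ollama Modelfile; the file path is a placeholder and the other values follow the Qwen-suggested defaults mentioned elsewhere in this thread:

```
FROM ./QwQ-32B-Q5_K_M.gguf
PARAMETER temperature 0.6
PARAMETER top_k 40
PARAMETER top_p 0.95
PARAMETER repeat_penalty 1.0
```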

Author
Owner

@yorktownting commented on GitHub (Mar 8, 2025):

@crazyi @wsbagnsv1

Changing the Modelfile can solve this problem. This is my version; it works well:

```
FROM ./QwQ-32B-Q5_K_M.gguf
PARAMETER top_k 40
PARAMETER top_p 0.95
PARAMETER min_p 0.1
PARAMETER temperature 0.6
PARAMETER num_predict 32768
PARAMETER repeat_penalty 1.1
TEMPLATE """<|im_start|>user\n{{ .Prompt }}<|im_end|>\n<|im_start|>assistant\n<think>\n"""
```
Author
Owner

@sultanqasim commented on GitHub (Mar 8, 2025):

@yorktownting While those settings would give usable results, that doesn't fix the sampler ordering issue that makes chains of thought worse than they need to be. See https://docs.unsloth.ai/basics/tutorial-how-to-run-qwq-32b-effectively.

Author
Owner

@yorktownting commented on GitHub (Mar 10, 2025):

> @yorktownting While those settings would give usable results, that doesn't fix the sampler ordering issue that makes chains of thought worse than they need to be. See https://docs.unsloth.ai/basics/tutorial-how-to-run-qwq-32b-effectively.

Thank you. I changed the sampler order in my Modelfile as follows, and it does perform better:

```
FROM ./QwQ-32B-Q5_K_M.gguf
PARAMETER temperature 0.6
PARAMETER min_p 0.0
PARAMETER repeat_penalty 1.0
PARAMETER top_k 40
PARAMETER top_p 0.95
PARAMETER num_predict 32768
TEMPLATE """<|im_start|>user\n{{ .Prompt }}<|im_end|>\n<|im_start|>assistant\n<think>\n"""
```
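For anyone following along, a Modelfile like the one above can be built and tried with the standard Ollama CLI (the model name here is arbitrary):

```shell
# Build a local model from the Modelfile in the current directory,
# then run it interactively to check for endless generations.
ollama create qwq-fixed -f Modelfile
ollama run qwq-fixed "How many r's are in the word strawberry?"
```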
Author
Owner

@Nukepayload2 commented on GitHub (Mar 17, 2025):

I have a question after reading https://docs.unsloth.ai/basics/tutorial-how-to-run-qwq-32b-effectively.

> --samplers "top_k;top_p;min_p;temperature;dry;typ_p;xtc"

How can I set `--samplers` in a `Modelfile`?

I couldn't find it in https://github.com/ollama/ollama/blob/main/docs/modelfile.md#valid-parameters-and-values

Reference: github-starred/ollama#6245