Mirror of https://github.com/open-webui/open-webui.git, synced 2026-03-22 06:02:06 -05:00
issue: QwQ 32b never stops calculating #4536
Originally created by @PaulWeinsberg on GitHub (Mar 23, 2025).
Check Existing Issues
Installation Method
Git Clone
Open WebUI Version
v0.5.20
Ollama Version (if applicable)
v0.6.2
Operating System
Ubuntu Linux (official Docker image)
Browser (if applicable)
No response
Confirmation
Expected Behavior
The model replies and then stops.
Actual Behavior
The model replies, but the calculation keeps running.
Steps to Reproduce
Configuration:
- Ollama: latest (running as a Linux service)
- Open WebUI: latest Docker image with CUDA
- Model: QwQ:32b
- Custom parameters: num_ctx: 16000
- Server: Ubuntu server, 128 GB RAM, NVIDIA RTX 4500 Ada + GTX 1070, Ryzen 5000-series CPU
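For comparison, the same custom context setting can be sent straight to Ollama's HTTP API, bypassing Open WebUI. This is a minimal sketch: the prompt text is a placeholder, and it assumes a default local Ollama install on port 11434.

```python
import json

# Build a request for Ollama's /api/generate endpoint with the same
# custom context window used in the Open WebUI configuration above.
# The prompt is a placeholder; substitute the question from the chat.
payload = {
    "model": "qwq:32b",
    "prompt": "Ask whatever you want here.",
    "stream": False,
    "options": {"num_ctx": 16000},
}

print(json.dumps(payload, indent=2))

# To actually send it (requires a running Ollama server):
#   curl http://localhost:11434/api/generate -d '<the JSON above>'
```

If the GPU load drops after the response completes when the request goes through the API directly, that would point at how Open WebUI drives the connection rather than at the model or Ollama itself.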
Actions
Start a new chat and ask whatever you want.
Logs & Screenshots
nvtop shows the GPU never stops; it keeps calculating with no output appearing in the UI.
Nothing unusual in the console logs.
Additional Information
Hello,
I tried the new digest of the 32B version of QwQ published on Ollama.
When I run the model, it loads, replies, and closes the conversation as expected.
But then it keeps running in Ollama: the GPU continues calculating as if it were still generating a reply.
When I use ollama run and ask the same question with the same context (num_ctx 16000), there is no issue; everything works as expected, and the calculation stops when the answer finishes.
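The CLI comparison can be reproduced with Ollama's interactive `/set parameter` command. This is a session transcript, not a standalone script; it assumes a local Ollama install with the qwq:32b model already pulled and a GPU sized for it.

```shell
# Requires a running local Ollama with the model pulled.
ollama run qwq:32b
# Inside the interactive session, match the Open WebUI configuration:
#   >>> /set parameter num_ctx 16000
#   >>> <ask the same question as in the WebUI chat>
# In this path, per the report, GPU load drops as soon as the answer finishes.
```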
No issue with R1 or other models.
I also tried the previous digest, the previous Ollama release, and the previous Open WebUI release: same problem.