issue: QwQ 32b never stops calculating #4536

Closed
opened 2025-11-11 15:56:24 -06:00 by GiteaMirror · 0 comments
Owner

Originally created by @PaulWeinsberg on GitHub (Mar 23, 2025).

Check Existing Issues

  • I have searched the existing issues and discussions.
  • I am using the latest version of Open WebUI.

Installation Method

Git Clone

Open WebUI Version

v0.5.20

Ollama Version (if applicable)

v0.6.2

Operating System

Linux Ubuntu (with docker official image)

Browser (if applicable)

No response

Confirmation

  • I have read and followed all instructions in README.md.
  • I am using the latest version of both Open WebUI and Ollama.
  • I have included the browser console logs.
  • I have included the Docker container logs.
  • I have listed steps to reproduce the bug in detail.

Expected Behavior

Reply and stop.

Actual Behavior

Reply and keep running calculation.

Steps to Reproduce

Configuration:

Ollama: latest (as a Linux service)
Open WebUI: latest Docker CUDA image
Model: qwq:32b
Custom parameters: num_ctx: 16000
Server: Ubuntu server, 128 GB RAM, NVIDIA 4500 Ada + 1070, Ryzen 5k

Actions

Start a new chat and ask whatever you want.

Logs & Screenshots

nvtop shows the GPU never stops; it keeps calculating without producing any output in the UI.
Nothing special in the console.

Additional Information

Hello,

I tried the new digest of the 32B version of QwQ published on Ollama.

When I run the model, it loads, replies, and closes the conversation as expected.
But it keeps running in Ollama: the calculation continues, i.e. the GPU keeps working as if it were still replying.

When I use ollama run, asking the same question with the same context (num_ctx 16000), there is no issue and everything works as expected: the calculation stops when the answer stops.
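Since the CLI behaves correctly, one way to narrow this down is to send the same request directly to Ollama's `/api/generate` HTTP endpoint (the same API Open WebUI talks to), bypassing the UI entirely. A minimal sketch, assuming Ollama's default port 11434 and an arbitrary prompt; the actual HTTP call is left commented out since it needs a running Ollama instance:

```python
import json

# Default Ollama endpoint; adjust host/port if your service is configured differently.
OLLAMA_URL = "http://localhost:11434/api/generate"

# Same model and custom parameter as configured in Open WebUI.
payload = {
    "model": "qwq:32b",
    "prompt": "Why is the sky blue?",  # any prompt reproduces the issue per the report
    "stream": False,
    "options": {
        "num_ctx": 16000,  # the custom context length set in Open WebUI
    },
}

print(json.dumps(payload, indent=2))

# To actually send the request (requires a running Ollama instance):
# import urllib.request
# req = urllib.request.Request(
#     OLLAMA_URL,
#     data=json.dumps(payload).encode(),
#     headers={"Content-Type": "application/json"},
# )
# print(urllib.request.urlopen(req).read().decode())
```

If the GPU idles after this direct call but keeps running when the same prompt goes through Open WebUI, that would point at the request Open WebUI constructs (or a follow-up request it issues) rather than at Ollama or the model digest.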

No issue with R1 or other models.
I also tried the previous digest, the previous Ollama, and the previous Open WebUI: same problem.

GiteaMirror added the bug label 2025-11-11 15:56:24 -06:00

Reference: github-starred/open-webui#4536