Mirror of https://github.com/open-webui/open-webui.git, synced 2026-03-14 11:05:45 -05:00
Pressing stop doesn't actually stop ollama from generating #474
Originally created by @nilsherzig on GitHub (Mar 14, 2024).
Bug Report
Description
Bug Summary:
Pressing the stop button next to the input field doesn't actually stop Ollama from generating.
According to this upstream issue, it should be sufficient to simply stop the stream: https://github.com/ollama/ollama/issues/1695
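The mechanism the linked issue relies on can be illustrated with a minimal, self-contained sketch (plain Python stdlib; this is not Open WebUI's or Ollama's actual code): a toy streaming server keeps emitting "tokens" until it notices the client has dropped the HTTP connection, then aborts generation, which is what Ollama is expected to do when the stream is closed.

```python
import http.server
import socket
import threading
import time

tokens_sent = []  # records every "token" the server actually generated

class StreamHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        try:
            for i in range(1000):           # pretend each iteration is one token
                self.wfile.write(b"token\n")
                self.wfile.flush()
                tokens_sent.append(i)
                time.sleep(0.01)
        except ConnectionError:             # BrokenPipeError etc.: client hung up
            pass                            # -> stop "generating" immediately

    def log_message(self, *args):           # silence per-request logging
        pass

server = http.server.HTTPServer(("127.0.0.1", 0), StreamHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Client: start the stream, read a little, then close the socket mid-response
# (the equivalent of pressing the stop button).
sock = socket.create_connection(server.server_address)
sock.sendall(b"GET / HTTP/1.0\r\nHost: localhost\r\n\r\n")
sock.recv(4096)                             # headers plus the first tokens
sock.close()

time.sleep(0.5)                             # give the server time to notice
server.shutdown()
print(f"tokens generated before the server noticed: {len(tokens_sent)}")
assert 0 < len(tokens_sent) < 1000          # generation stopped well before the end
```

The reported bug would correspond to the server *not* hitting the `except` branch: the UI drops the stream, but the backend keeps looping through all 1000 tokens.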
Steps to Reproduce:
Prompt any LLM, press the stop button while it's generating, and check your GPU/CPU usage.
Expected Behavior:
Ollama should stop the llm from generating to save resources and stop blocking the next answer.
Actual Behavior:
The LLM keeps running.
Environment
Reproduction Details
Confirmation:
Installation Method
docker
@tjbck commented on GitHub (Mar 14, 2024):
Hmm, we fixed this issue a while back in #456. There might be some delay in when the chat request stops, but this should already be implemented, so I'll be closing this issue. If not, feel free to make a PR, thanks.
@nilsherzig commented on GitHub (Mar 14, 2024):
Oh sorry, I didn't check closed issues, since it was still a problem with my up-to-date version. Will try to debug this a bit more :) Maybe I'm just hallucinating haha
@tjbck commented on GitHub (Mar 14, 2024):
Keep us updated!
@nilsherzig commented on GitHub (Mar 16, 2024):
I'm sorry, it looks like it was my mistake, or something with my setup (reverse proxies?) caused the problem. I can confirm that everything works as expected with the current Open WebUI and Ollama Docker images. :) Thanks for the great software!
@nkeilar commented on GitHub (Mar 24, 2024):
I've found that it doesn't stop in all cases. If some of the work is offloaded to the CPU, it seems to continue executing. This may have fried my CPU, as it kept running without my knowledge while I was away, probably without adequate cooling in the room (13900K). I just replaced the CPU and motherboard, and I'm anxious that it's just going to run endlessly doing long generations on the CPU. I need confidence that it stops when the stream is closed or the stop button is pressed.
@nilsherzig commented on GitHub (Mar 24, 2024):
Oh, I think I might be able to confirm your story. I didn't have GPU passthrough finished in my virtual machine when I made my first comment; my second comment was while running on the GPU.
@nilsherzig commented on GitHub (Mar 24, 2024):
Yes, I can confirm: running without a GPU causes this problem, but that might be due to Ollama or llama.cpp. Sorry about your CPU/motherboard though :/
@Arche151 commented on GitHub (May 31, 2024):
Same issue. I am using Open WebUI's Docker image that's bundled with Ollama, for pure CPU inference. When I click stop, the generation only stops in the UI; the backend keeps going.
@tjbck Could you please re-open the issue, since the bug isn't fixed?
@DerBroader71 commented on GitHub (Jan 17, 2025):
I experienced this yesterday. Running smallthinker, I pressed the stop button because it had started to repeat itself. It wasn't until later that day, while looking at my Grafana dashboards, that I noticed the server still had high CPU usage. When I checked, Ollama was still streaming output from the original request made hours before. Stopping and restarting Ollama resolved the issue. Again, this is CPU-only.
Open WebUI Version
v0.4.8
Ollama Version
0.5.5-0-g32bd37a-dirty
@VanceVagell commented on GitHub (Mar 25, 2025):
I'm seeing the same issue with CPU-only inference. In my case, I have a local LLM running on llama.cpp's llama-server, and I'm connecting to it via IP in Open WebUI using the OpenAI API.
When I try to stop a long response with the stop button, I can see using htop that the token generation process is still happening on the LLM server, even though it appears to stop on the Open WebUI side (the output doesn't continue streaming). But the LLM server is actually still tied up producing that long reply, hindering its ability to serve future requests.
I think this issue should be reopened.
@bjoerns1983 commented on GitHub (Nov 11, 2025):
Same problem here.