Pressing stop doesn't actually stop ollama from generating #474

Closed
opened 2025-11-11 14:22:22 -06:00 by GiteaMirror · 11 comments
Owner

Originally created by @nilsherzig on GitHub (Mar 14, 2024).

Bug Report

Description

Bug Summary:

Pressing the stop button next to the input field doesn't actually stop Ollama from generating.
According to this upstream issue, it should be sufficient to just close the stream: https://github.com/ollama/ollama/issues/1695
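
As a rough illustration of what ollama/ollama#1695 describes — generation is supposed to be cancelled when the client closes the streaming HTTP connection, not merely when it stops reading chunks — here is a minimal stdlib-only Python sketch. The local endpoint and the model name "llama2" are assumptions; adjust for your setup.

```python
# Sketch: cancel an Ollama generation by closing the streamed HTTP response.
# Assumptions: a local Ollama server at localhost:11434 and a pulled model
# named "llama2" -- both are placeholders for your environment.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(prompt: str, model: str = "llama2") -> urllib.request.Request:
    """Build a streaming /api/generate request (one JSON object per line)."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": True})
    return urllib.request.Request(
        OLLAMA_URL, data=body.encode(), headers={"Content-Type": "application/json"}
    )

def generate_then_cancel(prompt: str, max_chunks: int = 5) -> list[str]:
    """Stream a generation, then close the connection after a few chunks.

    Per ollama/ollama#1695, closing the underlying stream is what tells
    Ollama to stop generating; silently discarding further chunks is not
    enough, which is what this issue is about.
    """
    chunks: list[str] = []
    resp = urllib.request.urlopen(build_request(prompt))
    try:
        for line in resp:  # each line is a JSON chunk
            chunks.append(json.loads(line).get("response", ""))
            if len(chunks) >= max_chunks:
                break
    finally:
        resp.close()  # <- this disconnect is the actual "stop" signal
    return chunks
```

If the bug reported here is present, CPU/GPU usage stays high even after `resp.close()` returns.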

Steps to Reproduce:

Prompt any LLM, press the stop button while it's generating, and check your GPU/CPU usage.
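
For the "check your CPU usage" step, one way to confirm the backend is still busy after pressing stop is to compare a process's accumulated CPU time over a short interval. A Linux-only sketch reading `/proc` (the process name "ollama" is an assumption; inside Docker the process may be named differently):

```python
# Sketch: detect whether a process named "ollama" is still burning CPU.
# Linux-only: parses /proc/<pid>/stat and /proc/<pid>/comm.
import os
import time

def cpu_ticks(stat_line: str) -> int:
    """Sum utime + stime (fields 14 and 15) from a /proc/<pid>/stat line."""
    # The command name (field 2) is parenthesised and may contain spaces,
    # so split on the closing paren first; field N is then index N - 3.
    rest = stat_line.rsplit(")", 1)[1].split()
    utime, stime = int(rest[11]), int(rest[12])
    return utime + stime

def find_pids(name: str) -> list[int]:
    """Return PIDs whose /proc/<pid>/comm matches `name` exactly."""
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/comm") as f:
                if f.read().strip() == name:
                    pids.append(int(entry))
        except OSError:
            pass  # process exited while we were scanning
    return pids

def still_generating(name: str = "ollama", interval: float = 2.0) -> bool:
    """True if any matching process accumulated CPU time over the interval."""
    def total() -> int:
        ticks = 0
        for pid in find_pids(name):
            try:
                with open(f"/proc/{pid}/stat") as f:
                    ticks += cpu_ticks(f.read())
            except OSError:
                pass
        return ticks
    before = total()
    time.sleep(interval)
    return total() > before
```

Run `still_generating()` a minute after pressing stop: if it keeps returning True, the backend is still producing tokens.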

Expected Behavior:

Ollama should stop the LLM from generating, to free resources and stop blocking the next answer.

Actual Behavior:

The LLM keeps running.

Environment

  • Open WebUI runs in a Docker container
  • Ollama runs on a fresh Ubuntu 22.04 VM

Reproduction Details

Confirmation:

  • [x] I have read and followed all the instructions provided in the README.md.
  • [x] I am on the latest version of both Open WebUI and Ollama.
  • [ ] I have included the browser console logs.
  • [ ] I have included the Docker container logs.

Installation Method

docker


@tjbck commented on GitHub (Mar 14, 2024):

Hmm, we fixed this issue a while back in #456. There might be some delay in when the chat request stops, but this should already be implemented, so I'll be closing this issue. If not, feel free to make a PR, thanks.


@nilsherzig commented on GitHub (Mar 14, 2024):

> Hmm, we fixed this issue a while back in #456. There might be some delay in when the chat request stops, but this should already be implemented, so I'll be closing this issue. If not, feel free to make a PR, thanks.

Oh sorry, I didn't check closed issues, since it was still a problem on my up-to-date version. Will try to debug this a bit more :) Maybe I'm just hallucinating haha


@tjbck commented on GitHub (Mar 14, 2024):

Keep us updated!


@nilsherzig commented on GitHub (Mar 16, 2024):

I'm sorry, it looks like it was my mistake, or something in my setup (reverse proxies?) caused the problem. I can confirm that everything works as expected with the current Open WebUI and Ollama Docker images. :) Thanks for the great software!


@nkeilar commented on GitHub (Mar 24, 2024):

I've found that it doesn't stop in all cases. If some of the work is offloaded to the CPU, it seems to continue executing. This may have fried my CPU: it kept running without my knowledge while I was away, and the room probably didn't have adequate cooling (13900K). I just replaced the CPU and motherboard, and I'm anxious that it's just going to run endlessly doing long generations on the CPU. I need confidence that it stops when the stream is closed or the stop button is pressed.


@nilsherzig commented on GitHub (Mar 24, 2024):

Oh, I think I can confirm your story. I didn't have GPU passthrough working in my virtual machine when I made my first comment; my second comment was while running on the GPU.


@nilsherzig commented on GitHub (Mar 24, 2024):

Yes, I can confirm: running without a GPU causes this problem, but that might be due to Ollama or llama.cpp. Sorry about your CPU / motherboard though :/


@Arche151 commented on GitHub (May 31, 2024):

Same issue. I am using Open WebUI's Docker image, which is bundled with Ollama, for pure CPU inference. When I click stop, the generation only stops in the UI; the backend keeps going.

@tjbck Could you please re-open the issue, since the bug isn't fixed?


@DerBroader71 commented on GitHub (Jan 17, 2025):

I experienced this yesterday. Running smallthinker, I pressed the stop button because it had started to repeat itself. It wasn't until later that day, while looking at my Grafana dashboards, that I noticed the server still had high CPU usage. When I checked, Ollama was still streaming output from the original request made some hours before. Stopping and restarting Ollama resolved the issue. Again, this is CPU only.

Open WebUI Version: v0.4.8
Ollama Version: 0.5.5-0-g32bd37a-dirty


@VanceVagell commented on GitHub (Mar 25, 2025):

I'm seeing the same issue, using CPU-only inference. In my case, I have a local LLM running on llama.cpp's llama-server, and I'm connecting to it via IP in Open WebUI using the OpenAI API. The server uses CPU-only inference.

When I try to stop a long response with the stop button, I can see using htop that the token generation process is still happening on the LLM server, even though it appears to stop on the Open WebUI side (the output doesn't continue streaming). But the LLM server is actually still tied up producing that long reply, hindering its ability to serve future requests.

I think this issue should be reopened.
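
For anyone reproducing this against an OpenAI-compatible backend like llama-server, the same client-side mechanism applies: the OpenAI-style endpoints have no separate "cancel" call, so closing the streamed SSE connection is the only cancellation signal available. A minimal stdlib sketch; the port and the `model` value are assumptions, and whether the server actually halts on disconnect is exactly what this thread questions:

```python
# Sketch: open an OpenAI-style streamed chat completion, then disconnect.
# Assumptions: an OpenAI-compatible server (e.g. llama-server) listening at
# the URL below; llama-server typically ignores the "model" field, but the
# API shape requires it.
import json
import urllib.request

API_URL = "http://localhost:8080/v1/chat/completions"

def build_chat_request(prompt: str) -> urllib.request.Request:
    """Build a streaming chat completion request (SSE "data: {...}" lines)."""
    body = json.dumps({
        "model": "local",
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
    })
    return urllib.request.Request(
        API_URL, data=body.encode(), headers={"Content-Type": "application/json"}
    )

def stream_then_disconnect(prompt: str, max_events: int = 3) -> list[str]:
    """Collect a few streamed deltas, then close the socket.

    If token generation is still visible in htop after this returns,
    the server did not honour the disconnect -- the behaviour reported
    in this thread.
    """
    deltas: list[str] = []
    resp = urllib.request.urlopen(build_chat_request(prompt))
    try:
        for raw in resp:
            line = raw.decode().strip()
            if not line.startswith("data: ") or line == "data: [DONE]":
                continue
            event = json.loads(line[len("data: "):])
            deltas.append(event["choices"][0]["delta"].get("content", ""))
            if len(deltas) >= max_events:
                break
    finally:
        resp.close()  # the disconnect is the cancellation request
    return deltas
```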


@bjoerns1983 commented on GitHub (Nov 11, 2025):

> Same issue. I am using Open WebUI's Docker image, which is bundled with Ollama, for pure CPU inference. When I click stop, the generation only stops in the UI; the backend keeps going.
>
> @tjbck Could you please re-open the issue, since the bug isn't fixed?

Same Problem here


Reference: github-starred/open-webui#474