Close connection to ollama when stop button is pressed #171

Closed
opened 2025-11-11 14:09:30 -06:00 by GiteaMirror · 8 comments

Originally created by @robertvazan on GitHub (Jan 12, 2024).

Originally assigned to: @tjbck on GitHub.

Bug Report

Description

Bug Summary:
The stop button (#48) does not actually work, because the WebUI backend keeps streaming the response from ollama. This also causes #444, and the same problem contributes to the poor UX in #452. The WebUI backend should close its connection to ollama to stop generation.

Steps to Reproduce:

  1. Ask a slower model to write something longer.
  2. Press stop button.

Expected Behavior:
CPU usage drops immediately. A new question can be asked immediately.

Actual Behavior:
While the client indeed stops receiving tokens immediately, ollama apparently continues generating the whole response in the background, which blocks other chat requests. The only way to stop the generation is to restart the ollama process.

Reproduction Details

Confirmation:

  - [x] I have read and followed all the instructions provided in the README.md.
  - [x] I have reviewed the troubleshooting.md document.
  - [ ] I have included the browser console logs.
  - [ ] I have included the Docker container logs.

Additional Information

The root cause of the issue is that the WebUI backend fails to close its connection to ollama. My experiments with the ollama API show that closing the connection cancels generation immediately, with no further CPU usage. Ollama's built-in CLI client apparently does this when you press Ctrl+C.
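The API-level experiment can be sketched as a short script (a hypothetical sketch, assuming a local ollama listening on localhost:11434 and its documented streaming /api/generate endpoint; the function names and the model name are placeholders, not anything from the codebase):

```python
import http.client
import json

def build_request(model, prompt):
    """Request body for ollama's streaming /api/generate endpoint."""
    return json.dumps({"model": model, "prompt": prompt, "stream": True})

def generate_then_cancel(model, prompt, n_chunks=5,
                         host="localhost", port=11434):
    """Start a streaming generation, then drop the connection after
    n_chunks streamed lines. Closing the socket is what cancels
    generation on the ollama side."""
    conn = http.client.HTTPConnection(host, port)
    conn.request("POST", "/api/generate", build_request(model, prompt),
                 {"Content-Type": "application/json"})
    resp = conn.getresponse()
    for i, line in enumerate(resp):
        print(json.loads(line).get("response", ""), end="", flush=True)
        if i + 1 >= n_chunks:
            break
    conn.close()  # ollama stops generating; CPU usage should drop shortly

```

Running something like `generate_then_cancel("llama2", "Write a long story.")` against a local ollama and watching CPU usage afterwards reproduces the observation above.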

As I understand it, there's a connection chain:

WebUI frontend <-> WebUI backend <-> ollama

The frontend-backend connection is closed/cancelled properly, judging by browser console messages, but the backend-ollama connection apparently stays open.
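A fix on the backend side would check for client disconnect between forwarded chunks and close the upstream connection as soon as the client goes away. A minimal, self-contained simulation of that pattern (the fake upstream stands in for the backend-to-ollama connection; none of these names come from the actual codebase):

```python
import asyncio

class FakeUpstream:
    """Stands in for the WebUI backend's streaming connection to ollama."""
    def __init__(self, chunks):
        self.chunks = chunks
        self.closed = False

    async def stream(self):
        for chunk in self.chunks:
            if self.closed:
                return
            await asyncio.sleep(0)
            yield chunk

    def close(self):
        # In the real backend this would close the HTTP connection to
        # ollama, which cancels generation server-side.
        self.closed = True

async def proxy(upstream, client_disconnected):
    """Forward chunks to the client, stopping and closing the upstream
    as soon as the client disconnects (what the stop button triggers)."""
    delivered = []
    async for chunk in upstream.stream():
        if client_disconnected():
            upstream.close()
            break
        delivered.append(chunk)
    return delivered

# Simulate a client that disconnects after receiving three chunks.
calls = {"n": 0}
def disconnected():
    calls["n"] += 1
    return calls["n"] > 3

upstream = FakeUpstream(list("abcdefgh"))
delivered = asyncio.run(proxy(upstream, disconnected))
```

In a FastAPI backend, Starlette's `await request.is_disconnected()` can play the role of the `client_disconnected` check.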

GiteaMirror added the enhancement, good first issue, and help wanted labels 2025-11-11 14:09:30 -06:00

@jukofyork commented on GitHub (Jan 12, 2024):

Yeah, this can be very problematic if you are using a model that sometimes goes wrong and gives out an infinite repeating response. I had to ssh into the host machine and do a kill -9 on the ollama process to get it working again.


@MarvinJWendt commented on GitHub (Jan 15, 2024):

> Yeah, this can be very problematic if you are using a model that sometimes goes wrong and gives out an infinite repeating response. I had to ssh into the host machine and do a kill -9 on the ollama process to get it working again.

I would also like to have a restart button in the WebUI. Sometimes when Ollama hangs, I need to do the same.


@tjbck commented on GitHub (Jan 18, 2024):

Might be relevant: https://github.com/tiangolo/fastapi/discussions/8805


@robertvazan commented on GitHub (Jan 18, 2024):

@tjbck This mostly works. Thanks! One issue, though: why does it take 15 seconds for ollama to go quiet after cancelling the response in the UI? My experiments at the API level show that ollama's CPU usage should drop within 3 seconds of the connection closing.


@robertvazan commented on GitHub (Jan 18, 2024):

PS: It takes much longer with larger models. Do you have a buffer somewhere that needs to process a certain number of tokens before it can close the connection?


@robertvazan commented on GitHub (Feb 12, 2024):

This is broken again. Ollama WebUI keeps streaming the rest of the response after I press the stop button.


@tjbck commented on GitHub (Feb 12, 2024):

@robertvazan Hmm, AFAIK there haven't been any changes on the webui side. Could you verify that the issue is from the webui? Thanks!


@robertvazan commented on GitHub (Feb 12, 2024):

@tjbck After some testing, I see that it is broken only under certain circumstances. The last fix worked, but it did not cover all cases. Repro steps:

  1. Start a new chat. Press Stop button. Generation stops immediately. OK.
  2. Press Regenerate button. Press Stop button. Generation stops immediately. OK.
  3. Switch to another chat and back. Press Regenerate button. Press Stop button. UI stops streaming, but generation continues in the background on ollama side. Fail.
Reference: github-starred/open-webui#171