[GH-ISSUE #452] Switching between conversations kills response streaming #27592
Originally created by @robertvazan on GitHub (Jan 11, 2024).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/452
Bug Report
Description
Bug Summary:
Response streaming works only if you stay in one conversation.
Steps to Reproduce:
1. Send a prompt and wait for the response to start streaming.
2. While the response is still streaming, switch to another conversation.
3. Switch back to the original conversation.
Expected Behavior:
Response streaming should continue in the background. Switching back should show the current state of the response.
Actual Behavior:
The response is gone along with all controls.
Reproduction Details
Confirmation:
I will provide additional information if you cannot reproduce the bug with the information provided so far.
@tjbck commented on GitHub (Jan 11, 2024):
Hi, thanks for creating this issue! Unfortunately, because of the nature of the APIs provided by Ollama (and LLM providers in general), there isn't a quick fix for this. FYI, ChatGPT experiences this issue as well. I'll close this issue as not planned for now, but I'll see if I can do anything to accommodate your use case. Thanks!
@robertvazan commented on GitHub (Jan 11, 2024):
ChatGPT is much faster, though. It's surprising that it's hard to resume streaming when the backend keeps consuming the stream anyway, even after the client disconnects. I guess the best workaround here is to open a separate browser tab.
@robertvazan commented on GitHub (Jan 11, 2024):
BTW, ChatGPT does not discard the response but rather cancels it, keeping what was generated so far. That also makes it easier to regenerate the response. Ollama WebUI instead keeps consuming the response in the background, with no way to see what's going on and no way to stop the process.
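(For illustration: a minimal sketch of the cancel-but-keep-partial-text behaviour described above, written against Ollama's streaming `/api/chat` format. The `streamChat` function name is hypothetical, not Open WebUI code.)

```typescript
// Sketch: stream a chat response and, on cancellation, keep the partial text.
// Targets Ollama's /api/chat NDJSON stream; `streamChat` is a hypothetical name.
async function streamChat(prompt: string, controller: AbortController): Promise<string> {
  let partial = "";
  let buf = "";
  try {
    const res = await fetch("http://localhost:11434/api/chat", {
      method: "POST",
      signal: controller.signal, // abort() also closes the HTTP connection
      body: JSON.stringify({
        model: "llama2",
        messages: [{ role: "user", content: prompt }],
      }),
    });
    const reader = res.body!.getReader();
    const decoder = new TextDecoder();
    for (;;) {
      const { done, value } = await reader.read();
      if (done) break;
      buf += decoder.decode(value, { stream: true });
      const lines = buf.split("\n");
      buf = lines.pop() ?? ""; // keep an incomplete trailing line for the next read
      for (const line of lines) {
        if (line.trim()) partial += JSON.parse(line).message?.content ?? "";
      }
    }
  } catch (err) {
    if ((err as Error).name !== "AbortError") throw err; // cancelled: fall through
  }
  return partial; // on abort, the caller still gets everything received so far
}
```

Calling `controller.abort()` mid-stream resolves the promise with the text received so far instead of discarding it, which is the ChatGPT behaviour described above.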
@justinh-rahb commented on GitHub (Jan 11, 2024):
@robertvazan it's important to note that ChatGPT runs on an infrastructure of 10,000+ GPUs, one of which is dedicated to generating your response, even if you think you've canceled the request from your end. In contrast, the Ollama WebUI operates using a single instance of Ollama, and does not support high concurrency or initiating a new generation while another is still being processed.
This limitation arises from the upstream project that provides the API utilized by this webui. As @tjbck suggested, there isn't a quick solution to this issue, as it originates from the core functionality of that project.
@robertvazan commented on GitHub (Jan 11, 2024):
@justinh-rahb Isn't the Ollama CLI using the same APIs? If I press Ctrl+C in the CLI, the response is cancelled immediately and CPU load drops to zero within a second or two. Ditto if I invoke the API directly via curl and press Ctrl+C to make curl close the connection. So as far as I can tell, closing the connection is an easy way to cancel the whole process on the Ollama side.
@tjbck commented on GitHub (Jan 12, 2024):
@robertvazan I'll see what can be done; it's just that I'm incredibly busy this week, so I don't have the capacity to take a look at the moment. If you manage to find a solution in the meantime, feel free to make a PR!
@robertvazan commented on GitHub (Jan 12, 2024):
@tjbck Just fixing #456 and stopping generation automatically when the user switches chats (as ChatGPT does) would be sufficient as a fix. Keeping the stream running in the background and showing an up-to-date response when the user navigates back would be a new feature, and a convenient and useful one.
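(A sketch of the automatic-stop part, assuming a single in-flight request tracked by the client; `activeController`, `startGeneration`, `switchChat`, and `loadChat` are illustrative names, not Open WebUI internals.)

```typescript
// Sketch: abort any in-flight generation whenever the user switches chats.
// All names here are illustrative; this is not Open WebUI's actual code.
let activeController: AbortController | null = null;

function startGeneration(prompt: string): Promise<string> {
  activeController?.abort(); // never run two generations at once
  activeController = new AbortController();
  return streamChat(prompt, activeController); // from the earlier sketch
}

function switchChat(chatId: string): void {
  // Stopping generation on navigation, as ChatGPT does: aborting closes the
  // connection, which (per the curl test below) makes Ollama stop generating.
  activeController?.abort();
  activeController = null;
  loadChat(chatId);
}

function loadChat(chatId: string): void {
  // Placeholder for rendering the selected conversation in the UI layer.
}
```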
@tjbck commented on GitHub (Jan 12, 2024):
@robertvazan If you could test that it works as expected using only the APIs provided by Ollama, instead of comparing it to the Ollama CLI, that would be tremendously helpful! Also, feel free to make a PR with a possible fix :)
@robertvazan commented on GitHub (Jan 12, 2024):
@tjbck I did test it with the API directly. I ran the first curl example from the Ollama chat API docs with a "Write a long blog post" prompt and pressed Ctrl+C while it was in the middle of the first paragraph. Curl exited, the connection was closed, and Ollama's CPU usage dropped to zero almost immediately.
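(The same test, reproduced without curl: a standalone sketch against the same `/api/chat` endpoint, aborting after roughly two seconds as a stand-in for Ctrl+C. Runnable as an ES module under Node 18+; the model name and timeout are arbitrary.)

```typescript
// Sketch: start a long generation, then abort mid-stream and watch Ollama's
// CPU usage drop once the socket closes.
const controller = new AbortController();
setTimeout(() => controller.abort(), 2000); // stand-in for pressing Ctrl+C

const res = await fetch("http://localhost:11434/api/chat", {
  method: "POST",
  signal: controller.signal,
  body: JSON.stringify({
    model: "llama2",
    messages: [{ role: "user", content: "Write a long blog post" }],
  }),
});

try {
  // Node's web ReadableStream is async-iterable; print raw NDJSON chunks as they arrive.
  for await (const chunk of res.body as unknown as AsyncIterable<Uint8Array>) {
    process.stdout.write(chunk);
  }
} catch {
  console.log("\nConnection closed; Ollama should go idle within a second or two.");
}
```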