bug: Open WebUI is slow to submit and process responses in long chats #1939

Closed
opened 2025-11-11 14:56:51 -06:00 by GiteaMirror · 6 comments

Originally created by @pstengel on GitHub (Aug 29, 2024).

Bug Report

Installation Method

Docker

Environment

  • Open WebUI Version: v0.3.16

  • Ollama (if applicable): v0.3.8

  • Operating System: macOS 14.6.1 (23G93) on M1 MacBook Pro w/ 64 GB RAM, iOS 17.5.1 (21F90) on iPhone 12 Pro

  • Browser (if applicable):

    • iOS
      • Brave 1.68.1
      • Chrome 128.0.6613.98
      • Safari (whatever the latest version is on iOS)
      • Vivaldi 6.8.3388.154
    • macOS
      • Brave 1.69.153
      • Chrome 128.0.6613.114
      • Vivaldi 6.8.3381.57

Confirmation:

  • [x] I have read and followed all the instructions provided in the README.md.
  • [x] I am on the latest version of both Open WebUI and Ollama.
  • [ ] I have included the browser console logs.
  • [ ] I have included the Docker container logs.
  • [x] I have provided the exact steps to reproduce the bug in the "Steps to Reproduce" section below.

Expected Behavior:

  1. Engage in a lengthy chat with the chatbot.
  2. Message submission and response time are about as fast as Ollama can respond.

Actual Behavior:

  1. Engage in a lengthy chat with the chatbot.
  2. Message submission and response time begin taking minutes.

Description

Bug Summary:

The UI begins slowing down significantly as you use Open WebUI for longer chats. This is especially noticeable on a mobile device, with message submission and response times taking minutes for a chat with about 90 messages (~45 user messages, ~45 assistant messages) and minimal, if any, branching. Refreshing the UI will sometimes increase speed marginally, but submissions and responses still take far longer than when you first start, and it quickly degrades back to taking minutes.

Note that the response generation info indicates that Ollama is passing back the information in a reasonable amount of time. For example, for a response that may take minutes to load on iOS, Ollama took 10 seconds to infer and decode, indicating an issue with the UI itself.
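
To confirm backend timing independently of the UI, Ollama can be queried directly: the final /api/generate response carries duration fields in nanoseconds. A minimal sketch (the port is Ollama's default; the model name and prompt are illustrative):

```ts
// Ask Ollama directly and read its timing fields, bypassing the web UI.
const res = await fetch("http://localhost:11434/api/generate", {
  method: "POST",
  body: JSON.stringify({
    model: "llama3.1",  // illustrative model name
    prompt: "Summarize the plot of Hamlet.",
    stream: false,      // one JSON response, with timings attached
  }),
});
const data = await res.json();
// All *_duration fields are reported in nanoseconds.
console.log("total:", data.total_duration / 1e9, "s");
console.log("prompt eval:", data.prompt_eval_duration / 1e9, "s");
console.log("generation:", data.eval_duration / 1e9, "s");
console.log("tokens/s:", data.eval_count / (data.eval_duration / 1e9));
```

If those numbers stay in the ~10 second range while the UI takes minutes to display the reply, the bottleneck is on the frontend.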

Reproduction Details

Steps to Reproduce:

This is easiest to reproduce on a mobile device. I notice the same slowdown on my laptop, but it never reaches the point that mobile does.

  1. Open up Open WebUI and select your favorite model.
  2. Engage in a lengthy chat in a single session (i.e., don't close the tab and reopen, refresh, etc.—it's most noticeable with one continuous chat session) that produces responses from the LLM that are about 300 tokens in length.
  3. You'll notice slowness within 10-15 user prompts (20-30 total messages), with the UI grinding to a halt after 20+ user prompts.

Logs and Screenshots

Browser Console Logs:
The issue is most noticeable on mobile browsers, and I have yet to connect my phone to my computer to get a developer console to check.
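
In the meantime, a desktop capture can still show where the main thread stalls. Pasting something like the following into the devtools console before submitting a message logs "long tasks" (main-thread blocks over 50 ms); note that the longtask entry type is well supported in Chromium-based browsers but not everywhere:

```ts
// Log main-thread stalls while reproducing the slowdown in a long chat.
const observer = new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    console.log(
      `long task: ${entry.duration.toFixed(0)} ms ` +
      `(started at ${entry.startTime.toFixed(0)} ms)`
    );
  }
});
observer.observe({ entryTypes: ["longtask"] });
```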

Docker Container Logs:
I don't believe these will be useful here; the HTTP requests complete and the UI functions without errors. This problem might be due to too many SvelteKit components being rendered in longer message threads.
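
If that hypothesis holds, the usual mitigation is to window (virtualize) the message list so only messages near the viewport are mounted. A rough sketch of the windowing math, with illustrative names and a fixed row-height estimate rather than anything taken from Open WebUI's actual code:

```ts
interface Message { id: string; role: "user" | "assistant"; content: string; }

// Compute which slice of the chat history is worth rendering, given the
// container's scroll position. Everything outside the slice stays unmounted.
function visibleWindow(
  messages: Message[],
  scrollTop: number,      // current scroll offset of the chat container
  viewportHeight: number, // visible height of the chat container
  rowHeight = 120,        // estimated px per message
  overscan = 5,           // extra rows above/below to avoid pop-in
): { start: number; end: number; slice: Message[] } {
  const start = Math.max(0, Math.floor(scrollTop / rowHeight) - overscan);
  const end = Math.min(
    messages.length,
    Math.ceil((scrollTop + viewportHeight) / rowHeight) + overscan,
  );
  return { start, end, slice: messages.slice(start, end) };
}
```

A Svelte template would then iterate over slice in an {#each} block and pad the scroll container with spacers of start * rowHeight above and (messages.length - end) * rowHeight below, so the scrollbar geometry stays correct.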

Screenshots/Screen Recordings (if applicable):
If the developers have trouble reproducing this issue on their hardware, I can spend some time recording a video to demonstrate. However, it should be easily reproducible.

Additional Information

N/A


@jojasadventure commented on GitHub (Aug 30, 2024):

Same on a 2015 MacBook Pro; cannot use it for longer chats.


@forever-lwy commented on GitHub (Aug 30, 2024):

I also have a similar issue. It works fine on the computer, but when using it on my phone, if the context is very long, the response becomes very laggy. The streaming response even degrades into completely non-streamed output.


@JamesBedwell commented on GitHub (Aug 30, 2024):

I have been experiencing this too, for a few weeks I think. I've always kept Open WebUI reasonably up to date, and I am running the latest version (v0.3.16) right now and still have this issue.

Looks like the same issue as #3620

Seems to only affect web app mode (i.e. added to the iPhone home screen or macOS dock via Safari)


@tjbck commented on GitHub (Aug 30, 2024):

Please do not create duplicate issues: #3620


@kevinhq commented on GitHub (Feb 16, 2025):

In my experience, it's slow from the beginning. Ryzen 5 3600, 16 GB of RAM, Debian 12. I think I've given up on Open WebUI and will use Ollama directly.


@firestrife23 commented on GitHub (Oct 2, 2025):

I came here looking for answers, any solutions yet?

Reference: github-starred/open-webui#1939