issue: Potential Root Cause for Data Loss (Multi-Device) and Performance Issues (Multimodal) #6875

Open
opened 2025-11-11 17:08:23 -06:00 by GiteaMirror · 4 comments

Originally created by @2erTwo6 on GitHub (Nov 9, 2025).

Check Existing Issues

  • I have searched for any existing and/or related issues.
  • I have searched for any existing and/or related discussions.
  • I have also searched in the CLOSED issues AND CLOSED discussions and found no related items (your issue might already be addressed on the development branch!).
  • I am using the latest version of Open WebUI.

Installation Method

Docker

Open WebUI Version

v0.6.36

Ollama Version (if applicable)

N/A

Operating System

Debian 12

Browser (if applicable)

Chrome

Confirmation

  • I have read and followed all instructions in README.md.
  • I am using the latest version of both Open WebUI and Ollama.
  • I have included the browser console logs.
  • I have included the Docker container logs.
  • I have provided every relevant configuration, setting, and environment variable used in my setup.
  • I have clearly listed every relevant configuration, custom setting, environment variable, and command-line option that influences my setup (such as Docker Compose overrides, .env values, browser settings, authentication configurations, etc).
  • I have documented step-by-step reproduction instructions that are precise, sequential, and leave nothing to interpretation. My steps:
  • Start with the initial platform/version/OS and dependencies used,
  • Specify exact install/launch/configure commands,
  • List URLs visited, user input (incl. example values/emails/passwords if needed),
  • Describe all options and toggles enabled or changed,
  • Include any files or environmental changes,
  • Identify the expected and actual result at each stage,
  • Ensure any reasonably skilled user can follow and hit the same issue.

Expected Behavior

  1. Multi-Device Sync: When a chat is updated on one device, the changes should be reflected on other devices viewing the same chat. At the very least, a device with an older, stale state should not be able to overwrite the entire up-to-date chat history on the server.
  2. Multimodal Performance: After an image is uploaded in the first turn of a conversation, subsequent messages in the same chat should only send the new text prompt and context, not re-upload the entire, large image data with every single message.

Actual Behavior

I have observed two distinct but possibly related critical issues:

  1. Catastrophic Data Loss: A complete chat history is overwritten and lost when operating on the same chat from two different devices. The device with the older (stale) state overwrites the entire chat history on the server as soon as it sends a new message.
  2. Severe Performance Degradation: In a multimodal chat, the image data (as a large Base64 string) appears to be re-sent from the client to the server with every single message in the conversation, not just the initial upload. This causes extremely long loading times and high bandwidth usage, especially over a real network (not localhost). It seems to happen both when sending the user's prompt and again when the frontend saves the AI's response.

Steps to Reproduce

Scenario A: Data Loss with Multiple Devices

  1. Open a specific chat on Device A (e.g., a desktop browser).
  2. Open the exact same chat on Device B (e.g., a mobile browser).
  3. On Device A, proceed to have a multi-turn conversation with the AI (e.g., 5-10 exchanges).
  4. Observe that the UI on Device B does not update and still shows the old chat state from step 2.
  5. Now, on Device B, send a simple new message (e.g., "Hello"). The AI will respond normally on Device B.
  6. Finally, go back to Device A and refresh the browser page.
  7. Actual Result: The entire, long conversation from Device A is gone. It has been completely replaced by the single "Hello" exchange from Device B.
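
The steps above describe a classic "last write wins" overwrite. A minimal sketch (hypothetical, not Open WebUI's actual code) of how this failure mode plays out when each client holds a full copy of the chat and saves it back wholesale:

```python
# Hypothetical sketch of Scenario A: each client keeps a full snapshot of the
# chat and saves it back wholesale, so a stale client silently destroys
# newer history on the server ("last write wins").

server_chat = {"messages": []}          # authoritative record in the database

def load(client):
    """Client opens the chat: takes a full snapshot of server state."""
    client["messages"] = list(server_chat["messages"])

def send(client, text):
    """Client appends locally, then overwrites the server with its copy."""
    client["messages"].append(text)
    server_chat["messages"] = list(client["messages"])   # full overwrite

device_a, device_b = {}, {}
load(device_a)
load(device_b)                     # B now holds a snapshot that will go stale

for i in range(5):                 # long conversation on Device A
    send(device_a, f"A-msg-{i}")

send(device_b, "Hello")            # B saves its stale snapshot + one message

print(server_chat["messages"])     # → ['Hello'] — A's five messages are gone
```
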

Scenario B: Performance Issue with Image Uploads

  1. Open your browser's developer tools (F12) and switch to the "Network" tab.
  2. Start a new chat.
  3. Upload a reasonably large image (e.g., >5MB) and ask a question about it (e.g., "What is this?").
  4. In the Network tab, observe the POST request sent to the backend. Note its large size (e.g., ~8MB).
  5. After the AI responds, ask a simple, text-only follow-up question (e.g., "Tell me more.").
  6. Actual Result: Observe the Network tab again. A new POST request is sent, and its payload size is again very large (~8MB), indicating the image data was sent a second time. Often, after the AI's response is streamed back, another large request is sent, presumably to save the conversation history.
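
The ~8MB payload for a >5MB image is consistent with Base64 encoding, which inflates raw bytes by a factor of 4/3. A quick sketch of the arithmetic (the 6MB figure is illustrative, matching the request sizes in my screenshots):

```python
import base64

# Why each follow-up turn is ~8 MB: Base64 inflates raw bytes by 4/3, and if
# the full message history (image included) is resent every turn, every
# request carries that cost again.

raw = b"\x00" * (6 * 1024 * 1024)          # stand-in for a 6 MB image
encoded = base64.b64encode(raw)

print(len(encoded) / len(raw))             # 4/3 overhead from Base64
print(len(encoded) / (1024 * 1024))        # → 8.0 (MB resent on EVERY turn)
```
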

Logs & Screenshots

[Screenshot 1](https://github.com/user-attachments/assets/f6c55b91-db43-4e1f-8713-5485f3d40960)
[Screenshot 2](https://github.com/user-attachments/assets/64fba5c8-1003-42ba-b0d4-4c1df6b2367f)

As you can see in Screenshot 2, which shows the new elements added after I sent 'test', two large requests—each up to 6MB—appeared. This is the exact issue I mentioned earlier regarding Scenario B, and it's causing severe performance problems when using the multimodal model.

Additional Information

I suspect these two seemingly separate issues may stem from the same underlying architectural design.

It appears that the frontend client might be treated as the "single source of truth" for the entire chat history. With every interaction, the client seems to send the complete chat history (including large Base64 image data) back to the server, which then likely overwrites the existing record in the database.

This would explain the performance issue (Scenario B), as the large image Base64 string is part of that "complete history" and is therefore resent every time.
It would also explain the data loss (Scenario A), as the client on Device B holds an outdated "complete history." Its submission of this stale history to the server overwrites the more recent and longer history from Device A in a classic "last write wins" conflict.
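
One hypothetical way to prevent this conflict (a mitigation sketch, not Open WebUI's current design) is optimistic concurrency: the server stores a version counter with each chat and rejects any save whose base version is stale, instead of blindly applying "last write wins":

```python
# Hypothetical mitigation sketch: optimistic concurrency with a version
# counter. A save from a client that loaded an older version is rejected
# rather than allowed to overwrite newer history.

class StaleWriteError(Exception):
    pass

server = {"messages": [], "version": 0}

def save(messages, base_version):
    if base_version != server["version"]:
        raise StaleWriteError("chat changed since this client last loaded it")
    server["messages"] = list(messages)
    server["version"] += 1

save(["hi"], base_version=0)            # Device A saves against version 0: ok
try:
    save(["Hello"], base_version=0)     # Device B still thinks version is 0
except StaleWriteError:
    print("stale write rejected")       # A's history is preserved
```
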
I understand that Open WebUI may have been initially designed as a local "Ollama manager," where these issues might be less apparent (especially performance issues over localhost). However, as the project evolves into a more general-purpose WebUI for self-hosted website scenarios like mine, this client-centric state management could be a significant architectural challenge.

This is just my observation as a user trying to deploy Open WebUI in a distributed environment. I hope this detailed analysis is helpful for future architectural considerations. Thank you for your amazing work on this project.

P.S. As English is not my first language, I have utilized an AI assistant to help draft this issue. I apologize if the wording appears to be overly verbose or unnatural at times. My primary goal was to convey the technical details as clearly as possible, and I hope the core points are understandable.

GiteaMirror added the bug label 2025-11-11 17:08:23 -06:00

@silentoplayz commented on GitHub (Nov 9, 2025):

I am able to reproduce both issues described with the provided reproduction steps (great job on those by the way!) on the latest dev.


@2erTwo6 commented on GitHub (Nov 9, 2025):

Following up on my initial report, I've refined some details to make the issue even clearer, especially regarding the real-world impact of Scenario B.
Refined Analysis of Scenario B's Impact

To be more specific, the performance degradation I mentioned has two critical consequences for anyone deploying Open WebUI in a real-world, distributed environment (i.e., not on localhost):

  1. Poor User Experience on Low-Bandwidth Networks: For end-users, the repeated multi-megabyte uploads make the chat feel extremely slow and unresponsive.
  2. High Costs on Metered VPS/Cloud Platforms: This behavior generates a massive amount of server traffic. On any cloud provider where bandwidth is metered (such as AWS or GCP), this can lead to unexpectedly high and unsustainable hosting costs.

Corrected & More Precise Steps to Reproduce Scenario B

I've also realized there is a more precise way to reproduce and isolate the performance issue. The key is to focus only on the second-round interaction:

  1. Start a new chat.
  2. Upload a large image (e.g., >5MB) and ask an initial question (e.g., "What is this?"). Let this first interaction complete normally.
  3. Now, open your browser's developer tools (F12) and switch to the "Network" tab.
  4. Ask a simple, text-only follow-up question (e.g., "Tell me more.").
  5. Observe: A new, very large POST request (e.g., ~8MB) is sent to the backend.
This revised method clearly demonstrates that the image data is being re-sent unnecessarily on the second turn, which is the core of the performance/cost issue.
I hope these clarifications are even more helpful.


@tjbck commented on GitHub (Nov 10, 2025):

Issue with having multiple tabs open should be something that can be addressed, the other payload "issue" you described is how completion endpoints are supposed to work.
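
For context on this point: OpenAI-compatible completion endpoints are stateless, so every request must carry the full message history, including any inline image content. A sketch of such a request body (the field names follow the OpenAI chat format; the model name and inline Base64 forwarding are illustrative assumptions, and the Base64 placeholder is elided):

```python
# Sketch of a stateless chat-completion request: a text-only follow-up still
# ships the whole history, image included. Field names follow the OpenAI
# chat format; the model name is a placeholder.

history = [
    {"role": "user", "content": [
        {"type": "text", "text": "What is this?"},
        {"type": "image_url",
         "image_url": {"url": "data:image/png;base64,..."}},  # large payload
    ]},
    {"role": "assistant", "content": "It looks like a cat."},
]

request_body = {
    "model": "some-vision-model",
    "messages": history + [{"role": "user", "content": "Tell me more."}],
}
```
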


@2erTwo6 commented on GitHub (Nov 10, 2025):

Issue with having multiple tabs open should be something that can be addressed, the other payload "issue" you described is how completion endpoints are supposed to work.

So, is the idea of "using Open WebUI to build an AI website like any of the commercial ones out there" just not a use case you're considering?

Reference: github-starred/open-webui#6875