mirror of
https://github.com/open-webui/open-webui.git
synced 2026-05-07 11:28:35 -05:00
[GH-ISSUE #13103] feat: Large base64 images embedded in session data cause performance issue #55477
Originally created by @ShadyLeaf on GitHub (Apr 21, 2025).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/13103
Problem Description
Similar to #11934 but in chat sessions.
Currently, when an image is used as input in a chat, Open WebUI stores it as a Base64-encoded string. When the session is reopened, the entire Base64 string is transmitted as plain text inside the session JSON and loaded into the frontend, which causes several issues:
• Slow session load times: Sessions containing large images (several MBs) take significantly longer to open, since the UI waits for the entire image data to load before rendering the conversation.
• Base64 encoding overhead: Embedding Base64 data inflates the payload by ~33%, impacting both memory usage and network performance.
• No browser caching: Because the image is not served via a standard URL, the browser cannot cache it. Every time a session is reopened, the same image must be reloaded in full, even if it has not changed.
This becomes especially noticeable in slower connections.
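The ~33% figure above follows directly from how Base64 works: every 3 bytes of binary data become 4 ASCII characters. A minimal sketch demonstrating the inflation (the 1 MB "image" here is just random bytes standing in for a real PNG):

```python
import base64
import os

# Pretend this 1 MB of random bytes is an uploaded PNG.
raw = os.urandom(1_000_000)
encoded = base64.b64encode(raw)

# Base64 maps every 3 input bytes to 4 output characters,
# so the embedded string is ~33% larger than the file itself.
print(len(raw))                  # 1000000
print(len(encoded))              # 1333336
print(len(encoded) / len(raw))   # ~1.333
```

For a session containing several multi-megabyte images, that overhead is paid on every single reload, on top of the images themselves.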
⸻
Steps to Reproduce
⸻
Expected Behavior
• The system should quickly retrieve chat session text information without incurring large overhead from image data.
• Images should be stored or served in a more performant manner.
Desired Solution you'd like
Better handling of image data in conversations. Some possible improvements:
• Store and serve image files separately (e.g., from a cache folder or backend endpoint), and only store a URL or reference path in the session JSON.
• Defer loading of images until needed (lazy loading), to prioritize rendering the actual conversation content.
• Optionally compress or downscale large images during upload or display, depending on configuration.
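The first improvement could be sketched roughly as follows. This is a hypothetical illustration, not Open WebUI's actual code: the `cache/images` directory and the `/api/v1/files/` URL prefix are assumptions, and a real implementation would also need to serve the saved files from that path.

```python
import base64
import hashlib
import json
import pathlib

CACHE_DIR = pathlib.Path("cache/images")  # hypothetical cache folder


def externalize_image(data_url: str) -> str:
    """Swap an inline base64 data URL for a short served-file reference."""
    _header, b64 = data_url.split(",", 1)
    raw = base64.b64decode(b64)
    # Content-addressed filename: identical images deduplicate, and the
    # browser can cache the resulting URL indefinitely.
    name = hashlib.sha256(raw).hexdigest() + ".png"
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    (CACHE_DIR / name).write_bytes(raw)
    return f"/api/v1/files/{name}"  # hypothetical backend endpoint


# The session JSON then stores only the reference, not the pixels.
inline = "data:image/png;base64," + base64.b64encode(b"\x89PNG fake bytes").decode()
message = {"role": "user", "image": externalize_image(inline)}
print(json.dumps(message))
```

Because the filename is derived from the content hash, re-uploading the same image is a no-op on disk, and the URL never changes, so standard HTTP caching headers would let the browser skip the download entirely on subsequent session loads.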
Alternatives Considered
No response
Additional Context
Example: A single PNG image encoded into base64 bloated the session JSON by ~9.8MB, delaying session render time by several seconds.

@spammenotinoz commented on GitHub (Apr 21, 2025):
Until this can be fixed in the code, encourage users to use the client-side compression.
It's late here, but after your post I did a quick network trace, and it appears each image is transmitted multiple times between the Open WebUI client and server. I haven't worked out why, but it's quite strange, and it repeats on every conversation turn.
Something similar actually happens with large texts in the conversation. I guess it's just how the architecture is, and changing the approach would require refactoring.
Can't say it impacts me much, thanks to a decent network connection, but yes, that's a lot of client-side processing and a lot of data "tromboning" back and forth.
There are pros and cons to each approach: keeping the conversation history on the server increases server resource requirements and creates scalability challenges; conversely, relying on the client to store the entire conversation in memory/the browser will start hitting limits and chewing up network bandwidth.
@spammenotinoz commented on GitHub (Apr 21, 2025):
PS: Using a function can be a good way to prune or remove old images from conversations, to minimise the impact.
@ShadyLeaf commented on GitHub (Apr 21, 2025):
Thanks for your insights! I agree that there are trade-offs with how conversations and resources are managed between the client and server.
That said, the core issue I’m pointing out isn’t about whether the client or server should hold the state, but rather how image data is being loaded and delivered.
Let me give a concrete example:
Imagine a conversation with 20 turns — the first 10 are text-only, totaling around 30KB. The next 10 each include an image, say ~10MB each.
In the current setup, when I reopen this session, the browser has to download and parse a single massive JSON payload — around 100MB — before rendering any part of the conversation. The majority of that data is just base64-encoded image blobs.
In contrast, if image delivery were decoupled from the main chat history (e.g., served separately and loaded on demand), the initial payload would be only ~30KB. The browser could instantly render the text portion, and load images progressively as the user scrolls — just like most modern web apps do.
This separation would greatly improve perceived performance, reduce memory pressure, and make sessions with image-heavy turns much more manageable.
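A back-of-the-envelope check of the 20-turn example above (the figures are the author's estimates, not measurements):

```python
# Ten text-only turns totaling ~30 KB, then ten turns with one ~10 MB image each.
TEXT_TOTAL_KB = 30
IMAGE_COUNT = 10
IMAGE_MB = 10

# Monolithic session JSON: text plus all image data up front
# (the ~33% base64 overhead would push this even higher).
monolithic_mb = TEXT_TOTAL_KB / 1024 + IMAGE_COUNT * IMAGE_MB

# Decoupled: the initial payload carries only the text; images
# are fetched lazily as the user scrolls to them.
decoupled_kb = TEXT_TOTAL_KB

print(f"monolithic session JSON: ~{monolithic_mb:.0f} MB")  # ~100 MB
print(f"decoupled initial load:  ~{decoupled_kb} KB")       # ~30 KB
```

The ratio between the two initial payloads is over three thousand to one, which is why the difference is felt even on fast connections.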
@spammenotinoz commented on GitHub (Apr 22, 2025):
Thank you for the clear explanation.