[GH-ISSUE #13103] feat: Large base64 images embedded in session data cause performance issue #55477

Closed
opened 2026-05-05 17:35:21 -05:00 by GiteaMirror · 4 comments
Owner

Originally created by @ShadyLeaf on GitHub (Apr 21, 2025).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/13103

Check Existing Issues

  • I have searched the existing issues and discussions.

Problem Description

Similar to #11934 but in chat sessions.

Currently, when using image inputs in a chat, Open WebUI stores the image as a Base64-encoded string. When a session is reopened, the entire Base64 string is transmitted as plain text in the session JSON and loaded into the frontend, which causes several issues:

  • Slow session load times: Sessions containing large images (several MBs) take significantly longer to open, since the UI waits for the entire image data to load before rendering the conversation.

  • Base64 encoding overhead: Embedding base64 data inflates the payload by ~33%, impacting both memory usage and network performance.

  • No browser caching: Because the image is not served via a standard URL, the browser cannot cache it. This means every time a session is reopened, the same image must be reloaded in full, even if it has not changed.

This becomes especially noticeable on slower connections.
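The ~33% figure above follows directly from how base64 works: every 3 raw bytes become 4 encoded characters. A quick sketch to illustrate:

```python
import base64
import os

# Simulate a ~3 MB image payload (random bytes stand in for PNG data).
raw = os.urandom(3 * 1024 * 1024)
encoded = base64.b64encode(raw)

# For input lengths divisible by 3 the ratio is exactly 4/3 (~1.33),
# i.e. a ~33% size increase before the data even hits the JSON.
print(len(raw), len(encoded), len(encoded) / len(raw))
```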

Steps to Reproduce

  1. Create a new chat session.
  2. Upload a large image (several megabytes in size) and send it to the model.
  3. Switch to another chat session.
  4. Switch back to the chat session with the large image.
  5. Notice the extended response time due to the embedded base64 images.

Expected Behavior
• The system should quickly retrieve chat session text information without incurring large overhead from image data.
• Images should be stored or served in a more performant manner.

Desired Solution you'd like

Better handling of image data in conversations. Possible improvements include:

  • Store and serve image files separately (e.g., from a cache folder or backend endpoint), and only store a URL or reference path in the session JSON.

  • Defer loading of images until needed (lazy loading), to prioritize rendering the actual conversation content.

  • Optionally compress or downscale large images during upload or display, depending on configuration.
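As a sketch of the first suggestion, the session JSON could carry only a file reference instead of the inline blob. The field names and URL path below are illustrative assumptions, not Open WebUI's actual schema:

```python
# Hypothetical message shapes; keys and the files URL are illustrative only.

# Today: the image travels inline with the chat history as a data URI.
message_inline = {
    "role": "user",
    "content": "What is in this picture?",
    "images": ["data:image/png;base64,iVBORw0KGgoAAAANSUhEUg"],  # multi-MB in practice
}

# Proposed: store only a reference; the browser fetches the file from a
# normal URL on demand, so standard HTTP caching applies.
message_by_reference = {
    "role": "user",
    "content": "What is in this picture?",
    "images": ["/api/v1/files/3f2a9c/content"],  # small, cacheable reference
}

print(message_by_reference["images"][0])
```

With a plain URL, the browser can use `Cache-Control`/`ETag` semantics, so reopening a session re-downloads nothing that hasn't changed.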

Alternatives Considered

No response

Additional Context

Example: A single PNG image encoded into base64 bloated the session JSON by ~9.8MB, delaying session render time by several seconds.
[Image attachment: https://github.com/user-attachments/assets/81054e81-cae7-470d-ba5d-b3ac65bbe30b]


@spammenotinoz commented on GitHub (Apr 21, 2025):

Until you can hard-code it, encourage users to use client-side compression.
It's late here, but after your post I did a quick network trace, and it appears each image is being transmitted multiple times between the Open WebUI client and server. I haven't worked out why, but it's quite strange, and it repeats for each conversation turn.

Something similar is also happening with large texts in the conversation; I suspect it's just how the architecture is, and changing the approach would require refactoring.
It doesn't impact me much, thanks to a decent network connection, but yes, that's a lot of client-side processing and a lot of data "tromboning".
There are pros and cons to each approach: keeping the conversation history on the server increases server resource requirements and creates scalability challenges; conversely, relying on the client to store the entire conversation in memory/browser will start hitting limits and chewing up network bandwidth.


@spammenotinoz commented on GitHub (Apr 21, 2025):

PS: Using a function can be a good way to prune/remove old images from conversations, to minimise impact.


@ShadyLeaf commented on GitHub (Apr 21, 2025):

> Until you can hard-code it, encourage the users to use the client side compression. It's late here, but after your post I did a quick network trace and appears each image is being transmitted multiple times between the Open-WebUI client and server. Haven't worked out why, but it's quite strange, and this repeats for each conversation turn.
>
> Actually something similar is happening with large texts in the conversation, I guess it's just how the architecture is and would require refactoring to change the approach. Can't say it impacts myself, due to decent network connection, but yes that's a lot of client side processing and a lot of data being "tromboning". Pro's \ Cons to each approach, keeping the conversation history on the server, increases the server resource requirements, creates scalability challenges. Conversely relying on the client to store the entire conversation in memory\browser, is going to start hitting limits\chewing up network bandwidth.

Thanks for your insights! I agree that there are trade-offs with how conversations and resources are managed between the client and server.

That said, the core issue I’m pointing out isn’t about whether the client or server should hold the state, but rather how image data is being loaded and delivered.

Let me give a concrete example:
Imagine a conversation with 20 turns — the first 10 are text-only, totaling around 30KB. The next 10 each include an image, say ~10MB each.

In the current setup, when I reopen this session, the browser has to download and parse a single massive JSON payload — around 100MB — before rendering any part of the conversation. The majority of that data is just base64-encoded image blobs.

In contrast, if image delivery were decoupled from the main chat history (e.g., served separately and loaded on demand), the initial payload would be only ~30KB. The browser could instantly render the text portion, and load images progressively as the user scrolls — just like most modern web apps do.

This separation would greatly improve perceived performance, reduce memory pressure, and make sessions with image-heavy turns much more manageable.
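The arithmetic in the example above is easy to verify. Whether the total lands nearer 100 MB or 133 MB depends on whether the "~10 MB per image" is measured before or after base64 encoding (which inflates by 4/3):

```python
# Back-of-envelope for the 20-turn example above.
text_kb = 30          # 10 text-only turns, ~30 KB total
image_count = 10      # 10 turns each carrying an image
image_mb_each = 10    # ~10 MB per image

raw_mb = image_count * image_mb_each         # 100 MB of raw image bytes
encoded_mb = raw_mb * 4 / 3                  # ~133 MB once base64-encoded
referenced_kb = text_kb + image_count * 0.1  # ~31 KB if images become URLs

print(raw_mb, round(encoded_mb), referenced_kb)  # 100 133 31.0
```

Either way, decoupling the images shrinks the initial payload by more than three orders of magnitude.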


@spammenotinoz commented on GitHub (Apr 22, 2025):

Thank you for the clear explanation.


Reference: github-starred/open-webui#55477