enh: Efficient Image Handling for Multi-Modal Chats #1088

Closed
opened 2025-11-11 14:36:59 -06:00 by GiteaMirror · 4 comments

Originally created by @wanderingmeow on GitHub (Jun 1, 2024).

Description

The current implementation of open-webui stores full-resolution images in base64 within the webui.db SQLite database. While this approach may be convenient, it can lead to:

  1. Database Bloating: Storing images as base64 strings can significantly increase the size of the database, particularly when using multi-modal models frequently.
  2. No Garbage Collection: The space used by large image entries is not released after chats are deleted, which leaves unused space in the database file. Currently, users can only reclaim it manually with this shell command:
    sqlite3 open-webui/backend/data/webui.db 'VACUUM;'
    
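  3. Bandwidth and Storage Limitations: Images can only be uploaded at full resolution, which consumes a lot of bandwidth and storage, especially over remote connections and when uploading photos taken on phones (commonly 12MP or 48MP). Moreover, most vision models only support low-resolution images:

    | Model | Supported resolutions (width, height) in px | Source |
    |:-|:-|-:|
    | moondream2 | (378, 378), (378, 756), (756, 378), (756, 756) | [vision_encoder.py](https://github.com/vikhyat/moondream/blob/3f7e3cb930a019ec861e938152c4dfb5a4e86f33/moondream/vision_encoder.py#L216) |
    | llava-1.6 | (336, 672), (672, 336), (672, 672), (1008, 336), (336, 1008) | [config.json](https://huggingface.co/liuhaotian/llava-v1.6-vicuna-7b/blob/deae57a8c0ccb0da4c2661cc1891cc9d06503d11/config.json#L16) |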
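
To make points 1 and 2 concrete, here is a minimal sketch using only the Python standard library. It is not open-webui code: the `chat` table, the `demo_reclaim` helper, and the throwaway database are hypothetical stand-ins. It shows that base64 text is about 4/3 the size of the raw bytes, that deleting rows does not shrink the file, and that `PRAGMA freelist_count` times `PRAGMA page_size` gives a rough estimate of what `VACUUM` could reclaim.

```python
import base64
import os
import sqlite3
import tempfile


def demo_reclaim(image_bytes: bytes, copies: int = 20) -> dict:
    """Store base64 images in a scratch DB, delete them, and measure the file."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")  # scratch file, not webui.db
        conn = sqlite3.connect(path)
        # Pin SQLite's usual default so the demo behaves the same everywhere.
        conn.execute("PRAGMA auto_vacuum = NONE")
        # Hypothetical schema: one base64 string per row.
        conn.execute("CREATE TABLE chat (id INTEGER PRIMARY KEY, image TEXT)")
        conn.executemany("INSERT INTO chat (image) VALUES (?)", [(encoded,)] * copies)
        conn.commit()
        size_full = os.path.getsize(path)

        # Deleting the rows frees pages inside the file but does not shrink it.
        conn.execute("DELETE FROM chat")
        conn.commit()
        size_after_delete = os.path.getsize(path)

        # Free pages * page size approximates what VACUUM can give back.
        free_pages = conn.execute("PRAGMA freelist_count").fetchone()[0]
        page_size = conn.execute("PRAGMA page_size").fetchone()[0]

        # VACUUM rebuilds the database file without the free pages.
        conn.execute("VACUUM")
        size_after_vacuum = os.path.getsize(path)
        conn.close()
    return {
        "raw_bytes": len(image_bytes),
        "base64_chars": len(encoded),
        "size_full": size_full,
        "size_after_delete": size_after_delete,
        "reclaimable_estimate": free_pages * page_size,
        "size_after_vacuum": size_after_vacuum,
    }


if __name__ == "__main__":
    fake_image = bytes(range(256)) * 1200  # 300 KiB of placeholder data
    for key, value in demo_reclaim(fake_image).items():
        print(f"{key}: {value}")
```

The same two pragmas could back the "current size and potential reduction" display suggested under Possible Solutions, assuming the database does not use auto-vacuum.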

Possible Solutions

  1. Client-Side Image Resizing: Resize images on the client with JavaScript, scaling them down to fit within 2048x2048 (or a user-defined maximum size) before upload.
     • Alternatively, convert images to the best resolution supported by the underlying vision transformer, although ollama does not currently expose this information.
  2. Image Storage as Files or Binary Blobs:
     • Store images as separate files inside a designated folder (e.g., backend/data/uploads/images). Image entries in the database would then be URIs or hashes pointing to these files, which makes cleanup and organization easier.
     • Alternatively, store images as binary blobs within the database.
  3. Database Cleanup Mechanisms:
     • Manual Cleanup: Add a button under "Settings" -> "Chats" that runs the SQLite VACUUM command and removes unlinked images. Preferably, display the current size of the database and the potential size reduction from cleanup.
     • Automatic Cleanup: Alternatively, detect large batches of deletions (e.g., when users and their chat histories are removed) and periodically reclaim unused space and perform cleanup tasks. Note that VACUUM should not be called frequently, because it rebuilds the entire database.
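
The sizing arithmetic behind solution 1 can be sketched as follows. The issue proposes doing this in JavaScript on the client; Python is used here only to show the calculation, which is the same in any language. `fit_within` and `closest_supported` are hypothetical helpers, and the aspect-ratio heuristic in `closest_supported` is an assumption for illustration, not the selection logic moondream2 or llava-1.6 actually use.

```python
import math
from typing import Iterable, Tuple

Size = Tuple[int, int]  # (width, height) in pixels

# Resolutions copied from the table in the Description.
MOONDREAM2 = [(378, 378), (378, 756), (756, 378), (756, 756)]
LLAVA_1_6 = [(336, 672), (672, 336), (672, 672), (1008, 336), (336, 1008)]


def fit_within(size: Size, max_side: int = 2048) -> Size:
    """Scale a size down to fit in max_side x max_side, keeping the aspect ratio.

    Images that already fit are returned unchanged (no upscaling).
    """
    width, height = size
    scale = min(1.0, max_side / width, max_side / height)
    return max(1, round(width * scale)), max(1, round(height * scale))


def closest_supported(size: Size, supported: Iterable[Size]) -> Size:
    """Pick the supported resolution with the closest aspect ratio.

    Assumed heuristic: compare log aspect ratios and prefer the larger
    resolution when two candidates are equally close.
    """
    width, height = size
    target = math.log(width / height)

    def distance(candidate: Size) -> Tuple[float, int]:
        c_width, c_height = candidate
        return abs(math.log(c_width / c_height) - target), -(c_width * c_height)

    return min(supported, key=distance)


if __name__ == "__main__":
    photo = (4000, 3000)  # a typical 12MP phone photo
    print(fit_within(photo))
    print(closest_supported(photo, MOONDREAM2))
```

A 12MP photo shrinks to roughly a quarter of its pixel count under the 2048x2048 cap, which is where most of the bandwidth saving would come from.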
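
Solution 2 (files referenced by hash) and the "remove unlinked images" part of solution 3 could look roughly like this. This is a sketch under stated assumptions, not the project's implementation: `store_image` and `remove_unlinked` are hypothetical names, SHA-256 is one possible choice of hash, and the set of hashes still referenced by chats is assumed to come from a database query that is not shown here.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Iterable, List


def store_image(data: bytes, root: Path, suffix: str = ".png") -> str:
    """Write image bytes to root/<sha256><suffix> and return the hash.

    Identical images map to the same file, so duplicates are stored once.
    The database row would keep only the returned hash.
    """
    digest = hashlib.sha256(data).hexdigest()
    root.mkdir(parents=True, exist_ok=True)
    target = root / f"{digest}{suffix}"
    if not target.exists():
        target.write_bytes(data)
    return digest


def remove_unlinked(root: Path, referenced: Iterable[str]) -> List[str]:
    """Delete files whose hash is no longer referenced; return removed hashes."""
    keep = set(referenced)
    removed = []
    for path in sorted(root.iterdir()):
        if path.is_file() and path.stem not in keep:
            path.unlink()
            removed.append(path.stem)
    return removed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        images = Path(tmp) / "images"  # stand-in for backend/data/uploads/images
        kept = store_image(b"image still used by a chat", images)
        store_image(b"image from a deleted chat", images)
        print(remove_unlinked(images, [kept]))
```

With images kept outside the database, deleting a chat only removes a short hash from SQLite, so the file-level cleanup above replaces most of the need for frequent `VACUUM` runs.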

@spammenotinoz commented on GitHub (Jun 3, 2024):

  1. Client-Side Resizing would fix a lot of challenges with Anthropic and GPT-4-Turbo, especially on mobile devices. The resolution on phones these days is just too good.

@Fusseldieb commented on GitHub (Sep 1, 2024):

Yes, this has been an ongoing issue for me as well.
I quite frequently find myself in situations where I take a photo with my phone and ask GPT-4o something. Since each photo is around 5MB, it takes an eternity to upload and also wastes A LOT of tokens, since the image it receives is absolutely GIGANTIC.


@ice6 commented on GitHub (Oct 29, 2024):

Vote for Client-Side Resizing.


@tjbck commented on GitHub (Dec 2, 2024):

Closing in favour of #6848

Reference: github-starred/open-webui#1088