enh: Efficient Image Handling for Multi-Modal Chats #1088

GiteaMirror commented

2025-11-11 14:36:59 -06:00

Originally created by @wanderingmeow on GitHub (Jun 1, 2024).

Description

The current implementation of open-webui stores full-resolution images in base64 within the webui.db SQLite database. While this approach may be convenient, it can lead to:

Database Bloating: Storing images as base64 strings can significantly increase the size of the database, particularly when using multi-modal models frequently.
No Garbage Collection: The space taken by large image entries is not released after chats are deleted, so the database file stays the same size. Currently, users can only reclaim that space manually with this shell command:
```bash
sqlite3 open-webui/backend/data/webui.db 'VACUUM;'
```

Bandwidth and Storage Limitations: Images can only be uploaded at full resolution, which consumes a lot of bandwidth and storage, especially over remote connections and when uploading photos taken on phones (commonly 12MP or 48MP). Moreover, most vision models only support low-resolution images:

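|Model|Supported Resolutions (width, height) in px|Source|
|:-|:-|-:|
|moondream2|(378, 378), (378, 756), (756, 378), (756, 756)|[vision_encoder.py](https://github.com/vikhyat/moondream/blob/3f7e3cb930a019ec861e938152c4dfb5a4e86f33/moondream/vision_encoder.py#L216)|
|llava-1.6|(336, 672), (672, 336), (672, 672), (1008, 336), (336, 1008)|[config.json](https://huggingface.co/liuhaotian/llava-v1.6-vicuna-7b/blob/deae57a8c0ccb0da4c2661cc1891cc9d06503d11/config.json#L16)|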
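
For a sense of scale, the arithmetic behind the points above can be sketched as follows. The 5 MiB photo size is only an illustrative assumption, and `base64_length` is a hypothetical helper name; the 4-characters-per-3-bytes ratio is a property of base64 itself.

```python
def base64_length(n_bytes: int) -> int:
    """Length of the padded base64 encoding of n_bytes of data."""
    # base64 emits 4 characters for every (possibly padded) group of 3 bytes.
    return 4 * ((n_bytes + 2) // 3)


photo_bytes = 5 * 1024 * 1024   # assumed size of one phone photo
overhead = base64_length(photo_bytes) / photo_bytes
print(f"base64 text is about {overhead:.2f}x the size of the raw bytes")

phone_pixels = 12_000_000       # a 12MP photo
model_pixels = 672 * 672        # largest llava-1.6 entry in the table above
print(f"the photo has about {phone_pixels / model_pixels:.1f}x more pixels than the model input")
```

So each stored image costs roughly a third more than its raw bytes, and most of the pixels in a phone photo are more than the listed models accept as input.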

Possible Solutions

Client-Side Image Resizing: Resize images on the client side using JavaScript so that they fit within 2048x2048, or a user-defined maximum size, before they are uploaded.

Alternatively, the client could convert images to the best resolution supported by the underlying vision transformer, but this information is not currently exposed by ollama.
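
The sizing logic for both variants is small. Below is a minimal sketch of the arithmetic only, written in Python for illustration; an actual client-side implementation would do the same calculation in JavaScript and then redraw the image at the computed size. `fit_within` and `best_supported` are hypothetical names, and the selection heuristic (keep the most image pixels, then waste the least padding) is an assumption, not something taken from open-webui or ollama.

```python
def fit_within(width: int, height: int, max_side: int = 2048) -> tuple[int, int]:
    """Scale (width, height) down to fit a max_side x max_side box.

    Aspect ratio is preserved and images that already fit are returned unchanged.
    """
    if width <= 0 or height <= 0 or max_side <= 0:
        raise ValueError("dimensions must be positive")
    scale = min(1.0, max_side / width, max_side / height)
    # Clamp to 1 px so extreme aspect ratios never round down to zero.
    return max(1, round(width * scale)), max(1, round(height * scale))


def best_supported(width: int, height: int,
                   supported: list[tuple[int, int]]) -> tuple[int, int]:
    """Pick the supported (w, h) that keeps the most image pixels, then wastes the least."""
    def score(candidate: tuple[int, int]) -> tuple[int, int]:
        cand_w, cand_h = candidate
        # Aspect-preserving fit, using integer math to avoid float rounding.
        if cand_w * height <= cand_h * width:      # width is the limiting side
            fit_w, fit_h = cand_w, cand_w * height // width
        else:                                      # height is the limiting side
            fit_w, fit_h = cand_h * width // height, cand_h
        used = min(fit_w * fit_h, width * height)  # upscaling adds no detail
        wasted = cand_w * cand_h - used            # padding or empty area
        return used, -wasted
    return max(supported, key=score)


# The llava-1.6 resolutions listed in the table above.
LLAVA_16 = [(336, 672), (672, 336), (672, 672), (1008, 336), (336, 1008)]

print(fit_within(8000, 6000))                  # a 48MP photo, capped at 2048 px
print(best_supported(4032, 3024, LLAVA_16))    # a 4:3 phone photo
print(best_supported(3000, 1000, LLAVA_16))    # a wide panorama
```

The second function only becomes usable once the supported resolutions are known to the client, which is the information the issue notes is missing.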

Image Storage as Files or Binary Blobs:

Store images as separate files inside a designated folder (e.g., backend/data/uploads/images). Image entries in the database would then be URI links or hashes to these images, enabling easier cleanup and organization.
Alternatively, store images as binary blobs within the database.
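
The file-based option could look roughly like the sketch below, which names each file after the SHA-256 hash of its contents so that the database only needs to hold the hash. This is an assumption-laden illustration: `store_image` and `remove_unreferenced` are hypothetical helpers, and the way the set of still-referenced hashes is collected from chats is left out because it depends on the real schema.

```python
import base64
import hashlib
from pathlib import Path


def store_image(data_url: str, upload_dir: Path) -> str:
    """Decode a base64 image data URL, write it under its SHA-256 hash, return the hash."""
    header, _, payload = data_url.partition(",")
    if not header.startswith("data:image/") or not header.endswith(";base64"):
        raise ValueError("expected a base64 image data URL")
    raw = base64.b64decode(payload, validate=True)
    digest = hashlib.sha256(raw).hexdigest()
    upload_dir.mkdir(parents=True, exist_ok=True)
    path = upload_dir / digest
    if not path.exists():  # identical uploads share a single file
        path.write_bytes(raw)
    return digest


def remove_unreferenced(upload_dir: Path, referenced: set[str]) -> list[str]:
    """Delete stored images whose hash is no longer referenced; return the removed hashes."""
    removed = []
    for path in upload_dir.iterdir():
        if path.is_file() and path.name not in referenced:
            path.unlink()
            removed.append(path.name)
    return sorted(removed)
```

Content addressing gives deduplication for free, and cleanup reduces to comparing the folder listing against the hashes that chats still reference.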

Database Cleanup Mechanisms:

Manual Cleanup: Add a button in "Settings" -> "Chats" that runs the SQLite [VACUUM](https://www.sqlite.org/lang_vacuum.html) command and removes unlinked images. Preferably, display the current size of the database and the potential reduction in size from cleanup.
Automatic Cleanup: Alternatively, detect large batches of deletions (e.g., when users and their chat histories are removed), then periodically reclaim unused space in the database and perform cleanup tasks. Note that VACUUM should not be called frequently, because it rebuilds the entire database file.
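
The size figures for such a button can be read from SQLite's own page counters. The sketch below uses Python's standard `sqlite3` module; `database_space` and `vacuum` are hypothetical helper names. The free-list figure should be treated as a lower bound on the savings, since VACUUM also compacts partially filled pages.

```python
import sqlite3


def database_space(db_path: str) -> tuple[int, int]:
    """Return (total_bytes, free_bytes) for a SQLite file, based on its page counters."""
    con = sqlite3.connect(db_path)
    try:
        page_size = con.execute("PRAGMA page_size").fetchone()[0]
        page_count = con.execute("PRAGMA page_count").fetchone()[0]
        free_pages = con.execute("PRAGMA freelist_count").fetchone()[0]
    finally:
        con.close()
    return page_size * page_count, page_size * free_pages


def vacuum(db_path: str) -> None:
    """Rebuild the database file, returning free pages to the filesystem."""
    # VACUUM cannot run inside a transaction, so open the connection in autocommit mode.
    con = sqlite3.connect(db_path, isolation_level=None)
    try:
        con.execute("VACUUM")
    finally:
        con.close()
```

An automatic variant could call `database_space` on a schedule and only run `vacuum` when the free share passes some threshold, which keeps VACUUM infrequent as the note above recommends.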

GiteaMirror closed this issue

2025-11-11 14:36:59 -06:00

GiteaMirror commented

2025-11-11 14:37:00 -06:00

@spammenotinoz commented on GitHub (Jun 3, 2024):

Client-side resizing would fix a lot of problems with Anthropic and GPT-4-Turbo, especially on mobile. The resolution on phones these days is just too good.

GiteaMirror commented

2025-11-11 14:37:01 -06:00

@Fusseldieb commented on GitHub (Sep 1, 2024):

Yes, this has been an ongoing issue for me as well.
I frequently find myself in situations where I take a photo with my phone and ask GPT-4o something. Since each photo is around 5MB, it takes an eternity to upload and also wastes A LOT of tokens, since the image the model receives is absolutely GIGANTIC.

GiteaMirror commented

2025-11-11 14:37:01 -06:00

@ice6 commented on GitHub (Oct 29, 2024):

Vote for client-side resizing.

GiteaMirror commented

2025-11-11 14:37:01 -06:00

@tjbck commented on GitHub (Dec 2, 2024):

Closing in favour of #6848