[GH-ISSUE #23192] issue: Runtime OOM: chat.chat JSON blob fully loaded into memory on every chat open — no column projection in read path #35444

Closed
opened 2026-04-25 09:39:24 -05:00 by GiteaMirror · 1 comment

Originally created by @soonlaii-upskill on GitHub (Mar 29, 2026).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/23192

Check Existing Issues

  • I have searched for any existing and/or related issues.
  • I have searched for any existing and/or related discussions.
  • I have also searched in the CLOSED issues AND CLOSED discussions and found no related items (your issue might already be addressed on the development branch!).
  • I am using the latest version of Open WebUI.

Installation Method

Docker

Open WebUI Version

v0.8.10

Ollama Version (if applicable)

No response

Operating System

Docker (Linux container)

Browser (if applicable)

No response

Confirmation

  • I have read and followed all instructions in README.md.
  • I am using the latest version of both Open WebUI and Ollama.
  • I have included the browser console logs.
  • I have included the Docker container logs.
  • I have provided every relevant configuration, setting, and environment variable used in my setup.
  • I have clearly listed every relevant configuration, custom setting, environment variable, and command-line option that influences my setup (such as Docker Compose overrides, .env values, browser settings, authentication configurations, etc).
  • I have documented step-by-step reproduction instructions that are precise, sequential, and leave nothing to interpretation. My steps:
  • Start with the initial platform/version/OS and dependencies used,
  • Specify exact install/launch/configure commands,
  • List URLs visited, user input (incl. example values/emails/passwords if needed),
  • Describe all options and toggles enabled or changed,
  • Include any files or environmental changes,
  • Identify the expected and actual result at each stage,
  • Ensure any reasonably skilled user can follow and hit the same issue.

Expected Behavior

Opening a chat or running a search should use memory proportional to the response payload, not the full stored blob size. A single large chat row should not be able to OOM-kill the server for all users.

Actual Behavior

Our production server is OOM-killed (Signal 9). We have a chat column row of 230,153,522 characters (~219 MB) — a single long-running conversation. Every read path in models/chats.py loads this full blob into Python memory with no column projection.

Memory lifecycle per request on that row:

  • SQLAlchemy deserializes blob → Python dict (~3–4× size in RAM = ~500–900 MB)
  • _sanitize_chat_row() inside get_chat_by_id recursively copies the entire dict again (second full copy, designed for write-time null-byte cleaning)
  • Pydantic model_validate creates a third copy
  • Router model_dump() + JSON serialization = fourth copy

Peak RSS: ~1–2 GB per request on one 219 MB row. Multiple concurrent users = OOM-kill.
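The copy amplification described above can be demonstrated with only the standard library. This is a minimal sketch, not Open WebUI code: the blob is a scaled-down stand-in, and `copy.deepcopy` stands in for the sanitize-style recursive copy. `tracemalloc` shows peak memory well above the blob's own size.

```python
# Minimal sketch of copy amplification on a JSON blob (stdlib only).
# The blob here is a small stand-in for the ~219 MB production row.
import copy
import json
import tracemalloc

blob = json.dumps({"messages": [{"content": "x" * 100} for _ in range(10_000)]})

tracemalloc.start()
data = json.loads(blob)        # copy 1: blob string -> Python dict
cleaned = copy.deepcopy(data)  # copy 2: recursive sanitize-style full copy
payload = json.dumps(cleaned)  # copy 3: response re-serialization
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

# Peak traced memory exceeds the blob itself because the copies stack
# before any of them can be garbage-collected.
print(f"blob: {len(blob):,} bytes, peak traced: {peak:,} bytes")
```

Pydantic validation and `model_dump()` would add further copies on top of these three; the scaling argument is the same.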

Beyond single-chat opens, these functions are also affected:

  • get_chats() — .limit(limit).offset(skip) is commented out. The admin endpoint GET /api/v1/chats/all calls this, loading ALL rows with full blobs (5,149 rows × avg 2 MB = ~10 GB).
  • get_chats_by_user_id_and_search_text() — SQL WHERE filtering is correct (uses json_each/json_array_elements in-DB), but matching rows are fetched as full ORM objects. A search that matches the 219 MB row loads it fully just to return a title.
  • get_archived_chats_by_user_id() — no limit, no projection, returns list[ChatModel] with full blobs.
  • get_chats_by_folder_ids_and_user_id() — no limit, no projection.
  • get_chat_list_by_user_id_and_tag_name() — no limit, no projection.
  • get_chat_list_by_user_id() — returns list[ChatModel] (full blobs) even for sidebar rendering which only needs id/title/timestamps.

Note: these four functions already do it correctly using .with_entities() — get_chat_title_id_list_by_user_id, get_pinned_chats_by_user_id, get_archived_chat_list_by_user_id, get_shared_chat_list_by_user_id. The pattern exists; it just needs applying consistently.
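For illustration, here is the `.with_entities()` projection pattern against a deliberately simplified `Chat` model and an in-memory SQLite database — not the real Open WebUI schema or models/chats.py code. The point is that the projected query never selects the blob column, so the blob is never deserialized.

```python
# Sketch of the .with_entities() projection pattern (simplified schema,
# in-memory SQLite; not the actual Open WebUI models).
from sqlalchemy import BigInteger, Column, String, Text, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class Chat(Base):
    __tablename__ = "chat"
    id = Column(String, primary_key=True)
    user_id = Column(String)
    title = Column(Text)
    chat = Column(Text)          # the large JSON blob
    updated_at = Column(BigInteger)

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as db:
    db.add(Chat(id="c1", user_id="u1", title="big chat",
                chat="x" * 1_000_000, updated_at=1))
    db.commit()

    # Projected query: only id/title/updated_at cross into Python.
    # The 1 MB blob column is excluded from the generated SELECT.
    rows = (
        db.query(Chat)
        .filter_by(user_id="u1")
        .with_entities(Chat.id, Chat.title, Chat.updated_at)
        .all()
    )
    print(rows)
```

A sidebar or search-result endpoint only needs these projected columns, so memory stays proportional to the row count, not the blob sizes.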

Steps to Reproduce

  1. Have a long-running conversation with hundreds of messages, large content, files, and tool output stored in sources — realistic after months of daily use
  2. Open that chat in the UI → RSS spikes proportional to chat column size
  3. Type a keyword that appears in that chat's messages into the search bar → same spike, even though only the title is displayed in results
  4. As admin, visit the all-chats page → all blobs loaded simultaneously

Logs & Screenshots

1. Pod describe — OOMKill confirmation

Last State:     Terminated
  Reason:       OOMKilled
  Exit Code:    137
  Restart Count: 3

2. Last log lines before process death

No Python exception, no traceback — process is killed at OS level with no warning:

INFO:     "GET /api/v1/chats/ HTTP/1.1" 200 OK
INFO:     "GET /api/v1/chats/ HTTP/1.1" 200 OK
[process terminated — no further output]

3. Database blob distribution

SELECT MAX(LENGTH(chat)), AVG(LENGTH(chat)), COUNT(*) FROM chat;
-- max=230153522  avg=2142073  count=5149

4. Subprocess OOMKill proof — loading largest blob kills the exec itself

kubectl exec -n <namespace> <pod> -c open-webui -- python3 -c "
import sqlite3, json
conn = sqlite3.connect('/app/backend/data/webui.db')
row = conn.execute('SELECT chat FROM chat ORDER BY LENGTH(chat) DESC LIMIT 1').fetchone()
json.loads(row[0])
"
# command terminated with exit code 137

The kubectl exec subprocess itself is OOMKilled before json.loads() returns. This is not a Python exception — it is the kernel reclaiming memory.
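By contrast, the same question ("how big is the largest blob?") can be answered without loading the blob at all, by projecting `LENGTH(chat)` in SQL. A self-contained sketch with an in-memory stand-in database (the real path would be /app/backend/data/webui.db):

```python
# Safe counterpart to the OOM repro above: only the id and the blob's
# LENGTH() cross into Python, so memory stays flat regardless of blob size.
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the real webui.db
conn.execute("CREATE TABLE chat (id TEXT, chat TEXT)")
conn.execute("INSERT INTO chat VALUES ('c1', ?)", ("x" * 1_000_000,))

row = conn.execute(
    "SELECT id, LENGTH(chat) FROM chat ORDER BY LENGTH(chat) DESC LIMIT 1"
).fetchone()
print(row)  # blob size known without ever deserializing the blob
```

This is the same principle as `.with_entities()` one level down: push the projection into the database instead of materializing the column in Python.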

5. Cgroup memory readings (/sys/fs/cgroup/memory.current)

Baseline (idle, fresh restart):  2,063,306,752 bytes  (~1.92 GiB / 4.00 GiB limit)
After normal user activity:      4,257,050,624 bytes  (~3.97 GiB / 4.00 GiB limit)

Pod came within ~32 MB of the 4 GiB limit during normal use. Prior limit was 2 GiB — explaining the kills.

6. Top 5 blobs by size

size=219.5MB  user=3a0bf510-b8cb-4c1b-a6cc-6a4597d6e8d2  id=17dcf07a-02aa-4376-972a-2302b4475e53
size=114.6MB  user=cead7146-0ede-4529-9d0a-24b6e877fc28  id=5cc94a8a-bb6d-4479-a3b9-7e3857873541
size=110.5MB  user=3a0bf510-b8cb-4c1b-a6cc-6a4597d6e8d2  id=378c757b-4112-4dc5-bdc9-d47daa0d3727
size=110.4MB  user=3a0bf510-b8cb-4c1b-a6cc-6a4597d6e8d2  id=51bedebf-7002-4dc8-ad3e-cf5437fd0e99
size=109.2MB  user=ded0487e-dd7a-4449-9d14-cd3d724d9ac2  id=cae83b9b-8c97-4e41-a89a-b349c7f8ba41

One user owns 3 of the top 5 blobs (~440 MB combined). Every GET /api/v1/chats/ call by this user loads all of them.

Additional Information

This is distinct from the migration OOM fixed in PR #21542. That fix correctly addresses the 8452d01d26d7 Alembic migration (we already have it). This issue is the runtime read path — it exists in every version after chat_message was introduced because the read path was never switched away from the blob.

The long-term fix is switching get_chat_by_id to read from chat_message WHERE chat_id = ? instead of loading the blob — the dual-write is already fully in place. The immediate fix is applying .with_entities() projection to list endpoints (same pattern already used in 4 existing functions) and removing _sanitize_chat_row from the read path (null-byte sanitization belongs at write time only).
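A hypothetical sketch of that long-term read path, reconstructing a chat from per-message rows with pagination so memory scales with the page size rather than the total conversation length. The chat_message columns here (chat_id, idx, data) are illustrative, not the real schema:

```python
# Hypothetical per-message read path (illustrative schema, not the
# actual chat_message table): decode one page of rows, never the blob.
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chat_message (chat_id TEXT, idx INTEGER, data TEXT)")
conn.executemany(
    "INSERT INTO chat_message VALUES (?, ?, ?)",
    [("c1", i, json.dumps({"role": "user", "content": f"msg {i}"}))
     for i in range(3)],
)

def get_messages(chat_id: str, limit: int = 50, offset: int = 0):
    # Each row is decoded individually; peak memory is bounded by the
    # page size, independent of how long the conversation has grown.
    cur = conn.execute(
        "SELECT data FROM chat_message WHERE chat_id = ? "
        "ORDER BY idx LIMIT ? OFFSET ?",
        (chat_id, limit, offset),
    )
    return [json.loads(r[0]) for r in cur]

print(get_messages("c1", limit=2))
```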

GiteaMirror added the bug label 2026-04-25 09:39:24 -05:00

@ShirasawaSama commented on GitHub (Mar 29, 2026):

#22206


Reference: github-starred/open-webui#35444