[PR #23382] [CLOSED] fix: improve PostgreSQL database sided reliability for migrations #27174

Closed
opened 2026-04-20 06:55:08 -05:00 by GiteaMirror · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/open-webui/open-webui/pull/23382
Author: @Classic298
Created: 4/3/2026
Status: Closed

Base: devHead: migration


📝 Commits (5)

  • 7769b16 fix
  • 1aaf286 fix: address code review feedback on migration
  • 116a6b8 rebased
  • 05ef5c5 fix: reduce BATCH_SIZE to avoid PG bind param limit, commit per index
  • d93fe55 fix: guard non-dict JSON, incremental flushing, offline txn parity

📊 Changes

2 files changed (+349 additions, -90 deletions)

View changed files

📝 backend/open_webui/migrations/env.py (+6 -1)
📝 backend/open_webui/migrations/versions/8452d01d26d7_add_chat_message_table.py (+343 -89)

📄 Description

The 8452d01d26d7 ("Add chat_message table") migration fails on large PostgreSQL datasets (50+ GB) due to three interacting database-side issues that the existing Python-side batching fix (#21542) does not address:

  1. Unbounded transaction: env.py wraps all migrations in a single context.begin_transaction(). The begin_nested() savepoints in _flush_batch() nest inside this outer transaction — nothing actually commits to disk until the entire migration returns. For large datasets this generates massive WAL that can't be recycled, causing managed PostgreSQL services (AlloyDB, Cloud SQL, RDS) to kill the connection.

  2. Indexes + FK created before backfill: The migration creates 7 secondary indexes and a FK constraint before inserting any data. Every INSERT must update all 7 B-tree indexes + validate the FK — ~9 WAL records per row. PostgreSQL's own bulk loading docs explicitly warn against this pattern.

  3. Server-side cursor invalidated by COMMIT: stream_results=True creates a WITHOUT HOLD cursor on PostgreSQL. You can't naively "add commits inside the loop" — the cursor dies on the first COMMIT. A different pagination strategy is required.

Contributor License Agreement

Note

Deleting the CLA section will lead to immediate closure of your PR and it will not be merged in.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

## 📋 Pull Request Information **Original PR:** https://github.com/open-webui/open-webui/pull/23382 **Author:** [@Classic298](https://github.com/Classic298) **Created:** 4/3/2026 **Status:** ❌ Closed **Base:** `dev` ← **Head:** `migration` --- ### 📝 Commits (5) - [`7769b16`](https://github.com/open-webui/open-webui/commit/7769b1657c5141695e5339dffd1c18c03bfc5299) fix - [`1aaf286`](https://github.com/open-webui/open-webui/commit/1aaf286295d8cbe2e44a8434fc0ea8175070c296) fix: address code review feedback on migration - [`116a6b8`](https://github.com/open-webui/open-webui/commit/116a6b8eac911a12627ea41a797ebdd8e4eb166d) rebased - [`05ef5c5`](https://github.com/open-webui/open-webui/commit/05ef5c5223830c227823257097bdf86c29f0d4fa) fix: reduce BATCH_SIZE to avoid PG bind param limit, commit per index - [`d93fe55`](https://github.com/open-webui/open-webui/commit/d93fe55ad15670d16dc92c3c2670d5d437697f48) fix: guard non-dict JSON, incremental flushing, offline txn parity ### 📊 Changes **2 files changed** (+349 additions, -90 deletions) <details> <summary>View changed files</summary> 📝 `backend/open_webui/migrations/env.py` (+6 -1) 📝 `backend/open_webui/migrations/versions/8452d01d26d7_add_chat_message_table.py` (+343 -89) </details> ### 📄 Description The `8452d01d26d7` ("Add chat_message table") migration fails on large PostgreSQL datasets (50+ GB) due to three interacting database-side issues that the existing Python-side batching fix (#21542) does not address: 1. **Unbounded transaction**: `env.py` wraps all migrations in a single `context.begin_transaction()`. The `begin_nested()` savepoints in `_flush_batch()` nest *inside* this outer transaction — nothing actually commits to disk until the entire migration returns. For large datasets this generates massive WAL that can't be recycled, causing managed PostgreSQL services (AlloyDB, Cloud SQL, RDS) to kill the connection. 2. **Indexes + FK created before backfill**: The migration creates 7 secondary indexes and a FK constraint *before* inserting any data. Every INSERT must update all 7 B-tree indexes + validate the FK — ~9 WAL records per row. PostgreSQL's own [bulk loading docs](https://www.postgresql.org/docs/current/populate.html) explicitly warn against this pattern. 3. **Server-side cursor invalidated by COMMIT**: `stream_results=True` creates a `WITHOUT HOLD` cursor on PostgreSQL. You can't naively "add commits inside the loop" — the cursor dies on the first COMMIT. A different pagination strategy is required. ## Related issues - Closes #23273 - Related to #23134 ### Contributor License Agreement <!-- 🚨 DO NOT DELETE THE TEXT BELOW 🚨 Keep the "Contributor License Agreement" confirmation text intact. Deleting it will trigger the CLA-Bot to INVALIDATE your PR. Your PR will NOT be reviewed or merged until you check the box below confirming that you have read and agree to the terms of the CLA. --> - [X] By submitting this pull request, I confirm that I have read and fully agree to the [Contributor License Agreement (CLA)](https://github.com/open-webui/open-webui/blob/main/CONTRIBUTOR_LICENSE_AGREEMENT), and I am providing my contributions under its terms. > [!NOTE] > Deleting the CLA section will lead to immediate closure of your PR and it will not be merged in. --- <sub>🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.</sub>
GiteaMirror added the pull-request label 2026-04-20 06:55:08 -05:00
Sign in to join this conversation.
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: github-starred/open-webui#27174