[PR #17775] [CLOSED] feat: add comprehensive data sanitization to prevent null bytes #63400

Closed
opened 2026-05-06 08:08:32 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/open-webui/open-webui/pull/17775
Author: @Classic298
Created: 9/26/2025
Status: Closed

Base: dev ← Head: fix-postgres-null-byte-error


📝 Commits (7)

  • e2bb682 feat: add comprehensive data sanitization to prevent null bytes
  • 62dc633 Merge branch 'dev' into fix-postgres-null-byte-error
  • f4c118f Merge branch 'open-webui:main' into fix-postgres-null-byte-error
  • dbb4a0e Merge branch 'dev' into fix-postgres-null-byte-error
  • 09cb75d Update data_sanitizer.py
  • e543300 Update data_sanitizer.py
  • 626bca5 Merge branch 'open-webui:main' into fix-postgres-null-byte-error

📊 Changes

7 files changed (+92 additions, -43 deletions)


📝 backend/open_webui/models/chats.py (+6 -5)
📝 backend/open_webui/models/files.py (+7 -6)
📝 backend/open_webui/models/knowledge.py (+8 -7)
📝 backend/open_webui/models/messages.py (+12 -11)
📝 backend/open_webui/models/prompts.py (+5 -4)
📝 backend/open_webui/models/users.py (+11 -10)
➕ backend/open_webui/utils/data_sanitizer.py (+43 -0)

📄 Description

This commit introduces a robust, centralized sanitization mechanism to prevent null bytes (\u0000) from being stored in the database. These characters were causing UntranslatableCharacter errors during search operations in PostgreSQL deployments.

The solution consists of a new centralized utility, data_sanitizer.py, which provides a sanitize_data function that recursively traverses data structures (dictionaries, lists, and strings) and removes null bytes.
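A recursive traversal of this kind could look roughly like the sketch below. It is not the code from data_sanitizer.py; only the function name sanitize_data comes from the description, and cleaning dictionary keys as well as values is an assumption made for illustration.

```python
from typing import Any

NULL_BYTE = "\x00"  # the same character as \u0000


def sanitize_data(value: Any) -> Any:
    """Return a copy of value with null bytes removed from every string.

    Handles strings, dicts, and lists; anything else is returned unchanged.
    """
    if isinstance(value, str):
        return value.replace(NULL_BYTE, "")
    if isinstance(value, dict):
        # Assumption: keys are cleaned too, since they end up in stored JSON.
        return {sanitize_data(k): sanitize_data(v) for k, v in value.items()}
    if isinstance(value, list):
        return [sanitize_data(item) for item in value]
    # Numbers, booleans, None, and other types pass through untouched.
    return value
```

The function builds new containers instead of mutating its input, so the caller's original object is left as it was.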

To apply this sanitization broadly and automatically, this change introduces custom SQLAlchemy TypeDecorator classes, SanitizedText and SanitizedJSON. These types wrap the standard Text and JSON types and apply the sanitization logic before the data is sent to the database.
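As a rough illustration of the TypeDecorator approach, the sketch below wraps Text and JSON and cleans values in process_bind_param, the SQLAlchemy hook that runs before a value is bound to a statement. The class names come from the description; the helper _strip_nulls and the class bodies are assumptions, not the PR's actual code.

```python
from sqlalchemy import JSON, Text
from sqlalchemy.types import TypeDecorator


def _strip_nulls(value):
    """Hypothetical helper: remove null bytes from strings, dicts, and lists."""
    if isinstance(value, str):
        return value.replace("\x00", "")
    if isinstance(value, dict):
        return {_strip_nulls(k): _strip_nulls(v) for k, v in value.items()}
    if isinstance(value, list):
        return [_strip_nulls(item) for item in value]
    return value


class SanitizedText(TypeDecorator):
    """Text column type that strips null bytes on the way to the database."""

    impl = Text
    cache_ok = True  # the type carries no per-instance state

    def process_bind_param(self, value, dialect):
        return _strip_nulls(value)


class SanitizedJSON(TypeDecorator):
    """JSON column type that strips null bytes before serialization."""

    impl = JSON
    cache_ok = True

    def process_bind_param(self, value, dialect):
        return _strip_nulls(value)
```

Because the cleanup runs in the bind step, it only affects newly written values; rows already stored are not touched by a type like this.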

These new sanitized types have been applied to all relevant columns in the database models, including Chat, Prompt, File, Message, User, and Knowledge, ensuring that all user-generated content and external data are cleaned proactively. This approach provides a comprehensive, long-term fix for the issue by ensuring data integrity at the ORM layer.
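To show what swapping a column type means in practice, here is a small self-contained sketch that declares one column with a sanitizing type and round-trips a value through an in-memory SQLite database. The model DemoPrompt, its columns, and the roundtrip helper are hypothetical and exist only for this example; the real models in the PR are not reproduced here.

```python
from sqlalchemy import Column, String, Text, create_engine
from sqlalchemy.orm import Session, declarative_base
from sqlalchemy.types import TypeDecorator


class SanitizedText(TypeDecorator):
    """Minimal stand-in for the PR's type: strips null bytes from strings."""

    impl = Text
    cache_ok = True

    def process_bind_param(self, value, dialect):
        return value.replace("\x00", "") if isinstance(value, str) else value


Base = declarative_base()


class DemoPrompt(Base):  # hypothetical model, not the real Prompt table
    __tablename__ = "demo_prompt"

    command = Column(String, primary_key=True)
    content = Column(SanitizedText)  # previously this would have been Text


def roundtrip(text: str) -> str:
    """Store text in a fresh in-memory database and read it back."""
    engine = create_engine("sqlite://")
    Base.metadata.create_all(engine)
    with Session(engine) as session:
        session.add(DemoPrompt(command="/demo", content=text))
        session.commit()
        # Attributes expire on commit, so this access reloads the stored value.
        return session.get(DemoPrompt, "/demo").content
```

Calling roundtrip("he\x00llo") writes the row through the custom type, so the value read back from the database no longer contains the null byte.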

Contributor License Agreement

By submitting this pull request, I confirm that I have read and fully agree to the Contributor License Agreement (CLA), and I am providing my contributions under its terms.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-05-06 08:08:33 -05:00

Reference: github-starred/open-webui#63400