mirror of
https://github.com/open-webui/open-webui.git
synced 2026-05-06 02:48:13 -05:00
[PR #17775] [CLOSED] feat: add comprehensive data sanitization to prevent null bytes #24544
📋 Pull Request Information
Original PR: https://github.com/open-webui/open-webui/pull/17775
Author: @Classic298
Created: 9/26/2025
Status: ❌ Closed
Base: dev ← Head: fix-postgres-null-byte-error
📝 Commits (7)
e2bb682 feat: add comprehensive data sanitization to prevent null bytes
62dc633 Merge branch 'dev' into fix-postgres-null-byte-error
f4c118f Merge branch 'open-webui:main' into fix-postgres-null-byte-error
dbb4a0e Merge branch 'dev' into fix-postgres-null-byte-error
09cb75d Update data_sanitizer.py
e543300 Update data_sanitizer.py
626bca5 Merge branch 'open-webui:main' into fix-postgres-null-byte-error
📊 Changes
7 files changed (+92 additions, -43 deletions)
📝 backend/open_webui/models/chats.py (+6 -5)
📝 backend/open_webui/models/files.py (+7 -6)
📝 backend/open_webui/models/knowledge.py (+8 -7)
📝 backend/open_webui/models/messages.py (+12 -11)
📝 backend/open_webui/models/prompts.py (+5 -4)
📝 backend/open_webui/models/users.py (+11 -10)
➕ backend/open_webui/utils/data_sanitizer.py (+43 -0)
📄 Description
This commit introduces a centralized sanitization mechanism that prevents null bytes (\u0000) from being stored in the database. These characters were causing UntranslatableCharacter errors during search operations in PostgreSQL deployments.

The solution consists of a new centralized utility, data_sanitizer.py, which provides a sanitize_data function that recursively traverses data structures (dictionaries, lists, and strings) and removes null bytes.

To apply this sanitization automatically, this change introduces custom SQLAlchemy TypeDecorator classes, SanitizedText and SanitizedJSON. These types wrap the standard Text and JSON types and apply the sanitization logic before the data is sent to the database.

The sanitized types have been applied to all relevant columns in the Chat, Prompt, File, Message, User, and Knowledge models, so user-generated content and external data are cleaned before they are written. Because the fix lives at the ORM layer, every write that goes through these column types is cleaned without changes to individual call sites.

Contributor License Agreement
By submitting this pull request, I confirm that I have read and fully agree to the Contributor License Agreement (CLA), and I am providing my contributions under its terms.
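The following is a minimal sketch of the recursive traversal that the description attributes to sanitize_data, written in Python to match the open-webui backend. Only the function name comes from the PR. The signature, the tuple handling, and the choice to also clean dictionary keys are assumptions. None of it is taken from the actual data_sanitizer.py.

```python
import json
from typing import Any

# NUL character; PostgreSQL text values cannot contain it.
NULL_BYTE = "\x00"


def sanitize_data(value: Any) -> Any:
    """Return a copy of ``value`` with every NUL character removed.

    Hypothetical sketch (not the PR's code): recurses through dicts, lists,
    and tuples and strips NUL from every string it finds, including dict keys.
    """
    if isinstance(value, str):
        return value.replace(NULL_BYTE, "")
    if isinstance(value, dict):
        # Assumption: keys are cleaned as well, because JSON object keys are strings too.
        return {sanitize_data(k): sanitize_data(v) for k, v in value.items()}
    if isinstance(value, list):
        return [sanitize_data(item) for item in value]
    if isinstance(value, tuple):
        return tuple(sanitize_data(item) for item in value)
    # Numbers, booleans, None, and other scalars pass through unchanged.
    return value


if __name__ == "__main__":
    payload = {"title": "report\x00", "tags": ["a\x00b", 3]}
    # json.dumps escapes NUL as \u0000, the escape PostgreSQL's jsonb type rejects.
    print(json.dumps(payload))  # {"title": "report\u0000", "tags": ["a\u0000b", 3]}
    print(json.dumps(sanitize_data(payload)))  # {"title": "report", "tags": ["ab", 3]}
```

Running the script prints the serialized payload before and after cleaning. Python's json module escapes NUL as \u0000. PostgreSQL's jsonb type rejects that escape, as do operations that de-escape json values. This behavior is consistent with the search-time errors described above.

To hook a function like this into SQLAlchemy, a TypeDecorator subclass would typically override process_bind_param(value, dialect). SQLAlchemy calls that method on each value before binding it to a statement. This sketch does not show how SanitizedText and SanitizedJSON implement it in the PR.

If two keys differ only by NUL characters, cleaning them makes them collide, and the later entry wins.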
🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.