mirror of
https://github.com/open-webui/open-webui.git
synced 2026-05-06 19:08:59 -05:00
[PR #23039] [CLOSED] fix: remove null bytes from PDF metadata to prevent PostgreSQL JSONB errors #26988
📋 Pull Request Information
Original PR: https://github.com/open-webui/open-webui/pull/23039
Author: @yang1002378395-cmyk
Created: 3/25/2026
Status: ❌ Closed
Base: dev ← Head: fix-pdf-null-byte-22992
📝 Commits (1)
697ab41 fix: remove null bytes from PDF metadata to prevent PostgreSQL JSONB errors
📊 Changes
1 file changed (+3 additions, -0 deletions)
📝 backend/open_webui/retrieval/vector/utils.py (+3 -0)
📄 Description
Pull Request Checklist
dev branch
Changelog Entry
Description
PostgreSQL JSONB cannot handle null bytes (\x00) in strings. Some PDF metadata contains null bytes (e.g., "Adobe PSL 1.3e for Canon\x00") which causes DataError: unsupported Unicode escape sequence when inserting document chunks.
Fixed
process_metadata function
newlines (\n), carriage returns (\r), and tabs (\t)
Root Cause
The process_metadata function in backend/open_webui/retrieval/vector/utils.py converts non-serializable types to strings but does not sanitize string values. When PDF metadata contains null bytes, PostgreSQL raises an error during INSERT.
Files Changed
backend/open_webui/retrieval/vector/utils.py: Added string sanitization to remove null bytes
Testing
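As an illustration only, the sanitization described above could be sketched and checked roughly as follows. This is not the PR's actual three-line diff: the names strip_null_bytes and sanitize_metadata are hypothetical, and the rule for which values get stringified is an assumption based on the description of process_metadata. The sketch removes only \x00, so newlines, carriage returns, and tabs pass through unchanged.

```python
from typing import Any


def strip_null_bytes(value: str) -> str:
    """Remove NUL characters (\\x00); all other characters are left as-is."""
    return value.replace("\x00", "")


def sanitize_metadata(metadata: dict[str, Any]) -> dict[str, Any]:
    """Return a cleaned copy of metadata (hypothetical helper, not the PR's code)."""
    cleaned: dict[str, Any] = {}
    for key, value in metadata.items():
        # Assumption: values that are not plain JSON scalars are stringified,
        # mirroring the described "converts non-serializable types to strings".
        if value is not None and not isinstance(value, (str, int, float, bool)):
            value = str(value)
        # Strip NUL characters from every string value before storage.
        if isinstance(value, str):
            value = strip_null_bytes(value)
        cleaned[key] = value
    return cleaned


# Example input using the metadata string quoted in the description.
example = {
    "producer": "Adobe PSL 1.3e for Canon\x00",
    "title": "line1\nline2\tend",
    "page": 3,
}
result = sanitize_metadata(example)
```

Returning a new dict rather than mutating the input is a design choice of this sketch; it avoids changing a dictionary while iterating over it.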
Related Issues
Fixes #22992
Contributor License Agreement
🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.