[GH-ISSUE #22992] issue: Null Byte results in pdf document upload failure with PostgreSQL/pgvector #19859

Closed
opened 2026-04-20 02:23:01 -05:00 by GiteaMirror · 2 comments

Originally created by @Sacul13 on GitHub (Mar 24, 2026).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/22992

Check Existing Issues

  • I have searched for any existing and/or related issues.
  • I have searched for any existing and/or related discussions.
  • I have also searched in the CLOSED issues AND CLOSED discussions and found no related items (your issue might already be addressed on the development branch!).
  • I am using the latest version of Open WebUI.

Installation Method

Docker

Open WebUI Version

0.8.10

Ollama Version (if applicable)

No response

Operating System

Ubuntu 22.04

Browser (if applicable)

No response

Confirmation

  • I have read and followed all instructions in README.md.
  • I am using the latest version of both Open WebUI and Ollama.
  • I have included the browser console logs.
  • I have included the Docker container logs.
  • I have provided every relevant configuration, setting, and environment variable used in my setup.
  • I have clearly listed every relevant configuration, custom setting, environment variable, and command-line option that influences my setup (such as Docker Compose overrides, .env values, browser settings, authentication configurations, etc).
  • I have documented step-by-step reproduction instructions that are precise, sequential, and leave nothing to interpretation. My steps:
  • Start with the initial platform/version/OS and dependencies used,
  • Specify exact install/launch/configure commands,
  • List URLs visited, user input (incl. example values/emails/passwords if needed),
  • Describe all options and toggles enabled or changed,
  • Include any files or environmental changes,
  • Identify the expected and actual result at each stage,
  • Ensure any reasonably skilled user can follow and hit the same issue.

Expected Behavior

The PDF document uploads and its content is extracted for further use in chat or in a knowledge space.

Actual Behavior

The document upload fails with an error, but the file stays visible in the chat UI as context even though it is not accessible to the LLM. Uploading the document to a knowledge space produces the same error message, but the document is not shown in the knowledge space UI.

Steps to Reproduce

Example A:

  1. Upload the specific PDF into a chat as context
  2. An error message is shown stating that \u0000 cannot be converted to text
  3. The PDF file stays in the context of the chat
  4. Send a prompt regarding the document context
  5. The answer is "document cant be accessed"

Example B:

  1. Upload the specific PDF to a knowledge space
  2. An error message is shown stating that \u0000 cannot be converted to text
  3. The PDF file is not present in the knowledge space

I can't attach the document here since it holds sensitive information.
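Since the PDF itself cannot be shared, the following is a minimal sketch of what appears to trigger the error, assuming the metadata is serialized to JSON before the insert. The dictionary is hypothetical and only modelled on the `producer` value visible in the logs; it is not the real document metadata. It shows that a trailing NUL character in a metadata string ends up as the escape sequence `\u0000` in the JSON text, which is the sequence PostgreSQL rejects for JSONB.

```python
import json

# Hypothetical metadata, modelled on the "producer" value seen in the logs.
# The trailing "\x00" stands in for the NUL byte embedded by the PDF producer.
metadata = {
    "producer": "Adobe PSL 1.3e for Canon\x00",
    "creator": "Canon iR-ADV 4935 PDF",
}

# json.dumps escapes control characters, so NUL becomes the six characters \u0000.
payload = json.dumps(metadata)

# True when the serialized JSON carries the escaped NUL that JSONB cannot store.
has_escaped_nul = "\\u0000" in payload

print(payload)
print("contains escaped NUL:", has_escaped_nul)
```

Any PDF whose metadata fields carry a NUL character should hit the same path, so a test file with such a field would likely reproduce the problem without the sensitive original.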

Logs & Screenshots

2026-03-24T14:01:31.9599458Z stdout F Traceback (most recent call last):
2026-03-24T14:01:31.9599509Z stdout F
2026-03-24T14:01:31.9599529Z stdout F File "/usr/local/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 2125, in _exec_insertmany_context
2026-03-24T14:01:31.9599564Z stdout F dialect.do_execute(
2026-03-24T14:01:31.9599886Z stdout F │ └ <function DefaultDialect.do_execute at 0x79030d018ae0>
2026-03-24T14:01:31.9600124Z stdout F └ <sqlalchemy.dialects.postgresql.psycopg2.PGDialect_psycopg2 object at 0x7902b3bbdf50>
2026-03-24T14:01:31.9600232Z stdout F File "/usr/local/lib/python3.11/site-packages/sqlalchemy/engine/default.py", line 952, in do_execute
2026-03-24T14:01:31.9600259Z stdout F cursor.execute(statement, parameters)
2026-03-24T14:01:31.9600281Z stdout F │ │ │ └ {'id__0': '70d2bbf1-f4df-4029-b5b4-64fde9db1286', 'vmetadata__0': '{"producer": "Adobe PSL 1.3e for Canon\u0000", "creator":...
2026-03-24T14:01:31.9600306Z stdout F │ │ └ 'INSERT INTO document_chunk (id, vector, collection_name, text, vmetadata) VALUES (%(id__0)s, %(vector__0)s, %(collection_nam...
2026-03-24T14:01:31.9600359Z stdout F │ └ <method 'execute' of 'psycopg2.extensions.cursor' objects>
2026-03-24T14:01:31.9600381Z stdout F └ <cursor object at 0x7902ae6a86d0; closed: -1>
2026-03-24T14:01:31.9600404Z stdout F
2026-03-24T14:01:31.9600426Z stdout F psycopg2.errors.UntranslatableCharacter: unsupported Unicode escape sequence
2026-03-24T14:01:31.9600448Z stdout F LINE 27: XXXXXXXXXXXXXXXXXX,', '{"producer": "Adobe...
2026-03-24T14:01:31.9600468Z stdout F ^
2026-03-24T14:01:31.9600488Z stdout F DETAIL: \u0000 cannot be converted to text.
2026-03-24T14:01:31.9600860Z stdout F CONTEXT: JSON data, line 1: {"producer": "Adobe PSL 1.3e for Canon\u0000...
2026-03-24T14:01:31.9600896Z stdout F
2026-03-24T14:01:31.9600924Z stdout F
2026-03-24T14:01:31.9600944Z stdout F
2026-03-24T14:01:31.9600964Z stdout F The above exception was the direct cause of the following exception:
2026-03-24T14:01:31.9601001Z stdout F
2026-03-24T14:01:31.9601021Z stdout F
2026-03-24T14:01:31.9601040Z stdout F Traceback (most recent call last):
2026-03-24T14:01:31.9601061Z stdout F
2026-03-24T14:01:31.9601080Z stdout F File "/usr/local/lib/python3.11/threading.py", line 1002, in _bootstrap
2026-03-24T14:01:31.9601101Z stdout F self._bootstrap_inner()
2026-03-24T14:01:31.9601122Z stdout F │ └ <function Thread._bootstrap_inner at 0x79030fa00b80>
2026-03-24T14:01:31.9601144Z stdout F └ <WorkerThread(AnyIO worker thread, started 133049945614016)>
2026-03-24T14:01:31.9601166Z stdout F File "/usr/local/lib/python3.11/threading.py", line 1045, in _bootstrap_inner
2026-03-24T14:01:31.9601186Z stdout F self.run()
2026-03-24T14:01:31.9601207Z stdout F │ └ <function WorkerThread.run at 0x7902ae7d7560>
2026-03-24T14:01:31.9601228Z stdout F └ <WorkerThread(AnyIO worker thread, started 133049945614016)>
2026-03-24T14:01:31.9601249Z stdout F File "/usr/local/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 986, in run
2026-03-24T14:01:31.9601282Z stdout F result = context.run(func, *args)
2026-03-24T14:01:31.9601303Z stdout F │ │ │ └ ()
2026-03-24T14:01:31.9601325Z stdout F │ │ └ functools.partial(<function process_uploaded_file at 0x7902b3ab3420>, <starlette.requests.Request object at 0x790246c21bd0>, ...
2026-03-24T14:01:31.9601349Z stdout F │ └ <method 'run' of '_contextvars.Context' objects>
2026-03-24T14:01:31.9601621Z stdout F └ <_contextvars.Context object at 0x79024595be80>
2026-03-24T14:01:31.9601756Z stdout F
2026-03-24T14:01:31.9601813Z stdout F File "/app/backend/open_webui/routers/files.py", line 164, in process_uploaded_file
2026-03-24T14:01:31.9601837Z stdout F _process_handler(db_session)
2026-03-24T14:01:31.9602187Z stdout F │ └ <sqlalchemy.orm.session.Session object at 0x790266242b10>
2026-03-24T14:01:31.9602221Z stdout F └ <function process_uploaded_file.<locals>._process_handler at 0x79024611c9a0>
2026-03-24T14:01:31.9602244Z stdout F
2026-03-24T14:01:31.9602264Z stdout F File "/app/backend/open_webui/routers/files.py", line 128, in _process_handler
2026-03-24T14:01:31.9602285Z stdout F process_file(
2026-03-24T14:01:31.9602306Z stdout F └ <function process_file at 0x7902b16d44a0>
2026-03-24T14:01:31.9602342Z stdout F
2026-03-24T14:01:31.9602363Z stdout F File "/app/backend/open_webui/routers/retrieval.py", line 1836, in process_file
2026-03-24T14:01:31.9602383Z stdout F result = save_docs_to_vector_db(
2026-03-24T14:01:31.9602403Z stdout F └ <function save_docs_to_vector_db at 0x7902b16d4360>
2026-03-24T14:01:31.9602424Z stdout F
2026-03-24T14:01:31.9602754Z stdout F File "/app/backend/open_webui/routers/retrieval.py", line 1638, in save_docs_to_vector_db
2026-03-24T14:01:31.9602776Z stdout F VECTOR_DB_CLIENT.insert(
2026-03-24T14:01:31.9602797Z stdout F │ └ <function PgvectorClient.insert at 0x7902b3bb3880>
2026-03-24T14:01:31.9602819Z stdout F └ <open_webui.retrieval.vector.dbs.pgvector.PgvectorClient object at 0x7902b3d837d0>
2026-03-24T14:01:31.9602841Z stdout F
2026-03-24T14:01:31.9603001Z stdout F > File "/app/backend/open_webui/retrieval/vector/dbs/pgvector.py", line 336, in insert
2026-03-24T14:01:31.9603051Z stdout F self.session.bulk_save_objects(new_items)
2026-03-24T14:01:31.9603091Z stdout F │ │ │ └ [<open_webui.retrieval.vector.dbs.pgvector.DocumentChunk object at 0x7902446a1590>, <open_webui.retrieval.vector.dbs.pgvector...
2026-03-24T14:01:31.9603117Z stdout F │ │ └ <function scoped_session.bulk_save_objects at 0x7902e4eaf7e0>
2026-03-24T14:01:31.9603137Z stdout F │ └ <sqlalchemy.orm.scoping.scoped_session object at 0x7902b3bbe9d0>
2026-03-24T14:01:31.9603155Z stdout F └ <open_webui.retrieval.vector.dbs.pgvector.PgvectorClient object at 0x7902b3d837d0>
2026-03-24T14:01:31.9603206Z stdout F
2026-03-24T14:01:31.9603224Z stdout F File "/usr/local/lib/python3.11/site-packages/sqlalchemy/orm/scoping.py", line 1344, in bulk_save_objects
2026-03-24T14:01:31.9603243Z stdout F return self._proxied.bulk_save_objects(
2026-03-24T14:01:31.9603261Z stdout F │ └ <property object at 0x7902e4e56e80>
2026-03-24T14:01:31.9603280Z stdout F └ <sqlalchemy.orm.scoping.scoped_session object at 0x7902b3bbe9d0>
2026-03-24T14:01:31.9603301Z stdout F File "/usr/local/lib/python3.11/site-packages/sqlalchemy/orm/session.py", line 4564, in bulk_save_objects
2026-03-24T14:01:31.9603327Z stdout F self._bulk_save_mappings(
2026-03-24T14:01:31.9603349Z stdout F │ └ <function Session._bulk_save_mappings at 0x7902e4fcf240>
2026-03-24T14:01:31.9603468Z stdout F └ <sqlalchemy.orm.session.Session object at 0x790245078f50>
2026-03-24T14:01:31.9603524Z stdout F File "/usr/local/lib/python3.11/site-packages/sqlalchemy/orm/session.py", line 4741, in _bulk_save_mappings
2026-03-24T14:01:31.9603554Z stdout F with util.safe_reraise():
2026-03-24T14:01:31.9603575Z stdout F │ └ <class 'sqlalchemy.util.langhelpers.safe_reraise'>
2026-03-24T14:01:31.9603597Z stdout F └ <module 'sqlalchemy.util' from '/usr/local/lib/python3.11/site-packages/sqlalchemy/util/__init__.py'>
2026-03-24T14:01:31.9603621Z stdout F File "/usr/local/lib/python3.11/site-packages/sqlalchemy/util/langhelpers.py", line 121, in __exit__
2026-03-24T14:01:31.9603642Z stdout F raise exc_value.with_traceback(exc_tb)
2026-03-24T14:01:31.9603662Z stdout F │ │ └ <traceback object at 0x790244a167c0>
2026-03-24T14:01:31.9603684Z stdout F │ └ <method 'with_traceback' of 'BaseException' objects>
2026-03-24T14:01:31.9603706Z stdout F └ DataError('(psycopg2.errors.UntranslatableCharacter) unsupported Unicode escape sequence\nLINE 27: XXXXXXXXXXXXXX
2026-03-24T14:01:31.9603954Z stdout F File "/usr/local/lib/python3.11/site-packages/sqlalchemy/orm/session.py", line 4730, in _bulk_save_mappings
2026-03-24T14:01:31.9603981Z stdout F bulk_persistence._bulk_insert(
2026-03-24T14:01:31.9604078Z stdout F │ └ <function _bulk_insert at 0x7902e4fb8e00>
2026-03-24T14:01:31.9604105Z stdout F └ <module 'sqlalchemy.orm.bulk_persistence' from '/usr/local/lib/python3.11/site-packages/sqlalchemy/orm/bulk_persistence.py'>
2026-03-24T14:01:31.9604152Z stdout F File "/usr/local/lib/python3.11/site-packages/sqlalchemy/orm/bulk_persistence.py", line 222, in _bulk_insert
2026-03-24T14:01:31.9604174Z stdout F result = persistence._emit_insert_statements(
2026-03-24T14:01:31.9604196Z stdout F │ └ <function _emit_insert_statements at 0x7902e4fb87c0>
2026-03-24T14:01:31.9604218Z stdout F └ <module 'sqlalchemy.orm.persistence' from '/usr/local/lib/python3.11/site-packages/sqlalchemy/orm/persistence.py'>
2026-03-24T14:01:31.9604240Z stdout F File "/usr/local/lib/python3.11/site-packages/sqlalchemy/orm/persistence.py", line 1048, in _emit_insert_statements
2026-03-24T14:01:31.9604262Z stdout F result = connection.execute(
2026-03-24T14:01:31.9604282Z stdout F │ └ <function Connection.execute at 0x79030d084720>
2026-03-24T14:01:31.9604303Z stdout F └ <sqlalchemy.engine.base.Connection object at 0x790245a83d50>
2026-03-24T14:01:31.9604324Z stdout F File "/usr/local/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1419, in execute
2026-03-24T14:01:31.9604356Z stdout F return meth(
2026-03-24T14:01:31.9604463Z stdout F └ <bound method ClauseElement._execute_on_connection of <sqlalchemy.sql.dml.Insert object at 0x7902885a2b90>>
2026-03-24T14:01:31.9604505Z stdout F File "/usr/local/lib/python3.11/site-packages/sqlalchemy/sql/elements.py", line 527, in _execute_on_connection
2026-03-24T14:01:31.9604539Z stdout F return connection._execute_clauseelement(
2026-03-24T14:01:31.9604560Z stdout F │ └ <function Connection._execute_clauseelement at 0x79030d084a40>
2026-03-24T14:01:31.9604582Z stdout F └ <sqlalchemy.engine.base.Connection object at 0x790245a83d50>
2026-03-24T14:01:31.9604603Z stdout F File "/usr/local/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1641, in _execute_clauseelement
2026-03-24T14:01:31.9604627Z stdout F ret = self._execute_context(
2026-03-24T14:01:31.9604650Z stdout F │ └ <function Connection._execute_context at 0x79030d084c20>
2026-03-24T14:01:31.9604671Z stdout F └ <sqlalchemy.engine.base.Connection object at 0x790245a83d50>
2026-03-24T14:01:31.9604708Z stdout F File "/usr/local/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1844, in _execute_context
2026-03-24T14:01:31.9604730Z stdout F return self._exec_insertmany_context(dialect, context)
2026-03-24T14:01:31.9604755Z stdout F │ │ │ └ <sqlalchemy.dialects.postgresql.psycopg2.PGExecutionContext_psycopg2 object at 0x7902a8769510>
2026-03-24T14:01:31.9604858Z stdout F │ │ └ <sqlalchemy.dialects.postgresql.psycopg2.PGDialect_psycopg2 object at 0x7902b3bbdf50>
2026-03-24T14:01:31.9604889Z stdout F │ └ <function Connection._exec_insertmany_context at 0x79030d084d60>
2026-03-24T14:01:31.9604914Z stdout F └ <sqlalchemy.engine.base.Connection object at 0x790245a83d50>
2026-03-24T14:01:31.9604938Z stdout F File "/usr/local/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 2133, in _exec_insertmany_context
2026-03-24T14:01:31.9604959Z stdout F self._handle_dbapi_exception(
2026-03-24T14:01:31.9604980Z stdout F │ └ <function Connection._handle_dbapi_exception at 0x79030d084f40>
2026-03-24T14:01:31.9605002Z stdout F └ <sqlalchemy.engine.base.Connection object at 0x790245a83d50>
2026-03-24T14:01:31.9605160Z stdout F File "/usr/local/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 2363, in _handle_dbapi_exception
2026-03-24T14:01:31.9605187Z stdout F raise sqlalchemy_exception.with_traceback(exc_info[2]) from e
2026-03-24T14:01:31.9605209Z stdout F │ │ │ └ UntranslatableCharacter('unsupported Unicode escape sequence\nLINE 27: 1) repeals the challenged tax ruling,', '{"producer"...
2026-03-24T14:01:31.9605297Z stdout F │ │ └ (<class 'psycopg2.errors.UntranslatableCharacter'>, UntranslatableCharacter('unsupported Unicode escape sequence\nLINE 27: 1)...
2026-03-24T14:01:31.9605343Z stdout F │ └ <method 'with_traceback' of 'BaseException' objects>
2026-03-24T14:01:31.9605369Z stdout F └ DataError('(psycopg2.errors.UntranslatableCharacter) unsupported Unicode escape sequence\nLINE 27: 1) repeals the challenged ...
2026-03-24T14:01:31.9605391Z stdout F File "/usr/local/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 2125, in _exec_insertmany_context
2026-03-24T14:01:31.9605412Z stdout F dialect.do_execute(
2026-03-24T14:01:31.9605432Z stdout F │ └ <function DefaultDialect.do_execute at 0x79030d018ae0>
2026-03-24T14:01:31.9605454Z stdout F └ <sqlalchemy.dialects.postgresql.psycopg2.PGDialect_psycopg2 object at 0x7902b3bbdf50>
2026-03-24T14:01:31.9605475Z stdout F File "/usr/local/lib/python3.11/site-packages/sqlalchemy/engine/default.py", line 952, in do_execute
2026-03-24T14:01:31.9605496Z stdout F cursor.execute(statement, parameters)
2026-03-24T14:01:31.9605516Z stdout F │ │ │ └ {'id__0': '70d2bbf1-f4df-4029-b5b4-64fde9db1286', 'vmetadata__0': '{"producer": "Adobe PSL 1.3e for Canon\u0000", "creator":...
2026-03-24T14:01:31.9605631Z stdout F │ │ └ 'INSERT INTO document_chunk (id, vector, collection_name, text, vmetadata) VALUES (%(id__0)s, %(vector__0)s, %(collection_nam...
2026-03-24T14:01:31.9605670Z stdout F │ └ <method 'execute' of 'psycopg2.extensions.cursor' objects>
2026-03-24T14:01:31.9605701Z stdout F └ <cursor object at 0x7902ae6a86d0; closed: -1>
2026-03-24T14:01:31.9605723Z stdout F
2026-03-24T14:01:31.9605748Z stdout F sqlalchemy.exc.DataError: (psycopg2.errors.UntranslatableCharacter) unsupported Unicode escape sequence
2026-03-24T14:01:31.9605775Z stdout F LINE 27: XXXXXXXXXXXXXXXXXX', '{"producer": "Adobe...
2026-03-24T14:01:31.9605800Z stdout F ^
2026-03-24T14:01:31.9605820Z stdout F DETAIL: \u0000 cannot be converted to text.
2026-03-24T14:01:31.9605841Z stdout F CONTEXT: JSON data, line 1: {"producer": "Adobe PSL 1.3e for Canon\u0000...
2026-03-24T14:01:31.9605864Z stdout F
2026-03-24T14:01:31.9605884Z stdout F [SQL: INSERT INTO document_chunk (id, vector, collection_name, text, vmetadata) VALUES (%(id__0)s, %(vector__0)s, %(collection_name__0)s, %(text__0)s, %(vmetadata__0)s::JSONB), (%(id__1)s, %(vector__1)s, %(collection_name__1)s, %(text__1)s, %(vmetadata__1) ... 4619 characters truncated ... SONB), (%(id__51)s, %(vector__51)s, %(collection_name__51)s, %(text__51)s, %(vmetadata__51)s::JSONB)]
2026-03-24T14:01:31.9607130Z stdout F [parameters: {'id__0': '70d2bbf1-f4df-4029-b5b4-64fde9db1286', 'vmetadata__0': '{"producer": "Adobe PSL 1.3e for Canon\u0000", "creator": "Canon iR-ADV 4935 PDF", "creationdate": "2026-03-13T11:18:51+00:00", "moddate": "2026-03 ... (334 characters truncated) ... fa62cdd67f91856d4f3773aede19895bbb64bde07b1c6aa800eeeb", "embedding_config": "{'engine': 'azure_openai', 'model': 'text-embedding-3-large'}"}', 'vector__0':

Additional Information

I'm using Open WebUI 0.8.10 with PostgreSQL/pgvector and OpenAI text-embedding-3-large as the embedding model.
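A possible workaround, sketched under assumptions: strip NUL characters from the metadata (and chunk text) before the rows reach the database. `strip_nul` is a hypothetical helper name, not an existing Open WebUI function, and where it would be called (for example, on each item before `VECTOR_DB_CLIENT.insert`) is also an assumption rather than a confirmed fix.

```python
from typing import Any


def strip_nul(value: Any) -> Any:
    """Recursively remove NUL characters from strings in nested dicts and lists.

    Hypothetical helper for illustration; non-string scalars pass through unchanged.
    """
    if isinstance(value, str):
        return value.replace("\x00", "")
    if isinstance(value, dict):
        return {strip_nul(key): strip_nul(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        # Tuples are returned as lists, which is how JSON would represent them anyway.
        return [strip_nul(item) for item in value]
    return value


# Example metadata modelled on the logged values (not the real document).
dirty = {
    "producer": "Adobe PSL 1.3e for Canon\x00",
    "tags": ["scan\x00", "pdf"],
    "pages": 3,
}
clean = strip_nul(dirty)
print(clean)
```

Removing the character rather than replacing it keeps values such as the producer name readable. The same cleanup would probably be needed for the chunk text, because PostgreSQL text columns cannot store NUL characters either.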

Originally created by @Sacul13 on GitHub (Mar 24, 2026). Original GitHub issue: https://github.com/open-webui/open-webui/issues/22992 ### Check Existing Issues - [x] I have searched for any existing and/or related issues. - [x] I have searched for any existing and/or related discussions. - [x] I have also searched in the CLOSED issues AND CLOSED discussions and found no related items (your issue might already be addressed on the development branch!). - [x] I am using the latest version of Open WebUI. ### Installation Method Docker ### Open WebUI Version 0.8.10 ### Ollama Version (if applicable) _No response_ ### Operating System Ubuntu 22.04 ### Browser (if applicable) _No response_ ### Confirmation - [x] I have read and followed all instructions in `README.md`. - [x] I am using the latest version of **both** Open WebUI and Ollama. - [x] I have included the browser console logs. - [x] I have included the Docker container logs. - [x] I have **provided every relevant configuration, setting, and environment variable used in my setup.** - [x] I have clearly **listed every relevant configuration, custom setting, environment variable, and command-line option that influences my setup** (such as Docker Compose overrides, .env values, browser settings, authentication configurations, etc). - [x] I have documented **step-by-step reproduction instructions that are precise, sequential, and leave nothing to interpretation**. My steps: - Start with the initial platform/version/OS and dependencies used, - Specify exact install/launch/configure commands, - List URLs visited, user input (incl. example values/emails/passwords if needed), - Describe all options and toggles enabled or changed, - Include any files or environmental changes, - Identify the expected and actual result at each stage, - Ensure any reasonably skilled user can follow and hit the same issue. ### Expected Behavior Upload the PDF document and content gets extracted for fruther usage in chat or knowledge space. 
### Actual Behavior Document upload fials with a error but stays visibly in the chat UI as context eventhough its not assesable for the LLM. Uploading the document in the knowledge space results in the same error message but the document isnt shown in the knowledge space UI. ### Steps to Reproduce Example A: 1. Upload the specific PDF into chat as context 2. Error message is shown that \u0000 cannot be converted to text 3. PDF file stays in context of chat 4. Sending prompt regarding the document context 5. Getting the answer "document cant be accessed" Example B: 1. Upload the specific PDF in a knowledge space 2. Error message is shown that \u0000 cannot be converted to text 3. PDF file is not present in the knowledge space I cant upload the document here as context since it holds sensitive information. ### Logs & Screenshots 2026-03-24T14:01:31.9599458Z stdout F Traceback (most recent call last): 2026-03-24T14:01:31.9599509Z stdout F 2026-03-24T14:01:31.9599529Z stdout F File "/usr/local/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 2125, in _exec_insertmany_context 2026-03-24T14:01:31.9599564Z stdout F dialect.do_execute( 2026-03-24T14:01:31.9599886Z stdout F │ └ <function DefaultDialect.do_execute at 0x79030d018ae0> 2026-03-24T14:01:31.9600124Z stdout F └ <sqlalchemy.dialects.postgresql.psycopg2.PGDialect_psycopg2 object at 0x7902b3bbdf50> 2026-03-24T14:01:31.9600232Z stdout F File "/usr/local/lib/python3.11/site-packages/sqlalchemy/engine/default.py", line 952, in do_execute 2026-03-24T14:01:31.9600259Z stdout F cursor.execute(statement, parameters) 2026-03-24T14:01:31.9600281Z stdout F │ │ │ └ {'id__0': '70d2bbf1-f4df-4029-b5b4-64fde9db1286', 'vmetadata__0': '{"producer": "Adobe PSL 1.3e for Canon\\u0000", "creator":... 2026-03-24T14:01:31.9600306Z stdout F │ │ └ 'INSERT INTO document_chunk (id, vector, collection_name, text, vmetadata) VALUES (%(id__0)s, %(vector__0)s, %(collection_nam... 
2026-03-24T14:01:31.9600359Z stdout F │ └ <method 'execute' of 'psycopg2.extensions.cursor' objects> 2026-03-24T14:01:31.9600381Z stdout F └ <cursor object at 0x7902ae6a86d0; closed: -1> 2026-03-24T14:01:31.9600404Z stdout F 2026-03-24T14:01:31.9600426Z stdout F psycopg2.errors.UntranslatableCharacter: unsupported Unicode escape sequence 2026-03-24T14:01:31.9600448Z stdout F LINE 27: XXXXXXXXXXXXXXXXXX,', '{"producer": "Adobe... 2026-03-24T14:01:31.9600468Z stdout F ^ 2026-03-24T14:01:31.9600488Z stdout F DETAIL: \u0000 cannot be converted to text. 2026-03-24T14:01:31.9600860Z stdout F CONTEXT: JSON data, line 1: {"producer": "Adobe PSL 1.3e for Canon\u0000... 2026-03-24T14:01:31.9600896Z stdout F 2026-03-24T14:01:31.9600924Z stdout F 2026-03-24T14:01:31.9600944Z stdout F 2026-03-24T14:01:31.9600964Z stdout F The above exception was the direct cause of the following exception: 2026-03-24T14:01:31.9601001Z stdout F 2026-03-24T14:01:31.9601021Z stdout F 2026-03-24T14:01:31.9601040Z stdout F Traceback (most recent call last): 2026-03-24T14:01:31.9601061Z stdout F 2026-03-24T14:01:31.9601080Z stdout F File "/usr/local/lib/python3.11/threading.py", line 1002, in _bootstrap 2026-03-24T14:01:31.9601101Z stdout F self._bootstrap_inner() 2026-03-24T14:01:31.9601122Z stdout F │ └ <function Thread._bootstrap_inner at 0x79030fa00b80> 2026-03-24T14:01:31.9601144Z stdout F └ <WorkerThread(AnyIO worker thread, started 133049945614016)> 2026-03-24T14:01:31.9601166Z stdout F File "/usr/local/lib/python3.11/threading.py", line 1045, in _bootstrap_inner 2026-03-24T14:01:31.9601186Z stdout F self.run() 2026-03-24T14:01:31.9601207Z stdout F │ └ <function WorkerThread.run at 0x7902ae7d7560> 2026-03-24T14:01:31.9601228Z stdout F └ <WorkerThread(AnyIO worker thread, started 133049945614016)> 2026-03-24T14:01:31.9601249Z stdout F File "/usr/local/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 986, in run 2026-03-24T14:01:31.9601282Z stdout F result = context.run(func, 
*args) 2026-03-24T14:01:31.9601303Z stdout F │ │ │ └ () 2026-03-24T14:01:31.9601325Z stdout F │ │ └ functools.partial(<function process_uploaded_file at 0x7902b3ab3420>, <starlette.requests.Request object at 0x790246c21bd0>, ... 2026-03-24T14:01:31.9601349Z stdout F │ └ <method 'run' of '_contextvars.Context' objects> 2026-03-24T14:01:31.9601621Z stdout F └ <_contextvars.Context object at 0x79024595be80> 2026-03-24T14:01:31.9601756Z stdout F 2026-03-24T14:01:31.9601813Z stdout F File "/app/backend/open_webui/routers/files.py", line 164, in process_uploaded_file 2026-03-24T14:01:31.9601837Z stdout F _process_handler(db_session) 2026-03-24T14:01:31.9602187Z stdout F │ └ <sqlalchemy.orm.session.Session object at 0x790266242b10> 2026-03-24T14:01:31.9602221Z stdout F └ <function process_uploaded_file.<locals>._process_handler at 0x79024611c9a0> 2026-03-24T14:01:31.9602244Z stdout F 2026-03-24T14:01:31.9602264Z stdout F File "/app/backend/open_webui/routers/files.py", line 128, in _process_handler 2026-03-24T14:01:31.9602285Z stdout F process_file( 2026-03-24T14:01:31.9602306Z stdout F └ <function process_file at 0x7902b16d44a0> 2026-03-24T14:01:31.9602342Z stdout F 2026-03-24T14:01:31.9602363Z stdout F File "/app/backend/open_webui/routers/retrieval.py", line 1836, in process_file 2026-03-24T14:01:31.9602383Z stdout F result = save_docs_to_vector_db( 2026-03-24T14:01:31.9602403Z stdout F └ <function save_docs_to_vector_db at 0x7902b16d4360> 2026-03-24T14:01:31.9602424Z stdout F 2026-03-24T14:01:31.9602754Z stdout F File "/app/backend/open_webui/routers/retrieval.py", line 1638, in save_docs_to_vector_db 2026-03-24T14:01:31.9602776Z stdout F VECTOR_DB_CLIENT.insert( 2026-03-24T14:01:31.9602797Z stdout F │ └ <function PgvectorClient.insert at 0x7902b3bb3880> 2026-03-24T14:01:31.9602819Z stdout F └ <open_webui.retrieval.vector.dbs.pgvector.PgvectorClient object at 0x7902b3d837d0> 2026-03-24T14:01:31.9602841Z stdout F 2026-03-24T14:01:31.9603001Z stdout F > File 
"/app/backend/open_webui/retrieval/vector/dbs/pgvector.py", line 336, in insert 2026-03-24T14:01:31.9603051Z stdout F self.session.bulk_save_objects(new_items) 2026-03-24T14:01:31.9603091Z stdout F │ │ │ └ [<open_webui.retrieval.vector.dbs.pgvector.DocumentChunk object at 0x7902446a1590>, <open_webui.retrieval.vector.dbs.pgvector... 2026-03-24T14:01:31.9603117Z stdout F │ │ └ <function scoped_session.bulk_save_objects at 0x7902e4eaf7e0> 2026-03-24T14:01:31.9603137Z stdout F │ └ <sqlalchemy.orm.scoping.scoped_session object at 0x7902b3bbe9d0> 2026-03-24T14:01:31.9603155Z stdout F └ <open_webui.retrieval.vector.dbs.pgvector.PgvectorClient object at 0x7902b3d837d0> 2026-03-24T14:01:31.9603206Z stdout F 2026-03-24T14:01:31.9603224Z stdout F File "/usr/local/lib/python3.11/site-packages/sqlalchemy/orm/scoping.py", line 1344, in bulk_save_objects 2026-03-24T14:01:31.9603243Z stdout F return self._proxied.bulk_save_objects( 2026-03-24T14:01:31.9603261Z stdout F │ └ <property object at 0x7902e4e56e80> 2026-03-24T14:01:31.9603280Z stdout F └ <sqlalchemy.orm.scoping.scoped_session object at 0x7902b3bbe9d0> 2026-03-24T14:01:31.9603301Z stdout F File "/usr/local/lib/python3.11/site-packages/sqlalchemy/orm/session.py", line 4564, in bulk_save_objects 2026-03-24T14:01:31.9603327Z stdout F self._bulk_save_mappings( 2026-03-24T14:01:31.9603349Z stdout F │ └ <function Session._bulk_save_mappings at 0x7902e4fcf240> 2026-03-24T14:01:31.9603468Z stdout F └ <sqlalchemy.orm.session.Session object at 0x790245078f50> 2026-03-24T14:01:31.9603524Z stdout F File "/usr/local/lib/python3.11/site-packages/sqlalchemy/orm/session.py", line 4741, in _bulk_save_mappings 2026-03-24T14:01:31.9603554Z stdout F with util.safe_reraise(): 2026-03-24T14:01:31.9603575Z stdout F │ └ <class 'sqlalchemy.util.langhelpers.safe_reraise'> 2026-03-24T14:01:31.9603597Z stdout F └ <module 'sqlalchemy.util' from '/usr/local/lib/python3.11/site-packages/sqlalchemy/util/__init__.py'> 2026-03-24T14:01:31.9603621Z stdout 
F File "/usr/local/lib/python3.11/site-packages/sqlalchemy/util/langhelpers.py", line 121, in __exit__ 2026-03-24T14:01:31.9603642Z stdout F raise exc_value.with_traceback(exc_tb) 2026-03-24T14:01:31.9603662Z stdout F │ │ └ <traceback object at 0x790244a167c0> 2026-03-24T14:01:31.9603684Z stdout F │ └ <method 'with_traceback' of 'BaseException' objects> 2026-03-24T14:01:31.9603706Z stdout F └ DataError('(psycopg2.errors.UntranslatableCharacter) unsupported Unicode escape sequence\nLINE 27: XXXXXXXXXXXXXX 2026-03-24T14:01:31.9603954Z stdout F File "/usr/local/lib/python3.11/site-packages/sqlalchemy/orm/session.py", line 4730, in _bulk_save_mappings 2026-03-24T14:01:31.9603981Z stdout F bulk_persistence._bulk_insert( 2026-03-24T14:01:31.9604078Z stdout F │ └ <function _bulk_insert at 0x7902e4fb8e00> 2026-03-24T14:01:31.9604105Z stdout F └ <module 'sqlalchemy.orm.bulk_persistence' from '/usr/local/lib/python3.11/site-packages/sqlalchemy/orm/bulk_persistence.py'> 2026-03-24T14:01:31.9604152Z stdout F File "/usr/local/lib/python3.11/site-packages/sqlalchemy/orm/bulk_persistence.py", line 222, in _bulk_insert 2026-03-24T14:01:31.9604174Z stdout F result = persistence._emit_insert_statements( 2026-03-24T14:01:31.9604196Z stdout F │ └ <function _emit_insert_statements at 0x7902e4fb87c0> 2026-03-24T14:01:31.9604218Z stdout F └ <module 'sqlalchemy.orm.persistence' from '/usr/local/lib/python3.11/site-packages/sqlalchemy/orm/persistence.py'> 2026-03-24T14:01:31.9604240Z stdout F File "/usr/local/lib/python3.11/site-packages/sqlalchemy/orm/persistence.py", line 1048, in _emit_insert_statements 2026-03-24T14:01:31.9604262Z stdout F result = connection.execute( 2026-03-24T14:01:31.9604282Z stdout F │ └ <function Connection.execute at 0x79030d084720> 2026-03-24T14:01:31.9604303Z stdout F └ <sqlalchemy.engine.base.Connection object at 0x790245a83d50> 2026-03-24T14:01:31.9604324Z stdout F File "/usr/local/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1419, in 
execute 2026-03-24T14:01:31.9604356Z stdout F return meth( 2026-03-24T14:01:31.9604463Z stdout F └ <bound method ClauseElement._execute_on_connection of <sqlalchemy.sql.dml.Insert object at 0x7902885a2b90>> 2026-03-24T14:01:31.9604505Z stdout F File "/usr/local/lib/python3.11/site-packages/sqlalchemy/sql/elements.py", line 527, in _execute_on_connection 2026-03-24T14:01:31.9604539Z stdout F return connection._execute_clauseelement( 2026-03-24T14:01:31.9604560Z stdout F │ └ <function Connection._execute_clauseelement at 0x79030d084a40> 2026-03-24T14:01:31.9604582Z stdout F └ <sqlalchemy.engine.base.Connection object at 0x790245a83d50> 2026-03-24T14:01:31.9604603Z stdout F File "/usr/local/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1641, in _execute_clauseelement 2026-03-24T14:01:31.9604627Z stdout F ret = self._execute_context( 2026-03-24T14:01:31.9604650Z stdout F │ └ <function Connection._execute_context at 0x79030d084c20> 2026-03-24T14:01:31.9604671Z stdout F └ <sqlalchemy.engine.base.Connection object at 0x790245a83d50> 2026-03-24T14:01:31.9604708Z stdout F File "/usr/local/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 1844, in _execute_context 2026-03-24T14:01:31.9604730Z stdout F return self._exec_insertmany_context(dialect, context) 2026-03-24T14:01:31.9604755Z stdout F │ │ │ └ <sqlalchemy.dialects.postgresql.psycopg2.PGExecutionContext_psycopg2 object at 0x7902a8769510> 2026-03-24T14:01:31.9604858Z stdout F │ │ └ <sqlalchemy.dialects.postgresql.psycopg2.PGDialect_psycopg2 object at 0x7902b3bbdf50> 2026-03-24T14:01:31.9604889Z stdout F │ └ <function Connection._exec_insertmany_context at 0x79030d084d60> 2026-03-24T14:01:31.9604914Z stdout F └ <sqlalchemy.engine.base.Connection object at 0x790245a83d50> 2026-03-24T14:01:31.9604938Z stdout F File "/usr/local/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 2133, in _exec_insertmany_context 2026-03-24T14:01:31.9604959Z stdout F self._handle_dbapi_exception( 
2026-03-24T14:01:31.9604980Z stdout F │ └ <function Connection._handle_dbapi_exception at 0x79030d084f40> 2026-03-24T14:01:31.9605002Z stdout F └ <sqlalchemy.engine.base.Connection object at 0x790245a83d50> 2026-03-24T14:01:31.9605160Z stdout F File "/usr/local/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 2363, in _handle_dbapi_exception 2026-03-24T14:01:31.9605187Z stdout F raise sqlalchemy_exception.with_traceback(exc_info[2]) from e 2026-03-24T14:01:31.9605209Z stdout F │ │ │ └ UntranslatableCharacter('unsupported Unicode escape sequence\nLINE 27: 1) repeals the challenged tax ruling,\', \'{"producer"... 2026-03-24T14:01:31.9605297Z stdout F │ │ └ (<class 'psycopg2.errors.UntranslatableCharacter'>, UntranslatableCharacter('unsupported Unicode escape sequence\nLINE 27: 1)... 2026-03-24T14:01:31.9605343Z stdout F │ └ <method 'with_traceback' of 'BaseException' objects> 2026-03-24T14:01:31.9605369Z stdout F └ DataError('(psycopg2.errors.UntranslatableCharacter) unsupported Unicode escape sequence\nLINE 27: 1) repeals the challenged ... 2026-03-24T14:01:31.9605391Z stdout F File "/usr/local/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 2125, in _exec_insertmany_context 2026-03-24T14:01:31.9605412Z stdout F dialect.do_execute( 2026-03-24T14:01:31.9605432Z stdout F │ └ <function DefaultDialect.do_execute at 0x79030d018ae0> 2026-03-24T14:01:31.9605454Z stdout F └ <sqlalchemy.dialects.postgresql.psycopg2.PGDialect_psycopg2 object at 0x7902b3bbdf50> 2026-03-24T14:01:31.9605475Z stdout F File "/usr/local/lib/python3.11/site-packages/sqlalchemy/engine/default.py", line 952, in do_execute 2026-03-24T14:01:31.9605496Z stdout F cursor.execute(statement, parameters) 2026-03-24T14:01:31.9605516Z stdout F │ │ │ └ {'id__0': '70d2bbf1-f4df-4029-b5b4-64fde9db1286', 'vmetadata__0': '{"producer": "Adobe PSL 1.3e for Canon\\u0000", "creator":... 
2026-03-24T14:01:31.9605631Z stdout F │ │ └ 'INSERT INTO document_chunk (id, vector, collection_name, text, vmetadata) VALUES (%(id__0)s, %(vector__0)s, %(collection_nam...
2026-03-24T14:01:31.9605670Z stdout F │ └ <method 'execute' of 'psycopg2.extensions.cursor' objects>
2026-03-24T14:01:31.9605701Z stdout F └ <cursor object at 0x7902ae6a86d0; closed: -1>
2026-03-24T14:01:31.9605723Z stdout F
2026-03-24T14:01:31.9605748Z stdout F sqlalchemy.exc.DataError: (psycopg2.errors.UntranslatableCharacter) unsupported Unicode escape sequence
2026-03-24T14:01:31.9605775Z stdout F LINE 27: XXXXXXXXXXXXXXXXXX', '{"producer": "Adobe...
2026-03-24T14:01:31.9605800Z stdout F ^
2026-03-24T14:01:31.9605820Z stdout F DETAIL: \u0000 cannot be converted to text.
2026-03-24T14:01:31.9605841Z stdout F CONTEXT: JSON data, line 1: {"producer": "Adobe PSL 1.3e for Canon\u0000...
2026-03-24T14:01:31.9605864Z stdout F
2026-03-24T14:01:31.9605884Z stdout F [SQL: INSERT INTO document_chunk (id, vector, collection_name, text, vmetadata) VALUES (%(id__0)s, %(vector__0)s, %(collection_name__0)s, %(text__0)s, %(vmetadata__0)s::JSONB), (%(id__1)s, %(vector__1)s, %(collection_name__1)s, %(text__1)s, %(vmetadata__1) ... 4619 characters truncated ... SONB), (%(id__51)s, %(vector__51)s, %(collection_name__51)s, %(text__51)s, %(vmetadata__51)s::JSONB)]
2026-03-24T14:01:31.9607130Z stdout F [parameters: {'id__0': '70d2bbf1-f4df-4029-b5b4-64fde9db1286', 'vmetadata__0': '{"producer": "Adobe PSL 1.3e for Canon\\u0000", "creator": "Canon iR-ADV 4935 PDF", "creationdate": "2026-03-13T11:18:51+00:00", "moddate": "2026-03 ... (334 characters truncated) ... fa62cdd67f91856d4f3773aede19895bbb64bde07b1c6aa800eeeb", "embedding_config": "{\'engine\': \'azure_openai\', \'model\': \'text-embedding-3-large\'}"}', 'vector__0':

### Additional Information

I'm using Open WebUI 0.8.10 with PostgreSQL/pgvector and OpenAI text-embedding-3-large as the embedding model.
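The `DETAIL` and `CONTEXT` lines in the log point at a NUL character (`\u0000`) in the PDF's `producer` metadata, which PostgreSQL refuses to store in a `JSONB` value. A possible workaround is to strip NUL characters from the metadata before it is serialized for the insert. The following is only a minimal sketch of that idea: `strip_null_bytes` is a hypothetical helper name, not a function in Open WebUI, and this is not necessarily how the issue was addressed in dev.

```python
import json
from typing import Any


def strip_null_bytes(value: Any) -> Any:
    """Recursively remove NUL characters (U+0000) from all strings.

    Hypothetical helper for illustration; not part of Open WebUI.
    """
    if isinstance(value, str):
        return value.replace("\x00", "")
    if isinstance(value, dict):
        # Clean both keys and values, since either may carry a NUL.
        return {strip_null_bytes(k): strip_null_bytes(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        # Tuples become lists, which is what JSON serialization yields anyway.
        return [strip_null_bytes(v) for v in value]
    # Numbers, booleans and None pass through unchanged.
    return value


# Metadata shaped like the values visible in the log above.
metadata = {
    "producer": "Adobe PSL 1.3e for Canon\x00",
    "creator": "Canon iR-ADV 4935 PDF",
}
clean = strip_null_bytes(metadata)
payload = json.dumps(clean)  # no longer contains a \u0000 escape
```

The same cleaning could be applied to the chunk text, since a PostgreSQL `text` column cannot hold NUL characters either.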
GiteaMirror added the bug label 2026-04-20 02:23:01 -05:00

@rgaricano commented on GitHub (Apr 3, 2026):

Yes, it happened to me too.
I solved it with this fix in the `adjust_vector_length` function of `backend/open_webui/retrieval/vector/dbs/pgvector.py`:

```
import math  # Add this import at the top of the file

def adjust_vector_length(self, vector: List[float]) -> List[float]:
    # Adjust vector to have length VECTOR_LENGTH
    current_length = len(vector)
    if current_length < VECTOR_LENGTH:
        # Pad the vector with zeros
        vector += [0.0] * (VECTOR_LENGTH - current_length)
    elif current_length > VECTOR_LENGTH:
        # Truncate the vector to VECTOR_LENGTH
        vector = vector[:VECTOR_LENGTH]

    # Replace NaN values with 0.0
    vector = [0.0 if math.isnan(x) else x for x in vector]
    return vector
```

@tjbck commented on GitHub (Apr 13, 2026):

Addressed in dev.


Reference: github-starred/open-webui#19859