[GH-ISSUE #14729] issue: default document loader can't handle some PDF articles #17348

Closed
opened 2026-04-19 23:05:02 -05:00 by GiteaMirror · 3 comments

Originally created by @astroboylrx on GitHub (Jun 6, 2025).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/14729

Check Existing Issues

  • I have searched the existing issues and discussions.
  • I am using the latest version of Open WebUI.

Installation Method

Pip Install

Open WebUI Version

v0.6.13

Ollama Version (if applicable)

No response

Operating System

macOS Sequoia 15.5

Browser (if applicable)

Chrome 137.0.7151.68, Safari 18.5

Confirmation

  • I have read and followed all instructions in README.md.
  • I am using the latest version of both Open WebUI and Ollama.
  • I have included the browser console logs.
  • I have included the Docker container logs.
  • I have provided every relevant configuration, setting, and environment variable used in my setup.
  • I have clearly listed every relevant configuration, custom setting, environment variable, and command-line option that influences my setup (such as Docker Compose overrides, .env values, browser settings, authentication configurations, etc).
  • I have documented step-by-step reproduction instructions that are precise, sequential, and leave nothing to interpretation. My steps:
      • Start with the initial platform/version/OS and dependencies used,
      • Specify exact install/launch/configure commands,
      • List URLs visited, user input (incl. example values/emails/passwords if needed),
      • Describe all options and toggles enabled or changed,
      • Include any files or environmental changes,
      • Identify the expected and actual result at each stage,
      • Ensure any reasonably skilled user can follow and hit the same issue.

Expected Behavior

When a PDF is dragged into a chat with any model, the backend should extract the text from that PDF.

Actual Behavior

For certain PDF files, the default loader fails to extract any text: processing stops with HTTP 400 and the error "Cannot handle this data type: (1, 1, 1), |u1".

Steps to Reproduce

Drag this PDF into the Open WebUI chat interface:

s41550-023-01945-7.pdf (https://github.com/user-attachments/files/20626664/s41550-023-01945-7.pdf)

Logs & Screenshots

The only relevant log is:

2025-06-06 11:50:28.311 | INFO     | open_webui.routers.files:upload_file:94 - file.content_type: application/pdf - {}
2025-06-06 11:50:28.350 | ERROR    | open_webui.routers.retrieval:process_file:1413 - Cannot handle this data type: (1, 1, 1), |u1 - {}
Traceback (most recent call last):

  File "/Users/user/venvs/webui/lib/python3.11/site-packages/PIL/Image.py", line 3299, in fromarray
    mode, rawmode = _fromarray_typemap[typekey]
    │               │                  └ ((1, 1, 1), '|u1')
    │               └ {((1, 1), '|b1'): ('1', '1;8'), ((1, 1), '|u1'): ('L', 'L'), ((1, 1), '|i1'): ('I', 'I;8'), ((1, 1), '<u2'): ('I', 'I;16'), (...
    └ None

KeyError: ((1, 1, 1), '|u1')


The above exception was the direct cause of the following exception:


Traceback (most recent call last):

  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/threading.py", line 1002, in _bootstrap
    self._bootstrap_inner()
    │    └ <function Thread._bootstrap_inner at 0x1024751c0>
    └ <WorkerThread(AnyIO worker thread, started 14422241280)>
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/threading.py", line 1045, in _bootstrap_inner
    self.run()
    │    └ <function WorkerThread.run at 0x355cfb600>
    └ <WorkerThread(AnyIO worker thread, started 14422241280)>
  File "/Users/user/venvs/webui/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 967, in run
    result = context.run(func, *args)
             │       │   │      └ ()
             │       │   └ functools.partial(<function upload_file at 0x1690b9120>, user=UserModel(id='d1610ef6-173c-42df-b591-2d4afc13b308', name='John...
             │       └ <method 'run' of '_contextvars.Context' objects>
             └ <_contextvars.Context object at 0x3539d3a00>
  File "/Users/user/venvs/webui/lib/python3.11/site-packages/open_webui/routers/files.py", line 172, in upload_file
    process_file(request, ProcessFileForm(file_id=id), user=user)
    │            │        │                       │         └ UserModel(id='d1610ef6-173c-42df-b591-2d4afc13b308', name='John Doe', email='johndoe@gmail.com', role='admin', profile_im...
    │            │        │                       └ '1e5b6280-f4de-4b93-a467-728881b29236'
    │            │        └ <class 'open_webui.routers.retrieval.ProcessFileForm'>
    │            └ <starlette.requests.Request object at 0x353d90290>
    └ <function process_file at 0x16cb7ce00>
> File "/Users/user/venvs/webui/lib/python3.11/site-packages/open_webui/routers/retrieval.py", line 1332, in process_file
    docs = loader.load(
           │      └ <function Loader.load at 0x16bb494e0>
           └ <open_webui.retrieval.loaders.main.Loader object at 0x16ba54290>
  File "/Users/user/venvs/webui/lib/python3.11/site-packages/open_webui/retrieval/loaders/main.py", line 203, in load
    docs = loader.load()
           │      └ <function BaseLoader.load at 0x16b425300>
           └ <langchain_community.document_loaders.pdf.PyPDFLoader object at 0x3538cc410>
  File "/Users/user/venvs/webui/lib/python3.11/site-packages/langchain_core/document_loaders/base.py", line 32, in load
    return list(self.lazy_load())
                │    └ <function PyPDFLoader.lazy_load at 0x16badf880>
                └ <langchain_community.document_loaders.pdf.PyPDFLoader object at 0x3538cc410>
  File "/Users/user/venvs/webui/lib/python3.11/site-packages/langchain_community/document_loaders/pdf.py", line 305, in lazy_load
    yield from self.parser.lazy_parse(blob)
               │    │      │          └ Blob 14293046992 /Users/user/venvs/webui/lib/python3.11/site-packages/open_webui/data/uploads/1e5b6280-f4de-4b93-a4...
               │    │      └ <function PyPDFParser.lazy_parse at 0x16bade2a0>
               │    └ <langchain_community.document_loaders.parsers.pdf.PyPDFParser object at 0x34a3f6d50>
               └ <langchain_community.document_loaders.pdf.PyPDFLoader object at 0x3538cc410>
  File "/Users/user/venvs/webui/lib/python3.11/site-packages/langchain_community/document_loaders/parsers/pdf.py", line 397, in lazy_parse
    images_from_page = self.extract_images_from_page(page)
                       │    │                        └ {'/Annots': IndirectObject(325, 0, 14291643280), '/Contents': [IndirectObject(391, 0, 14291643280), IndirectObject(392, 0, 14...
                       │    └ <function PyPDFParser.extract_images_from_page at 0x16bade340>
                       └ <langchain_community.document_loaders.parsers.pdf.PyPDFParser object at 0x34a3f6d50>
  File "/Users/user/venvs/webui/lib/python3.11/site-packages/langchain_community/document_loaders/parsers/pdf.py", line 460, in extract_images_from_page
    Image.fromarray(np_image).save(image_bytes, format="PNG")
    │     │         │              └ <_io.BytesIO object at 0x353a5d710>
    │     │         └ array([[[ 0],
    │     │                   [ 0],
    │     │                   [ 0],
    │     │                   ...,
    │     │                   [ 0],
    │     │                   [ 0],
    │     │                   [ 0]],
    │     │
    │     │                  [[ 1],
    │     │                   [ 1]...
    │     └ <function fromarray at 0x17b237420>
    └ <module 'PIL.Image' from '/Users/user/venvs/webui/lib/python3.11/site-packages/PIL/Image.py'>
  File "/Users/user/venvs/webui/lib/python3.11/site-packages/PIL/Image.py", line 3303, in fromarray
    raise TypeError(msg) from e
                    └ 'Cannot handle this data type: (1, 1, 1), |u1'

TypeError: Cannot handle this data type: (1, 1, 1), |u1
2025-06-06 11:50:28.352 | ERROR    | open_webui.routers.files:upload_file:181 - 400: Cannot handle this data type: (1, 1, 1), |u1 - {}
Traceback (most recent call last):

  File "/Users/user/venvs/webui/lib/python3.11/site-packages/PIL/Image.py", line 3299, in fromarray
    mode, rawmode = _fromarray_typemap[typekey]
    │               │                  └ ((1, 1, 1), '|u1')
    │               └ {((1, 1), '|b1'): ('1', '1;8'), ((1, 1), '|u1'): ('L', 'L'), ((1, 1), '|i1'): ('I', 'I;8'), ((1, 1), '<u2'): ('I', 'I;16'), (...
    └ None

KeyError: ((1, 1, 1), '|u1')


The above exception was the direct cause of the following exception:


Traceback (most recent call last):

  File "/Users/user/venvs/webui/lib/python3.11/site-packages/open_webui/routers/retrieval.py", line 1332, in process_file
    docs = loader.load(
           │      └ <function Loader.load at 0x16bb494e0>
           └ <open_webui.retrieval.loaders.main.Loader object at 0x16ba54290>
  File "/Users/user/venvs/webui/lib/python3.11/site-packages/open_webui/retrieval/loaders/main.py", line 203, in load
    docs = loader.load()
           │      └ <function BaseLoader.load at 0x16b425300>
           └ <langchain_community.document_loaders.pdf.PyPDFLoader object at 0x3538cc410>
  File "/Users/user/venvs/webui/lib/python3.11/site-packages/langchain_core/document_loaders/base.py", line 32, in load
    return list(self.lazy_load())
                │    └ <function PyPDFLoader.lazy_load at 0x16badf880>
                └ <langchain_community.document_loaders.pdf.PyPDFLoader object at 0x3538cc410>
  File "/Users/user/venvs/webui/lib/python3.11/site-packages/langchain_community/document_loaders/pdf.py", line 305, in lazy_load
    yield from self.parser.lazy_parse(blob)
               │    │      │          └ Blob 14293046992 /Users/user/venvs/webui/lib/python3.11/site-packages/open_webui/data/uploads/1e5b6280-f4de-4b93-a4...
               │    │      └ <function PyPDFParser.lazy_parse at 0x16bade2a0>
               │    └ <langchain_community.document_loaders.parsers.pdf.PyPDFParser object at 0x34a3f6d50>
               └ <langchain_community.document_loaders.pdf.PyPDFLoader object at 0x3538cc410>
  File "/Users/user/venvs/webui/lib/python3.11/site-packages/langchain_community/document_loaders/parsers/pdf.py", line 397, in lazy_parse
    images_from_page = self.extract_images_from_page(page)
                       │    │                        └ {'/Annots': IndirectObject(325, 0, 14291643280), '/Contents': [IndirectObject(391, 0, 14291643280), IndirectObject(392, 0, 14...
                       │    └ <function PyPDFParser.extract_images_from_page at 0x16bade340>
                       └ <langchain_community.document_loaders.parsers.pdf.PyPDFParser object at 0x34a3f6d50>
  File "/Users/user/venvs/webui/lib/python3.11/site-packages/langchain_community/document_loaders/parsers/pdf.py", line 460, in extract_images_from_page
    Image.fromarray(np_image).save(image_bytes, format="PNG")
    │     │         │              └ <_io.BytesIO object at 0x353a5d710>
    │     │         └ array([[[ 0],
    │     │                   [ 0],
    │     │                   [ 0],
    │     │                   ...,
    │     │                   [ 0],
    │     │                   [ 0],
    │     │                   [ 0]],
    │     │
    │     │                  [[ 1],
    │     │                   [ 1]...
    │     └ <function fromarray at 0x17b237420>
    └ <module 'PIL.Image' from '/Users/user/venvs/webui/lib/python3.11/site-packages/PIL/Image.py'>
  File "/Users/user/venvs/webui/lib/python3.11/site-packages/PIL/Image.py", line 3303, in fromarray
    raise TypeError(msg) from e
                    └ 'Cannot handle this data type: (1, 1, 1), |u1'

TypeError: Cannot handle this data type: (1, 1, 1), |u1


During handling of the above exception, another exception occurred:


Traceback (most recent call last):

  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/threading.py", line 1002, in _bootstrap
    self._bootstrap_inner()
    │    └ <function Thread._bootstrap_inner at 0x1024751c0>
    └ <WorkerThread(AnyIO worker thread, started 14422241280)>
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/threading.py", line 1045, in _bootstrap_inner
    self.run()
    │    └ <function WorkerThread.run at 0x355cfb600>
    └ <WorkerThread(AnyIO worker thread, started 14422241280)>
  File "/Users/user/venvs/webui/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 967, in run
    result = context.run(func, *args)
             │       │   │      └ ()
             │       │   └ functools.partial(<function upload_file at 0x1690b9120>, user=UserModel(id='d1610ef6-173c-42df-b591-2d4afc13b308', name='John...
             │       └ <method 'run' of '_contextvars.Context' objects>
             └ <_contextvars.Context object at 0x3539d3a00>
> File "/Users/user/venvs/webui/lib/python3.11/site-packages/open_webui/routers/files.py", line 172, in upload_file
    process_file(request, ProcessFileForm(file_id=id), user=user)
    │            │        │                       │         └ UserModel(id='d1610ef6-173c-42df-b591-2d4afc13b308', name='John Doe', email='johndoe@gmail.com', role='admin', profile_im...
    │            │        │                       └ '1e5b6280-f4de-4b93-a467-728881b29236'
    │            │        └ <class 'open_webui.routers.retrieval.ProcessFileForm'>
    │            └ <starlette.requests.Request object at 0x353d90290>
    └ <function process_file at 0x16cb7ce00>
  File "/Users/user/venvs/webui/lib/python3.11/site-packages/open_webui/routers/retrieval.py", line 1420, in process_file
    raise HTTPException(
          └ <class 'fastapi.exceptions.HTTPException'>

fastapi.exceptions.HTTPException: 400: Cannot handle this data type: (1, 1, 1), |u1
2025-06-06 11:50:28.353 | ERROR    | open_webui.routers.files:upload_file:182 - Error processing file: 1e5b6280-f4de-4b93-a467-728881b29236 - {}
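
The traceback above shows the lookup `_fromarray_typemap[typekey]` failing with the key `((1, 1, 1), '|u1')`, while the map does contain `((1, 1), '|u1')`. That pattern suggests the extracted image is a `uint8` array with a trailing channel axis of length 1, i.e. shape (height, width, 1). The sketch below is a simplified, standard-library-only model of that lookup, not Pillow's actual code: the two-entry map, the `typekey` function, and the `drop_single_channel` helper are illustrative assumptions, and the image size is made up.

```python
# Simplified model of the lookup that fails inside PIL.Image.fromarray.
# Only two entries of the real type map are reproduced here (an assumption
# made for brevity); the key layout follows what the traceback shows.
FROMARRAY_TYPEMAP = {
    ((1, 1), "|u1"): ("L", "L"),         # (H, W) uint8 -> 8-bit grayscale
    ((1, 1, 3), "|u1"): ("RGB", "RGB"),  # (H, W, 3) uint8 -> RGB
}


def typekey(shape, typestr):
    """Build the lookup key from the axes after height and width."""
    return ((1, 1) + tuple(shape[2:]), typestr)


def drop_single_channel(shape):
    """Hypothetical helper: remove a trailing axis of length 1, if present."""
    if len(shape) == 3 and shape[2] == 1:
        return tuple(shape[:2])
    return tuple(shape)


# Illustrative size only; the real image dimensions are not in the log.
failing_shape = (64, 48, 1)

# The (H, W, 1) shape produces a key the map does not contain.
print(typekey(failing_shape, "|u1") in FROMARRAY_TYPEMAP)  # False

# After dropping the length-1 axis, the key matches the grayscale entry.
fixed_shape = drop_single_channel(failing_shape)
print(FROMARRAY_TYPEMAP[typekey(fixed_shape, "|u1")])  # ('L', 'L')
```

In NumPy terms, dropping that axis corresponds to something like `np_image[:, :, 0]` before the call to `Image.fromarray`. Whether the loader should do this itself is a separate question that the log alone does not answer.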

Additional Information

No response

GiteaMirror added the bug label 2026-04-19 23:05:02 -05:00

@tjbck commented on GitHub (Jun 6, 2025):

Tika is recommended.


@astroboylrx commented on GitHub (Jun 6, 2025):

Okay, let me rephrase.
Would it be possible to let users route PDFs to an external loader while still passing other document types to the default loader?
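
To make the request concrete, here is a minimal sketch of per-extension loader routing. It is not how Open WebUI is implemented: the loader labels, the `LOADER_OVERRIDES` table, and the `pick_loader` function are all hypothetical names chosen for illustration.

```python
from pathlib import Path

# Hypothetical labels; these are not Open WebUI setting names.
DEFAULT_LOADER = "default"
EXTERNAL_LOADER = "external"

# Assumed per-extension overrides; anything not listed uses the default loader.
LOADER_OVERRIDES = {".pdf": EXTERNAL_LOADER}


def pick_loader(filename: str) -> str:
    """Return which loader should handle a file, based on its extension."""
    suffix = Path(filename).suffix.lower()
    return LOADER_OVERRIDES.get(suffix, DEFAULT_LOADER)


for name in ("s41550-023-01945-7.pdf", "notes.md", "REPORT.PDF"):
    print(name, "->", pick_loader(name))
```

Lower-casing the suffix keeps `REPORT.PDF` and `report.pdf` on the same path, and any extension without an override falls through to the default loader.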


@mykola-mmm commented on GitHub (Jun 27, 2025):

@tjbck @astroboylrx I encountered the same issue when working with my company's internal documentation. Updating langchain and langchain-community to the newest versions helped.
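
Before and after upgrading, it can help to record which versions are actually installed in the environment that runs Open WebUI. The snippet below is a small sketch using the standard library's `importlib.metadata`. The package list is an assumption based on the modules named in the traceback, and `installed_versions` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumed list of relevant distributions, taken from the modules in the traceback.
PACKAGES = ("langchain", "langchain-community", "pillow")


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    result = {}
    for name in names:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


for name, ver in installed_versions(PACKAGES).items():
    print(f"{name}: {ver or 'not installed'}")
```

Run it with the same interpreter that launches Open WebUI (in the report above, the one inside the `webui` virtual environment), otherwise the versions shown may not be the ones in use.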
