Upload file error Reshape #1725

GiteaMirror · 2025-11-11T14:50:58-06:00

Originally created by @cpolcino on GitHub (Aug 8, 2024).

Bug Report

During the processing of a PDF document, an error occurs when attempting to reshape an image array. The system is unable to reshape an array of size 294 into the required shape of (300, 152, newaxis).

Installation Method

Direct installation with pip (not Docker)

Environment

Open WebUI Version: v0.3.11
Ollama (if applicable): v0.2.0
Operating System: Ubuntu 20.04
Browser (if applicable): Chrome 100.0

Confirmation:

[x] I have read and followed all the instructions provided in the README.md.
[x] I am on the latest version of both Open WebUI and Ollama.
[x] I have included the browser console logs.
[ ] I have included the Docker container logs.
[ ] I have provided the exact steps to reproduce the bug in the "Steps to Reproduce" section below.

Expected Behavior:

The file should upload to the Documents section without errors.

Actual Behavior:

The process halts with a ValueError when trying to reshape the image data extracted from the PDF.

Description

Bug Summary:
[Provide a brief but clear summary of the bug]

Reproduction Details

Embedding model: sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
Reranker: BBAI-reranker-v2-m3
Upload the attached file "ESA6march2009" in Documents.

Steps to Reproduce:
embedding model

Logs and Screenshots

Something went wrong :/ cannot reshape array of size 294 into shape (300,152,newaxis)
ERROR:apps.rag.main:cannot reshape array of size 294 into shape (300,152,newaxis)
Traceback (most recent call last):
  File "/home/user/.pyenv/versions/3.11.9/lib/python3.11/site-packages/apps/rag/main.py", line 1241, in process_doc
    data = loader.load()
           ^^^^^^^^^^^^^
  File "/home/user/.pyenv/versions/3.11.9/lib/python3.11/site-packages/langchain_core/document_loaders/base.py", line 30, in load
    return list(self.lazy_load())
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.pyenv/versions/3.11.9/lib/python3.11/site-packages/langchain_community/document_loaders/pdf.py", line 202, in lazy_load
    yield from self.parser.parse(blob)
               ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.pyenv/versions/3.11.9/lib/python3.11/site-packages/langchain_core/document_loaders/base.py", line 126, in parse
    return list(self.lazy_parse(blob))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.pyenv/versions/3.11.9/lib/python3.11/site-packages/langchain_community/document_loaders/parsers/pdf.py", line 124, in lazy_parse
    yield from [
               ^
  File "/home/user/.pyenv/versions/3.11.9/lib/python3.11/site-packages/langchain_community/document_loaders/parsers/pdf.py", line 127, in <listcomp>
    + self._extract_images_from_page(page),
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.pyenv/versions/3.11.9/lib/python3.11/site-packages/langchain_community/document_loaders/parsers/pdf.py", line 146, in _extract_images_from_page
    np.frombuffer(xObject[obj].get_data(), dtype=np.uint8).reshape(
ValueError: cannot reshape array of size 294 into shape (300,152,newaxis)
INFO: 127.0.0.1:54686 - "POST /rag/api/v1/process/doc HTTP/1.1" 400 Bad Request
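
The ValueError in the log can likely be reproduced without the PDF. The sketch below assumes the parser reshapes the raw image bytes to (height, width, -1), which would explain why NumPy prints the last dimension as newaxis. The helper name `try_reshape` is made up for illustration and is not part of langchain or Open WebUI.

```python
import numpy as np


def try_reshape(n_bytes: int, height: int, width: int):
    """Reshape n_bytes of zeroed uint8 data to (height, width, -1).

    Returns the resulting shape, or the error message if NumPy refuses.
    """
    buf = np.frombuffer(bytes(n_bytes), dtype=np.uint8)
    try:
        return buf.reshape(height, width, -1).shape
    except ValueError as exc:
        return str(exc)


# Sizes taken from the log: 294 bytes for an image declared as 300 x 152.
message = try_reshape(294, 300, 152)
print(message)

# For comparison, a buffer that really holds 300 x 152 RGB pixels reshapes fine.
ok_shape = try_reshape(300 * 152 * 3, 300, 152)
print(ok_shape)
```

A buffer of 294 bytes cannot fill even one channel of a 300 x 152 image (45,600 pixels), so the data returned for this image object does not match the dimensions declared for it.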

Additional Information

The error is raised inside the langchain_community PDF parser (parsers/pdf.py, _extract_images_from_page), not in Open WebUI's own code.
It occurs while the parser extracts embedded images from the PDF.
The parser tries to reshape 294 bytes of image data into (300, 152, newaxis). That needs a byte count that is a multiple of 300 × 152 = 45,600, and 294 is not, so the reshape cannot succeed.
As a result, the POST request to "/rag/api/v1/process/doc" returns 400 Bad Request.
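ECSS-Q-ST-70C_(6March2009).pdf: https://github.com/user-attachments/files/16540630/ECSS-Q-ST-70C_.6March2009.pdf
ECSS-Q-ST-70C_(6March2009).pdf: https://github.com/user-attachments/files/16540633/ECSS-Q-ST-70C_.6March2009.pdf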
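
A possible guard, sketched under assumptions: the logs do not show why the image data is only 294 bytes, so this does not fix the root cause. It only shows how a size check before the reshape could skip a mismatched image instead of failing the whole upload. `safe_image_array` is a hypothetical helper, not an existing langchain function.

```python
from typing import Optional

import numpy as np


def safe_image_array(data: bytes, height: int, width: int) -> Optional[np.ndarray]:
    """Hypothetical helper: return an (H, W, C) uint8 array, or None on size mismatch."""
    pixels = height * width
    # reshape(height, width, -1) only works when the byte count is a
    # non-zero multiple of height * width.
    if pixels <= 0 or len(data) == 0 or len(data) % pixels != 0:
        return None
    return np.frombuffer(data, dtype=np.uint8).reshape(height, width, -1)


# Sizes from this report: the mismatched image is skipped instead of raising.
skipped = safe_image_array(bytes(294), 300, 152)
print(skipped)

# A consistent buffer (4 x 2 pixels, 3 channels) is still converted.
kept = safe_image_array(bytes(4 * 2 * 3), 4, 2)
print(kept.shape)
```

Until the parser handles this case, disabling image extraction may avoid the code path entirely. As far as I can tell, langchain_community's PyPDFLoader takes an extract_images flag for this, but I have not verified that against the versions in this report.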


GiteaMirror closed this issue