mirror of
https://github.com/open-webui/open-webui.git
synced 2026-03-24 20:14:58 -05:00
Upload file error Reshape #1725
Originally created by @cpolcino on GitHub (Aug 8, 2024).
Bug Report
During the processing of a PDF document, an error occurs when attempting to reshape an image array. The system is unable to reshape an array of size 294 into the required shape of (300, 152, newaxis).
Installation Method
Direct installation with pip (not Docker)
Environment
Open WebUI Version: v0.3.11
Ollama (if applicable): v0.2.0
Operating System: Ubuntu 20.04
Browser (if applicable): Chrome 100.0
Confirmation:
Expected Behavior:
The file should upload to the Documents section without errors.
Actual Behavior:
The process halts with a ValueError when trying to reshape the image data extracted from the PDF.
Description
Bug Summary:
Uploading a PDF to Documents fails with a ValueError raised while langchain extracts images from the PDF.
Reproduction Details
Steps to Reproduce:
Set the embedding model to sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2.
Set the reranker to BBAI-reranker-v2-m3.
Upload the attached file "ESA6march2009" in Documents.
Logs and Screenshots
Something went wrong :/ cannot reshape array of size 294 into shape (300,152,newaxis)
ERROR:apps.rag.main:cannot reshape array of size 294 into shape (300,152,newaxis)
Traceback (most recent call last):
  File "/home/user/.pyenv/versions/3.11.9/lib/python3.11/site-packages/apps/rag/main.py", line 1241, in process_doc
    data = loader.load()
           ^^^^^^^^^^^^^
  File "/home/user/.pyenv/versions/3.11.9/lib/python3.11/site-packages/langchain_core/document_loaders/base.py", line 30, in load
    return list(self.lazy_load())
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.pyenv/versions/3.11.9/lib/python3.11/site-packages/langchain_community/document_loaders/pdf.py", line 202, in lazy_load
    yield from self.parser.parse(blob)
               ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.pyenv/versions/3.11.9/lib/python3.11/site-packages/langchain_core/document_loaders/base.py", line 126, in parse
    return list(self.lazy_parse(blob))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.pyenv/versions/3.11.9/lib/python3.11/site-packages/langchain_community/document_loaders/parsers/pdf.py", line 124, in lazy_parse
    yield from [
               ^
  File "/home/user/.pyenv/versions/3.11.9/lib/python3.11/site-packages/langchain_community/document_loaders/parsers/pdf.py", line 127, in
    + self._extract_images_from_page(page),
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.pyenv/versions/3.11.9/lib/python3.11/site-packages/langchain_community/document_loaders/parsers/pdf.py", line 146, in _extract_images_from_page
    np.frombuffer(xObject[obj].get_data(), dtype=np.uint8).reshape(
ValueError: cannot reshape array of size 294 into shape (300,152,newaxis)
INFO: 127.0.0.1:54686 - "POST /rag/api/v1/process/doc HTTP/1.1" 400 Bad Request
Additional Information
The error occurs while using the langchain library for processing PDF documents.
The issue appears to be related to image extraction and processing from the PDF.
The system attempts to reshape an array of size 294 into shape (300, 152, newaxis), which is impossible: that shape needs a multiple of 300 × 152 = 45,600 elements, and 294 is not one.
This resulted in a 400 Bad Request response for the POST request to "/rag/api/v1/process/doc".
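The size mismatch above can be checked with plain arithmetic. The sketch below is only an illustration and not code from langchain or Open WebUI: the helper names `fits_reshape` and `expected_raw_sizes` are hypothetical, and it assumes the shape in the error message is (height, width, channels). It compares the 294-byte buffer with the byte counts an uncompressed 300 × 152 image would have in a few common pixel layouts.

```python
import math


def fits_reshape(size: int, height: int, width: int) -> bool:
    """True if `size` elements can be reshaped to (height, width, -1)."""
    plane = height * width
    return plane > 0 and size % plane == 0


def expected_raw_sizes(height: int, width: int) -> dict[str, int]:
    """Byte counts of an uncompressed image in a few common pixel layouts."""
    return {
        "1-bit (rows padded to whole bytes)": height * math.ceil(width / 8),
        "8-bit grayscale": height * width,
        "8-bit RGB": height * width * 3,
    }


# Values taken from the error message; reading 300 as the height and
# 152 as the width is an assumption.
HEIGHT, WIDTH, BUFFER_SIZE = 300, 152, 294

sizes = expected_raw_sizes(HEIGHT, WIDTH)
for layout, nbytes in sizes.items():
    print(f"{layout}: {nbytes} bytes")

print("buffer fits (height, width, -1):", fits_reshape(BUFFER_SIZE, HEIGHT, WIDTH))
print("buffer matches a raw layout:", BUFFER_SIZE in sizes.values())
```

None of these layouts gives 294 bytes. One possible reading, which the report does not confirm, is that the image stream in the PDF is not raw 8-bit pixel data, for example because it is still compressed or uses a different bit depth.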
ECSS-Q-ST-70C_(6March2009).pdf