mirror of
https://github.com/open-webui/open-webui.git
synced 2026-05-06 02:48:13 -05:00
Documents not accepted for upload #2096
Originally created by @pcrossleyAC on GitHub (Sep 17, 2024).
Bug Report
Installation Method
docker command from Open WebUI
Environment
Open WebUI Version: v0.3.21 (latest version as of Sept 16, 2024)
Ollama (if applicable): latest version from the Ollama website as of the same date
Operating System: Windows 11
Browser (if applicable): Chrome, Edge
Confirmation:
Expected Behavior:
After an apparently successful install of Ollama, Docker, and Open WebUI, in that order, I can open localhost:3000 and converse with the LLM. I expect to be able to upload several types of documents (.txt, .docx, .pdf, etc.).
Actual Behavior:
Uploading ANY document fails with an error message, whether through the chat interface or through the document upload functionality in Workspace, and the document is not uploaded.
Description
Bug Summary:
Uploading a PDF via chat or Workspace produces a 500 error: "The content provided is empty. Please ensure that there is text or data present before proceeding." Uploading an MS Word document produces: "There is no item named 'word/document.xml' in the archive".
Reproduction Details
Steps to Reproduce:
1. Start with a new computer running Windows 11 Home, with basic applications (Office 365, etc.) and Nvidia drivers installed.
2. Install Ollama from the .exe installer.
3. Pull llama3.1:latest through the Ollama CLI.
4. Install Docker Desktop, accepting the default (recommended) install options.
5. From a command prompt, run:
   docker run -d -p 3000:8080 --gpus all --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:cuda
6. Open localhost:3000.
7. Create a new (admin) account and attempt to upload a PDF or any other document.
Logs and Screenshots
Docker Container Logs:
INFO [open_webui.apps.webui.routers.files] file.content_type: application/pdf
2024-09-16 19:06:21 ERROR [open_webui.apps.rag.main] 500: The content provided is empty. Please ensure that there is text or data present before proceeding.
2024-09-16 19:06:21 Traceback (most recent call last):
2024-09-16 19:06:21 File "/app/backend/open_webui/apps/rag/main.py", line 1271, in process_doc
2024-09-16 19:06:21 result = store_data_in_vector_db(
2024-09-16 19:06:21 ^^^^^^^^^^^^^^^^^^^^^^^^
2024-09-16 19:06:21 File "/app/backend/open_webui/apps/rag/main.py", line 969, in store_data_in_vector_db
2024-09-16 19:06:21 raise ValueError(ERROR_MESSAGES.EMPTY_CONTENT)
2024-09-16 19:06:21 ValueError: The content provided is empty. Please ensure that there is text or data present before proceeding.
2024-09-16 19:06:21
2024-09-16 19:06:21 During handling of the above exception, another exception occurred:
2024-09-16 19:06:21
2024-09-16 19:06:21 Traceback (most recent call last):
2024-09-16 19:06:21 File "/app/backend/open_webui/apps/rag/main.py", line 1288, in process_doc
2024-09-16 19:06:21 raise HTTPException(
2024-09-16 19:06:21 fastapi.exceptions.HTTPException: 500: The content provided is empty. Please ensure that there is text or data present before proceeding.
2024-09-16 19:48:23 Traceback (most recent call last):
2024-09-16 19:48:23 File "/app/backend/open_webui/apps/rag/main.py", line 1288, in process_doc
2024-09-16 19:48:23 raise HTTPException(
2024-09-16 19:48:23 fastapi.exceptions.HTTPException: 500: The content provided is empty. Please ensure that there is text or data present before proceeding.
2024-09-16 19:48:52 INFO [open_webui.apps.webui.routers.files] file.content_type: application/msword
2024-09-16 19:48:52 ERROR [open_webui.apps.rag.main] "There is no item named 'word/document.xml' in the archive"
2024-09-16 19:48:52 Traceback (most recent call last):
2024-09-16 19:48:52 File "/app/backend/open_webui/apps/rag/main.py", line 1268, in process_doc
2024-09-16 19:48:52 data = loader.load()
2024-09-16 19:48:52 ^^^^^^^^^^^^^
2024-09-16 19:48:52 File "/usr/local/lib/python3.11/site-packages/langchain_community/document_loaders/word_document.py", line 57, in load
2024-09-16 19:48:52 page_content=docx2txt.process(self.file_path),
2024-09-16 19:48:52 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-09-16 19:48:52 File "/usr/local/lib/python3.11/site-packages/docx2txt/docx2txt.py", line 88, in process
2024-09-16 19:48:52 text += xml2text(zipf.read(doc_xml))
2024-09-16 19:48:52 ^^^^^^^^^^^^^^^^^^
2024-09-16 19:48:52 File "/usr/local/lib/python3.11/zipfile.py", line 1527, in read
2024-09-16 19:48:52 with self.open(name, "r", pwd) as fp:
2024-09-16 19:48:52 ^^^^^^^^^^^^^^^^^^^^^^^^^
2024-09-16 19:48:52 File "/usr/local/lib/python3.11/zipfile.py", line 1564, in open
2024-09-16 19:48:52 zinfo = self.getinfo(name)
2024-09-16 19:48:52 ^^^^^^^^^^^^^^^^^^
2024-09-16 19:48:52 File "/usr/local/lib/python3.11/zipfile.py", line 1493, in getinfo
2024-09-16 19:48:52 raise KeyError(
2024-09-16 19:48:52 KeyError: "There is no item named 'word/document.xml' in the archive"
2024-09-16 19:49:25 INFO [open_webui.apps.webui.routers.files] file.content_type: application/msword
2024-09-16 19:49:25 ERROR [open_webui.apps.rag.main] "There is no item named 'word/document.xml' in the archive"
2024-09-16 19:49:25 Traceback (most recent call last):
2024-09-16 19:49:25 File "/app/backend/open_webui/apps/rag/main.py", line 1268, in process_doc
2024-09-16 19:49:25 data = loader.load()
2024-09-16 19:49:25 ^^^^^^^^^^^^^
2024-09-16 19:49:25 File "/usr/local/lib/python3.11/site-packages/langchain_community/document_loaders/word_document.py", line 57, in load
2024-09-16 19:49:25 page_content=docx2txt.process(self.file_path),
2024-09-16 19:49:25 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-09-16 19:49:25 File "/usr/local/lib/python3.11/site-packages/docx2txt/docx2txt.py", line 88, in process
2024-09-16 19:49:25 text += xml2text(zipf.read(doc_xml))
2024-09-16 19:49:25 ^^^^^^^^^^^^^^^^^^
2024-09-16 19:49:25 File "/usr/local/lib/python3.11/zipfile.py", line 1527, in read
2024-09-16 19:49:25 with self.open(name, "r", pwd) as fp:
2024-09-16 19:49:25 ^^^^^^^^^^^^^^^^^^^^^^^^^
2024-09-16 19:49:25 File "/usr/local/lib/python3.11/zipfile.py", line 1564, in open
2024-09-16 19:49:25 zinfo = self.getinfo(name)
2024-09-16 19:49:25 ^^^^^^^^^^^^^^^^^^
2024-09-16 19:49:25 File "/usr/local/lib/python3.11/zipfile.py", line 1493, in getinfo
2024-09-16 19:49:25 raise KeyError(
2024-09-16 19:49:25 KeyError: "There is no item named 'word/document.xml' in the archive"
2024-09-16 19:50:04 INFO [open_webui.apps.webui.routers.files] file.content_type: application/msword
2024-09-16 19:50:04 ERROR [open_webui.apps.rag.main] "There is no item named 'word/document.xml' in the archive"
2024-09-16 19:50:04 Traceback (most recent call last):
2024-09-16 19:50:04 File "/app/backend/open_webui/apps/rag/main.py", line 1268, in process_doc
2024-09-16 19:50:04 data = loader.load()
2024-09-16 19:50:04 ^^^^^^^^^^^^^
2024-09-16 19:50:04 File "/usr/local/lib/python3.11/site-packages/langchain_community/document_loaders/word_document.py", line 57, in load
2024-09-16 19:50:04 page_content=docx2txt.process(self.file_path),
2024-09-16 19:50:04 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-09-16 19:50:04 File "/usr/local/lib/python3.11/site-packages/docx2txt/docx2txt.py", line 88, in process
2024-09-16 19:50:04 text += xml2text(zipf.read(doc_xml))
2024-09-16 19:50:04 ^^^^^^^^^^^^^^^^^^
2024-09-16 19:50:04 File "/usr/local/lib/python3.11/zipfile.py", line 1527, in read
2024-09-16 19:50:04 with self.open(name, "r", pwd) as fp:
2024-09-16 19:50:04 ^^^^^^^^^^^^^^^^^^^^^^^^^
2024-09-16 19:50:04 File "/usr/local/lib/python3.11/zipfile.py", line 1564, in open
2024-09-16 19:50:04 zinfo = self.getinfo(name)
2024-09-16 19:50:04 ^^^^^^^^^^^^^^^^^^
2024-09-16 19:50:04 File "/usr/local/lib/python3.11/zipfile.py", line 1493, in getinfo
2024-09-16 19:50:04 raise KeyError(
2024-09-16 19:50:04 KeyError: "There is no item named 'word/document.xml' in the archive"
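The two tracebacks above point at different failure modes: the PDF loader returned no text, and the Word loader opened the file as a ZIP archive but found no word/document.xml member in it. One way to narrow this down is to inspect the stored files directly. The following is a minimal sketch, not part of Open WebUI; `inspect_upload` is a hypothetical helper name, and it uses only the Python standard library.

```python
import sys
import zipfile
from pathlib import Path

# Magic bytes of the legacy OLE2 container used by pre-2007 .doc files.
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"


def inspect_upload(path):
    """Return a short diagnosis string for a stored upload (hypothetical helper)."""
    p = Path(path)
    size = p.stat().st_size
    if size == 0:
        return "empty file (0 bytes)"

    # Read only the first bytes to identify the format by its signature.
    with p.open("rb") as fh:
        head = fh.read(8)

    if head.startswith(b"%PDF-"):
        return f"PDF header present ({size} bytes)"
    if head.startswith(OLE2_MAGIC):
        return "legacy OLE2 container (e.g. .doc), not a .docx ZIP archive"
    if zipfile.is_zipfile(p):
        with zipfile.ZipFile(p) as zf:
            names = zf.namelist()
        if "word/document.xml" in names:
            return "ZIP archive containing word/document.xml"
        return f"ZIP archive without word/document.xml ({len(names)} members)"
    return f"unrecognized format ({size} bytes)"


if __name__ == "__main__":
    # Usage: python inspect_upload.py FILE [FILE ...]
    for arg in sys.argv[1:]:
        print(f"{arg}: {inspect_upload(arg)}")
```

Running it on a file copied out of the container's data volume (mounted at /app/backend/data in the command above) and on the original file from disk would show whether the stored copy differs from what was selected in the browser. The exact subdirectory that holds uploads is an assumption to verify on the installation.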
Additional Information
I can access the Open WebUI interface from another computer on the network using the host's internal IP, and from that computer I can SUCCESSFULLY upload a PDF or other document. I cannot do this on the local machine, where Ollama, Docker, and Open WebUI are running. Could it be a permissions issue?

Also, locally I can access Open WebUI on both port 3000 and port 8080. When I couldn't get uploads working, I also tried installing via WSL and Ubuntu, with the same problem, so I removed the WSL Ubuntu image and reverted to the Windows install. That may be why the app is now reachable on both ports, but the PDF/document issue existed before then. I searched online for others with the same problem but found nothing. Thanks for your help!
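To confirm which local ports actually accept connections, a small check like the following may help. This is a sketch using only the Python standard library; `port_is_open` is a hypothetical helper, and 3000 and 8080 are simply the two ports mentioned above.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds (hypothetical helper)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    for port in (3000, 8080):
        state = "open" if port_is_open("127.0.0.1", port) else "closed"
        print(f"127.0.0.1:{port} is {state}")
```

With the `-p 3000:8080` mapping from the docker command, only host port 3000 is published by that container, so a listener on host port 8080 would have to come from something else. The check reports only whether a port answers, not which process owns it.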
Note
If the bug report is incomplete or does not follow the provided instructions, it may not be addressed. Please ensure that you have followed the steps outlined in the README.md and troubleshooting.md documents, and provide all necessary information for us to reproduce and address the issue. Thank you!