Documents not accepted for upload #2096

Closed
opened 2025-11-11 15:00:13 -06:00 by GiteaMirror · 0 comments

Originally created by @pcrossleyAC on GitHub (Sep 17, 2024).

Bug Report

Installation Method

docker command from Open WebUI

Environment

  • Open WebUI Version: latest version as of Sept 16 (v0.3.21)

  • Ollama (if applicable): also the latest version from their website

  • Operating System: Windows 11

  • Browser (if applicable): Chrome, Edge

Confirmation:

  • [x] I have read and followed all the instructions provided in the README.md.
  • [x] I am on the latest version of both Open WebUI and Ollama.
  • [ ] I have included the browser console logs.
  • [x] I have included the Docker container logs.
  • [x] I have provided the exact steps to reproduce the bug in the "Steps to Reproduce" section below.

Expected Behavior:

After an apparently successful install of Ollama, Docker, and Open WebUI (in that order), I can launch localhost:3000 and converse with the LLM. I expect to be able to upload several types of documents (.txt, .docx, .pdf, etc.).

Actual Behavior:

I receive an error message when uploading ANY document, whether through the chat interface or through the document upload functionality in Workspaces, and the document is not uploaded.

Description

Bug Summary:
When I attempt to upload a PDF via chat or Workspace, a 500 error is generated: "The content provided is empty. Please ensure that there is text or data present before proceeding." For MS Word documents, the message is: "There is no item named 'word/document.xml' in the archive".

Reproduction Details

Steps to Reproduce:
1. Start with a new computer running Windows 11 Home, with basic applications (Office 365, etc.) and Nvidia drivers installed.
2. Install Ollama from the .exe installer.
3. Pull llama3.1:latest through the Ollama CLI.
4. Install Docker Desktop, accepting the default (recommended) install options.
5. From a command prompt, run the docker command "docker run -d -p 3000:8080 --gpus all --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:cuda".
6. Launch localhost:3000.
7. Create a new account (admin) and attempt to upload a PDF or any other document.

Logs and Screenshots

Browser Console Logs:
[Include relevant browser console logs, if applicable]

Docker Container Logs:
INFO [open_webui.apps.webui.routers.files] file.content_type: application/pdf
2024-09-16 19:06:21 ERROR [open_webui.apps.rag.main] 500: The content provided is empty. Please ensure that there is text or data present before proceeding.
2024-09-16 19:06:21 Traceback (most recent call last):
2024-09-16 19:06:21 File "/app/backend/open_webui/apps/rag/main.py", line 1271, in process_doc
2024-09-16 19:06:21 result = store_data_in_vector_db(
2024-09-16 19:06:21 ^^^^^^^^^^^^^^^^^^^^^^^^
2024-09-16 19:06:21 File "/app/backend/open_webui/apps/rag/main.py", line 969, in store_data_in_vector_db
2024-09-16 19:06:21 raise ValueError(ERROR_MESSAGES.EMPTY_CONTENT)
2024-09-16 19:06:21 ValueError: The content provided is empty. Please ensure that there is text or data present before proceeding.
2024-09-16 19:06:21
2024-09-16 19:06:21 During handling of the above exception, another exception occurred:
2024-09-16 19:06:21
2024-09-16 19:06:21 Traceback (most recent call last):
2024-09-16 19:06:21 File "/app/backend/open_webui/apps/rag/main.py", line 1288, in process_doc
2024-09-16 19:06:21 raise HTTPException(
2024-09-16 19:06:21 fastapi.exceptions.HTTPException: 500: The content provided is empty. Please ensure that there is text or data present before proceeding.

2024-09-16 19:48:23 Traceback (most recent call last):
2024-09-16 19:48:23 File "/app/backend/open_webui/apps/rag/main.py", line 1288, in process_doc
2024-09-16 19:48:23 raise HTTPException(
2024-09-16 19:48:23 fastapi.exceptions.HTTPException: 500: The content provided is empty. Please ensure that there is text or data present before proceeding.
2024-09-16 19:48:52 INFO [open_webui.apps.webui.routers.files] file.content_type: application/msword
2024-09-16 19:48:52 ERROR [open_webui.apps.rag.main] "There is no item named 'word/document.xml' in the archive"
2024-09-16 19:48:52 Traceback (most recent call last):
2024-09-16 19:48:52 File "/app/backend/open_webui/apps/rag/main.py", line 1268, in process_doc
2024-09-16 19:48:52 data = loader.load()
2024-09-16 19:48:52 ^^^^^^^^^^^^^
2024-09-16 19:48:52 File "/usr/local/lib/python3.11/site-packages/langchain_community/document_loaders/word_document.py", line 57, in load
2024-09-16 19:48:52 page_content=docx2txt.process(self.file_path),
2024-09-16 19:48:52 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-09-16 19:48:52 File "/usr/local/lib/python3.11/site-packages/docx2txt/docx2txt.py", line 88, in process
2024-09-16 19:48:52 text += xml2text(zipf.read(doc_xml))
2024-09-16 19:48:52 ^^^^^^^^^^^^^^^^^^
2024-09-16 19:48:52 File "/usr/local/lib/python3.11/zipfile.py", line 1527, in read
2024-09-16 19:48:52 with self.open(name, "r", pwd) as fp:
2024-09-16 19:48:52 ^^^^^^^^^^^^^^^^^^^^^^^^^
2024-09-16 19:48:52 File "/usr/local/lib/python3.11/zipfile.py", line 1564, in open
2024-09-16 19:48:52 zinfo = self.getinfo(name)
2024-09-16 19:48:52 ^^^^^^^^^^^^^^^^^^
2024-09-16 19:48:52 File "/usr/local/lib/python3.11/zipfile.py", line 1493, in getinfo
2024-09-16 19:48:52 raise KeyError(
2024-09-16 19:48:52 KeyError: "There is no item named 'word/document.xml' in the archive"
2024-09-16 19:49:25 INFO [open_webui.apps.webui.routers.files] file.content_type: application/msword
2024-09-16 19:49:25 ERROR [open_webui.apps.rag.main] "There is no item named 'word/document.xml' in the archive"
2024-09-16 19:49:25 Traceback (most recent call last):
2024-09-16 19:49:25 File "/app/backend/open_webui/apps/rag/main.py", line 1268, in process_doc
2024-09-16 19:49:25 data = loader.load()
2024-09-16 19:49:25 ^^^^^^^^^^^^^
2024-09-16 19:49:25 File "/usr/local/lib/python3.11/site-packages/langchain_community/document_loaders/word_document.py", line 57, in load
2024-09-16 19:49:25 page_content=docx2txt.process(self.file_path),
2024-09-16 19:49:25 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-09-16 19:49:25 File "/usr/local/lib/python3.11/site-packages/docx2txt/docx2txt.py", line 88, in process
2024-09-16 19:49:25 text += xml2text(zipf.read(doc_xml))
2024-09-16 19:49:25 ^^^^^^^^^^^^^^^^^^
2024-09-16 19:49:25 File "/usr/local/lib/python3.11/zipfile.py", line 1527, in read
2024-09-16 19:49:25 with self.open(name, "r", pwd) as fp:
2024-09-16 19:49:25 ^^^^^^^^^^^^^^^^^^^^^^^^^
2024-09-16 19:49:25 File "/usr/local/lib/python3.11/zipfile.py", line 1564, in open
2024-09-16 19:49:25 zinfo = self.getinfo(name)
2024-09-16 19:49:25 ^^^^^^^^^^^^^^^^^^
2024-09-16 19:49:25 File "/usr/local/lib/python3.11/zipfile.py", line 1493, in getinfo
2024-09-16 19:49:25 raise KeyError(
2024-09-16 19:49:25 KeyError: "There is no item named 'word/document.xml' in the archive"
2024-09-16 19:50:04 INFO [open_webui.apps.webui.routers.files] file.content_type: application/msword
2024-09-16 19:50:04 ERROR [open_webui.apps.rag.main] "There is no item named 'word/document.xml' in the archive"
2024-09-16 19:50:04 Traceback (most recent call last):
2024-09-16 19:50:04 File "/app/backend/open_webui/apps/rag/main.py", line 1268, in process_doc
2024-09-16 19:50:04 data = loader.load()
2024-09-16 19:50:04 ^^^^^^^^^^^^^
2024-09-16 19:50:04 File "/usr/local/lib/python3.11/site-packages/langchain_community/document_loaders/word_document.py", line 57, in load
2024-09-16 19:50:04 page_content=docx2txt.process(self.file_path),
2024-09-16 19:50:04 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-09-16 19:50:04 File "/usr/local/lib/python3.11/site-packages/docx2txt/docx2txt.py", line 88, in process
2024-09-16 19:50:04 text += xml2text(zipf.read(doc_xml))
2024-09-16 19:50:04 ^^^^^^^^^^^^^^^^^^
2024-09-16 19:50:04 File "/usr/local/lib/python3.11/zipfile.py", line 1527, in read
2024-09-16 19:50:04 with self.open(name, "r", pwd) as fp:
2024-09-16 19:50:04 ^^^^^^^^^^^^^^^^^^^^^^^^^
2024-09-16 19:50:04 File "/usr/local/lib/python3.11/zipfile.py", line 1564, in open
2024-09-16 19:50:04 zinfo = self.getinfo(name)
2024-09-16 19:50:04 ^^^^^^^^^^^^^^^^^^
2024-09-16 19:50:04 File "/usr/local/lib/python3.11/zipfile.py", line 1493, in getinfo
2024-09-16 19:50:04 raise KeyError(
2024-09-16 19:50:04 KeyError: "There is no item named 'word/document.xml' in the archive"
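
The `KeyError` in the tracebacks above comes from the standard-library `zipfile` module: `docx2txt` opens the upload as a zip archive and asks for the member `word/document.xml`. A minimal sketch of a check that could be run against the same file outside Open WebUI, to see whether it is a zip archive at all and whether it contains that member. The function name `inspect_docx` is hypothetical and not part of Open WebUI or docx2txt, and the sketch assumes the file is readable from wherever Python runs.

```python
import zipfile

# Member name taken from the KeyError message in the container log.
DOCX_MAIN_PART = "word/document.xml"


def inspect_docx(path):
    """Return a short status string describing whether `path` looks like a .docx.

    Hypothetical helper for diagnosis only; it does not parse the document.
    """
    # A .docx file is a zip archive; anything else cannot be opened by zipfile.
    if not zipfile.is_zipfile(path):
        return "not a zip archive"
    with zipfile.ZipFile(path) as archive:
        names = archive.namelist()
    # This is the lookup that fails with KeyError in the traceback.
    if DOCX_MAIN_PART not in names:
        return "zip archive without " + DOCX_MAIN_PART
    return "ok"
```

If this reports "ok" for the original file on disk, the file itself is probably fine and the copy that reached the container may differ from it. Note also that the log reports `application/msword`, which is normally the MIME type of the legacy .doc format rather than .docx.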

Screenshots/Screen Recordings (if applicable):
[Attach any relevant screenshots to help illustrate the issue]

Additional Information

I can access the Open WebUI interface from a networked computer using the internal IP, and I can SUCCESSFULLY upload a PDF or other document from that computer. I cannot do this on the local machine, where Ollama, Docker and Open WebUI are running. It strikes me that it may be a permissions issue. Also, locally I can access Open WebUI on both 3000 and 8080. I will confess that when I couldn't get this working, I installed via WSL and Ubuntu, still with the same problem. I then removed the WSL Ubuntu image and reverted to a Windows install. Perhaps this is now contributing to the app being accessible on both ports, but the PDF/documents issue existed before this. Thanks for your help! I did scour the internet to see if others have the same problem, but found nothing!
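
Since the same upload succeeds from a networked computer but fails locally, one thing worth ruling out is that the file arrives altered or empty when sent from the local machine. A minimal sketch, assuming the stored copy can be copied out of the container (for example with `docker cp`) from wherever Open WebUI keeps uploads in its data volume; that location is not confirmed here. The helper names `sha256_of` and `same_bytes` are hypothetical.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_bytes(original_path, stored_path):
    """True if the original file and the stored copy are byte-for-byte identical."""
    return sha256_of(original_path) == sha256_of(stored_path)
```

If `same_bytes` returns False for a local upload but True for the same file uploaded from the networked computer, that would point at something on the local upload path rather than at the document itself.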

Note

If the bug report is incomplete or does not follow the provided instructions, it may not be addressed. Please ensure that you have followed the steps outlined in the README.md and troubleshooting.md documents, and provide all necessary information for us to reproduce and address the issue. Thank you!


Reference: github-starred/open-webui#2096