PowerPoint file: could not detect encoding in RAG #976

Closed
opened 2025-11-11 14:34:45 -06:00 by GiteaMirror · 2 comments

Originally created by @flyfox666 on GitHub (May 20, 2024).

Bug Report

Description

Bug Summary:
Uploading a .pptx file in Documents fails.

Steps to Reproduce:
Upload a .pptx file in Documents; the upload fails.

Expected Behavior:
The file is read and parsed successfully.

Actual Behavior:
A failure message is displayed.

Environment

  • Open WebUI Version: v0.1.125

  • Ollama (if applicable): 0.1.38

  • Operating System: Windows 11, WSL2, Docker Desktop

  • Browser (if applicable): Chrome (latest)

Reproduction Details

Confirmation:

  • I have read and followed all the instructions provided in the README.md.
  • I am on the latest version of both Open WebUI and Ollama.
  • I have included the browser console logs.
  • I have included the Docker container logs.

Logs and Screenshots

Browser Console Logs:
[Include relevant browser console logs, if applicable]

Docker Container Logs:
2024-05-20 13:45:15 INFO:apps.rag.main:file.content_type: application/vnd.openxmlformats-officedocument.presentationml.presentation
2024-05-20 13:45:17 ERROR:apps.rag.main:Could not detect encoding for /app/backend/data/uploads/samplepptx2.pptx
2024-05-20 13:45:17 Traceback (most recent call last):
2024-05-20 13:45:17 File "/usr/local/lib/python3.11/site-packages/langchain_community/document_loaders/text.py", line 43, in lazy_load
2024-05-20 13:45:17 text = f.read()
2024-05-20 13:45:17 ^^^^^^^^
2024-05-20 13:45:17 File "<frozen codecs>", line 322, in decode
2024-05-20 13:45:17 UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbc in position 18: invalid start byte
2024-05-20 13:45:17
2024-05-20 13:45:17 During handling of the above exception, another exception occurred:
2024-05-20 13:45:17
2024-05-20 13:45:17 Traceback (most recent call last):
2024-05-20 13:45:17 File "/app/backend/apps/rag/main.py", line 808, in store_doc
2024-05-20 13:45:17 data = loader.load()
2024-05-20 13:45:17 ^^^^^^^^^^^^^
2024-05-20 13:45:17 File "/usr/local/lib/python3.11/site-packages/langchain_core/document_loaders/base.py", line 29, in load
2024-05-20 13:45:17 return list(self.lazy_load())
2024-05-20 13:45:17 ^^^^^^^^^^^^^^^^^^^^^^
2024-05-20 13:45:17 File "/usr/local/lib/python3.11/site-packages/langchain_community/document_loaders/text.py", line 46, in lazy_load
2024-05-20 13:45:17 detected_encodings = detect_file_encodings(self.file_path)
2024-05-20 13:45:17 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-05-20 13:45:17 File "/usr/local/lib/python3.11/site-packages/langchain_community/document_loaders/helpers.py", line 50, in detect_file_encodings
2024-05-20 13:45:17 raise RuntimeError(f"Could not detect encoding for {file_path}")
2024-05-20 13:45:17 RuntimeError: Could not detect encoding for /app/backend/data/uploads/samplepptx2.pptx
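
The traceback suggests the .pptx upload reached a plain-text loader (`langchain_community/document_loaders/text.py`), which first tried to decode it as UTF-8 and then failed to detect any encoding. A .pptx file is a ZIP container, so it is not decodable as text. Below is a minimal sketch of routing an upload by content type, extension, and file signature before any text decoding is attempted. The function `choose_loader` and the labels it returns are hypothetical names for illustration; this is not Open WebUI's actual code or the eventual fix.

```python
import io
import os
import zipfile

# Content type reported in the log above for the failing upload.
PPTX_MIME = (
    "application/vnd.openxmlformats-officedocument.presentationml.presentation"
)


def choose_loader(filename: str, content_type: str, data: bytes) -> str:
    """Return a label for the kind of loader to use (hypothetical labels)."""
    ext = os.path.splitext(filename)[1].lower()

    # Trust either the declared content type or the file extension.
    if content_type == PPTX_MIME or ext == ".pptx":
        return "presentation"

    # Fallback: anything that is a ZIP container must not be read as text.
    if zipfile.is_zipfile(io.BytesIO(data)):
        return "binary"

    # Only now is it reasonable to try a plain-text loader.
    return "text"
```

With a check like this, the file from the log (`samplepptx2.pptx`, content type `application/vnd.openxmlformats-officedocument.presentationml.presentation`) would be labelled `"presentation"` and never handed to the text loader.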

Screenshots (if applicable):
image

Installation Method

Docker



@jannikstdl commented on GitHub (May 20, 2024):

Got the error, will make a fix


@flyfox666 commented on GitHub (May 21, 2024):

LGTM👍

Reference: github-starred/open-webui#976