mirror of
https://github.com/open-webui/open-webui.git
synced 2026-03-19 05:21:23 -05:00
PowerPoint file fails with an encoding error in RAG #976
Originally created by @flyfox666 on GitHub (May 20, 2024).
Bug Report
Description
Bug Summary:
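Uploading a PowerPoint (.pptx) file in Documents fails.
Steps to Reproduce:
Upload a PowerPoint (.pptx) file in Documents. The upload fails.
Expected Behavior:
The file is read and parsed successfully.
Actual Behavior:
The UI reports a failure, and the container log shows "Could not detect encoding" for the uploaded file.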
Environment
Open WebUI Version: v0.1.125
Ollama (if applicable): 0.1.38
Operating System: Windows 11, WSL2, Docker Desktop
Browser (if applicable): Chrome (latest)
Reproduction Details
Confirmation:
Logs and Screenshots
Browser Console Logs:
Docker Container Logs:
2024-05-20 13:45:15 INFO:apps.rag.main:file.content_type: application/vnd.openxmlformats-officedocument.presentationml.presentation
2024-05-20 13:45:17 ERROR:apps.rag.main:Could not detect encoding for /app/backend/data/uploads/samplepptx2.pptx
2024-05-20 13:45:17 Traceback (most recent call last):
2024-05-20 13:45:17 File "/usr/local/lib/python3.11/site-packages/langchain_community/document_loaders/text.py", line 43, in lazy_load
2024-05-20 13:45:17 text = f.read()
2024-05-20 13:45:17 ^^^^^^^^
2024-05-20 13:45:17 File "<frozen codecs>", line 322, in decode
2024-05-20 13:45:17 UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbc in position 18: invalid start byte
2024-05-20 13:45:17
2024-05-20 13:45:17 During handling of the above exception, another exception occurred:
2024-05-20 13:45:17
2024-05-20 13:45:17 Traceback (most recent call last):
2024-05-20 13:45:17 File "/app/backend/apps/rag/main.py", line 808, in store_doc
2024-05-20 13:45:17 data = loader.load()
2024-05-20 13:45:17 ^^^^^^^^^^^^^
2024-05-20 13:45:17 File "/usr/local/lib/python3.11/site-packages/langchain_core/document_loaders/base.py", line 29, in load
2024-05-20 13:45:17 return list(self.lazy_load())
2024-05-20 13:45:17 ^^^^^^^^^^^^^^^^^^^^^^
2024-05-20 13:45:17 File "/usr/local/lib/python3.11/site-packages/langchain_community/document_loaders/text.py", line 46, in lazy_load
2024-05-20 13:45:17 detected_encodings = detect_file_encodings(self.file_path)
2024-05-20 13:45:17 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-05-20 13:45:17 File "/usr/local/lib/python3.11/site-packages/langchain_community/document_loaders/helpers.py", line 50, in detect_file_encodings
2024-05-20 13:45:17 raise RuntimeError(f"Could not detect encoding for {file_path}")
2024-05-20 13:45:17 RuntimeError: Could not detect encoding for /app/backend/data/uploads/samplepptx2.pptx
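The traceback shows the file going through a plain-text loader (text.py), which tries to decode it as UTF-8. A .pptx file is a ZIP container, so its bytes are binary and not text. The sketch below is illustrative only: the header bytes are made up to resemble a ZIP local file header (they are not taken from samplepptx2.pptx), and the helper names are hypothetical. It shows how a single non-UTF-8 byte early in such a header produces the same kind of UnicodeDecodeError.

```python
import struct
from typing import Optional


def fake_zip_header() -> bytes:
    """Build made-up bytes shaped like the start of a ZIP local file header."""
    # signature, version, flags, method, mod time, mod date, CRC-32 (18 bytes)
    head = struct.pack("<4sHHHHHI", b"PK\x03\x04", 20, 6, 8, 0, 0x21, 0x44434241)
    # compressed size: its low byte (0xbc) is not a valid UTF-8 start byte
    size = struct.pack("<I", 0x01BC)
    return head + size


def looks_like_zip(data: bytes) -> bool:
    """ZIP-based formats such as .pptx begin with the 'PK\\x03\\x04' signature."""
    return data[:4] == b"PK\x03\x04"


def utf8_error(data: bytes) -> Optional[UnicodeDecodeError]:
    """Return the decode error if data is not valid UTF-8, else None."""
    try:
        data.decode("utf-8")
        return None
    except UnicodeDecodeError as exc:
        return exc


header = fake_zip_header()
err = utf8_error(header)
print(looks_like_zip(header), err)
```

Checking for the ZIP signature before treating an upload as text is one cheap way to avoid sending binary Office files down the text path.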
Screenshots (if applicable):

Installation Method
Docker
Additional Information
@jannikstdl commented on GitHub (May 20, 2024):
Got the error, will make a fix
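The thread does not show what the fix looked like. As a rough sketch of one possible approach, loader selection could check the content type logged above (or the file extension) before falling back to a text loader. The function name and return values here are hypothetical. In a LangChain-based backend the "powerpoint" branch might map to a loader such as UnstructuredPowerPointLoader, but that mapping is an assumption and not something confirmed in this issue.

```python
from pathlib import Path
from typing import Optional

# Content type reported in the container log for the failing upload.
PPTX_MIME = "application/vnd.openxmlformats-officedocument.presentationml.presentation"
# Legacy .ppt content type (assumed to be worth handling the same way).
PPT_MIME = "application/vnd.ms-powerpoint"


def choose_loader_kind(filename: str, content_type: Optional[str]) -> str:
    """Return a hypothetical loader category for an uploaded file."""
    ext = Path(filename).suffix.lower()
    if content_type in (PPTX_MIME, PPT_MIME) or ext in (".ppt", ".pptx"):
        return "powerpoint"
    # Everything else falls back to the plain-text path in this sketch.
    return "text"


print(choose_loader_kind("samplepptx2.pptx", PPTX_MIME))
```

Checking both the content type and the extension covers uploads where the browser sends a generic content type.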
@flyfox666 commented on GitHub (May 21, 2024):
LGTM👍