mirror of
https://github.com/open-webui/open-webui.git
synced 2026-05-05 18:38:17 -05:00
[GH-ISSUE #1158] RAG breaks on larger pdf/txt/doc file #12366
Originally created by @reiebrole30 on GitHub (Mar 14, 2024).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/1158
Bug Report
Description
RAG breaks on larger pdf/txt/doc file

Bug Summary:
[Provide a brief but clear summary of the bug]
Steps to Reproduce:
I have a 58,021-character document, originally in docx format. I converted it to PDF and to plain text, and both produce the same error.
I wonder whether this is a rate-limiting issue or a bug. The same document works flawlessly with Open WebUI version 1.08.
Expected Behavior:
[Describe what you expected to happen.]
Actual Behavior:
[Describe what actually happened.]
Environment
Reproduction Details
Confirmation:
Logs and Screenshots
Browser Console Logs:
[Include relevant browser console logs, if applicable]
Docker Container Logs:
[Include relevant Docker container logs, if applicable]
Screenshots (if applicable):
[Attach any relevant screenshots to help illustrate the issue]
Installation Method
[Describe the method you used to install the project, e.g., manual installation, Docker, package manager, etc.]
Additional Information
[Include any additional details that may help in understanding and reproducing the issue. This could include specific configurations, error messages, or anything else relevant to the bug.]
Note
If the bug report is incomplete or does not follow the provided instructions, it may not be addressed. Please ensure that you have followed the steps outlined in the README.md and troubleshooting.md documents, and provide all necessary information for us to reproduce and address the issue. Thank you!
@tjbck commented on GitHub (Mar 14, 2024):
Could you share the document with us? We're unable to reproduce the issue with our documents.
@reiebrole30 commented on GitHub (Mar 14, 2024):
Can I send you an email? The file is confidential; it's basically a summary of our helpdesk tasks.
@reiebrole30 commented on GitHub (Mar 14, 2024):
Hey team, I'm closing this issue. I found a fix for my problem by escaping the content of the PDF, lol. I first converted it to plain text, then used https://onlinestringtools.com/escape-string. I am new to RAG, so I'm not sure which string caused the issue.
Here are my Docker logs:

And here's how it is now after escaping:

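For anyone who wants to try the same workaround without pasting a confidential file into an online tool, the escaping step described above can probably be done locally. The following is only a minimal sketch: it assumes the online tool does JSON-style escaping (quotes, backslashes, control characters, non-ASCII), which has not been confirmed, and the function name `escape_text` and the sample string are made up for illustration.

```python
import json


def escape_text(raw: str) -> str:
    """Return raw with quotes, backslashes, control and non-ASCII characters escaped.

    Assumption: JSON-style escaping approximates what the online
    escape-string tool does. This has not been verified.
    """
    # json.dumps yields a quoted JSON string literal; drop the surrounding quotes.
    return json.dumps(raw, ensure_ascii=True)[1:-1]


# Hypothetical sample standing in for the plain-text export of the document.
sample = 'Ticket "A-1"\tpath C:\\temp\nnext line'
escaped = escape_text(sample)
print(escaped)
```

The escaped text can then be saved to a new .txt file and uploaded in place of the original. Since the escaping is reversible, `json.loads('"' + escaped + '"')` gives back the original text, which may help narrow down which string caused the failure.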