[GH-ISSUE #1158] RAG breaks on larger pdf/txt/doc file #12366

Closed
opened 2026-04-19 19:15:51 -05:00 by GiteaMirror · 3 comments
Owner

Originally created by @reiebrole30 on GitHub (Mar 14, 2024).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/1158

Bug Report

Description

RAG breaks on larger pdf/txt/doc file
Bug Summary:
[Provide a brief but clear summary of the bug]
![image](https://github.com/open-webui/open-webui/assets/139945395/d2e1f2bd-d586-4de6-a697-a2cb5c857ff3)

Steps to Reproduce:
[Outline the steps to reproduce the bug. Be as detailed as possible.]
I have a 58021-character document, originally in docx format. I converted it to PDF and to plain text, and both produce the same error.
I wonder whether the issue is rate limiting or a bug. The same document works flawlessly with Open WebUI version 1.08.
Expected Behavior:
[Describe what you expected to happen.]

Actual Behavior:
[Describe what actually happened.]

Environment

  • Operating System: [e.g., Windows 10, macOS Big Sur, Ubuntu 20.04]
  • Oracle Linux Ampere 8
  • Browser (if applicable): [e.g., Chrome 100.0, Firefox 98.0]

Reproduction Details

Confirmation:

  • I have read and followed all the instructions provided in the README.md.
  • I am on the latest version of both Open WebUI and Ollama.
  • I have included the browser console logs.
  • I have included the Docker container logs.

Logs and Screenshots

Browser Console Logs:
[Include relevant browser console logs, if applicable]

Docker Container Logs:
[Include relevant Docker container logs, if applicable]

Screenshots (if applicable):
[Attach any relevant screenshots to help illustrate the issue]

Installation Method

[Describe the method you used to install the project, e.g., manual installation, Docker, package manager, etc.]

Additional Information

[Include any additional details that may help in understanding and reproducing the issue. This could include specific configurations, error messages, or anything else relevant to the bug.]

Note

If the bug report is incomplete or does not follow the provided instructions, it may not be addressed. Please ensure that you have followed the steps outlined in the README.md and troubleshooting.md documents, and provide all necessary information for us to reproduce and address the issue. Thank you!


@tjbck commented on GitHub (Mar 14, 2024):

Could you share the document with us? We're unable to reproduce the issue with our documents.


@reiebrole30 commented on GitHub (Mar 14, 2024):

> Could you share the document with us? We're unable to reproduce the issue with our documents.

Can I send it to you by email? The file is confidential; it is basically a summary of our helpdesk tasks.


@reiebrole30 commented on GitHub (Mar 14, 2024):

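Hey team, I'm closing this issue. I found a fix for my problem by escaping the content of the PDF, lol. I first converted it to plain text, then ran it through https://onlinestringtools.com/escape-string. I am new to RAG, so I'm not sure which string caused the issue.

Here are my Docker logs:
![image](https://github.com/open-webui/open-webui/assets/139945395/1c9c8e35-bb7b-4a24-ad5b-ab66e593ed6d)

And here is how it looks now after escaping:
![image](https://github.com/open-webui/open-webui/assets/139945395/93940894-083a-4566-b91c-dd75688f64af)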
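
For anyone hitting something similar, here is a minimal local sketch of the same workaround, which also avoids pasting a confidential file into an online tool. It is only an illustration: the escaping rules of the online tool are not known, so Python's built-in `unicode_escape` codec stands in for them, and the helper names `escape_text` and `find_suspicious_chars` are made up for this example. The thread never identified what actually triggered the error, so the scan for control and format characters is a guess at where to start looking, not a diagnosis.

```python
import unicodedata


def escape_text(text: str) -> str:
    """Backslash-escape non-printable and non-ASCII characters.

    Assumption: this approximates what the online escape tool did.
    Note that unicode_escape also escapes newlines, tabs and backslashes.
    """
    return text.encode("unicode_escape").decode("ascii")


def find_suspicious_chars(text: str) -> list[tuple[int, str, str]]:
    """List (offset, repr, Unicode category) for unusual characters.

    "Unusual" here means any character in a Unicode "C" category
    (control, format, surrogate, private use, unassigned), except the
    common whitespace characters newline, carriage return and tab.
    """
    allowed = {"\n", "\r", "\t"}
    hits = []
    for offset, ch in enumerate(text):
        if ch in allowed:
            continue
        category = unicodedata.category(ch)
        if category.startswith("C"):
            hits.append((offset, repr(ch), category))
    return hits


if __name__ == "__main__":
    # Hypothetical sample text; replace with the contents of the real file.
    sample = "ticket\x00 closed\u200b ok\n"
    for offset, shown, category in find_suspicious_chars(sample):
        print(f"offset {offset}: {shown} (category {category})")
    print(escape_text(sample))
```

Running the scan on the plain-text export before uploading might help narrow down which characters to report in an issue like this one. Keep in mind that escaping changes the text that gets embedded, so it is a workaround rather than a fix.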


Reference: github-starred/open-webui#12366