[PR #13079] [CLOSED] fix: Web RAG empty content #23092

Closed
opened 2026-04-20 04:37:35 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/open-webui/open-webui/pull/13079
Author: @tth37
Created: 4/20/2025
Status: Closed

Base: dev ← Head: fix_web_rag_empty_content


📝 Commits (1)

  • ebb633f fix: Web RAG empty content

📊 Changes

1 file changed (+18 additions, -13 deletions)

View changed files

📝 backend/open_webui/routers/retrieval.py (+18 -13)

📄 Description

Related Issue and Commit

#12832 70718dda90

Problem Description

After web pages are loaded, the backend calls save_docs_to_vector_db. However, if any of the documents has an empty page_content, save_docs_to_vector_db fails, which in turn causes the entire web search to fail. Commit 70718dda90 partially addresses this issue by adding a check for empty doc.page_content.

However, there are cases where doc.page_content contains only whitespace, which also causes the save_docs_to_vector_db process to fail. This happens because save_docs_to_vector_db performs preprocessing on the page_content, including cleaning and splitting, which cannot handle whitespace-only content.
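To illustrate the gap, here is a minimal sketch, assuming the existing check is a plain emptiness test on page_content (the exact check in 70718dda90 is not reproduced here). `Doc`, `non_empty`, and `non_blank` are hypothetical names used only for this example.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical stand-in for a loaded web document."""
    page_content: str


def non_empty(docs):
    # Emptiness check only: a whitespace-only string is truthy, so it passes.
    return [d for d in docs if d.page_content]


def non_blank(docs):
    # Stripping first also rejects whitespace-only content.
    return [d for d in docs if d.page_content.strip()]


docs = [Doc("real text"), Doc(""), Doc("  \n\t ")]
kept_by_empty_check = non_empty(docs)
kept_by_blank_check = non_blank(docs)
```

With these inputs the emptiness check still lets the whitespace-only document through, while the stripped check keeps only the document with real text.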

70718dda90/backend/open_webui/routers/retrieval.py (L862-L863)

Proposed Solution

To keep the web search robust, we can wrap the save_docs_to_vector_db call in a try/except block. Errors raised while saving are then handled gracefully instead of failing the entire web search.
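The idea can be sketched roughly as follows. This is not the PR's actual diff: `save_docs_stub` is a hypothetical stand-in for save_docs_to_vector_db, and saving one page at a time (`save_each_page`) is an assumption made for illustration.

```python
import logging
from dataclasses import dataclass

log = logging.getLogger(__name__)


@dataclass
class Doc:
    """Hypothetical stand-in for a loaded web document."""
    page_content: str


def save_docs_stub(docs):
    """Hypothetical stand-in for save_docs_to_vector_db; fails on blank content."""
    if any(not d.page_content.strip() for d in docs):
        raise ValueError("empty content after preprocessing")
    return True


def save_each_page(pages):
    """Save each page independently and return the names that were saved."""
    saved = []
    for name, docs in pages.items():
        try:
            save_docs_stub(docs)
            saved.append(name)
        except Exception as exc:
            # One bad page is logged and skipped; the search itself continues.
            log.warning("Skipping %s: %s", name, exc)
    return saved


pages = {
    "page-a": [Doc("useful text")],
    "page-b": [Doc("   \n")],  # whitespace-only content
    "page-c": [Doc("more text")],
}
result = save_each_page(pages)
```

Catching a broad `Exception` trades precision for resilience: a failure on one page is logged rather than propagated, so the remaining results stay available.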


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-20 04:37:35 -05:00
Reference: github-starred/open-webui#23092