mirror of
https://github.com/open-webui/open-webui.git
synced 2026-05-07 19:38:46 -05:00
[PR #13079] [CLOSED] fix: Web RAG empty content #23092
📋 Pull Request Information
Original PR: https://github.com/open-webui/open-webui/pull/13079
Author: @tth37
Created: 4/20/2025
Status: ❌ Closed
Base: dev ← Head: fix_web_rag_empty_content
📝 Commits (1)
ebb633f fix: Web RAG empty content
📊 Changes
1 file changed (+18 additions, -13 deletions)
📝 backend/open_webui/routers/retrieval.py (+18 -13)
📄 Description
Related Issue and Commit
#12832
70718dda90
Problem Description
After web pages are loaded, the backend proceeds to save_docs_to_vector_db. However, if one of the documents has an empty page_content, the save_docs_to_vector_db operation fails, which in turn causes the entire web search to fail. Commit 70718dda90 partially addresses this issue by adding a check for empty doc.page_content.
However, there are cases where doc.page_content contains only whitespace, which also causes save_docs_to_vector_db to fail. This happens because save_docs_to_vector_db preprocesses the page_content (cleaning and splitting), and that preprocessing cannot handle whitespace-only content.
70718dda90/backend/open_webui/routers/retrieval.py (L862-L863)
Proposed Solution
To keep the web search process robust, we can wrap the save_docs_to_vector_db call in a try-except block. Errors are then handled gracefully instead of failing the entire web search.
🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.
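The proposed solution can be illustrated with a minimal sketch. This is not the actual diff from the PR: Doc, fake_save_docs_to_vector_db, and save_web_docs are hypothetical stand-ins, and the real save_docs_to_vector_db in retrieval.py has a different signature. The sketch only shows the two ideas discussed above, skipping whitespace-only page_content and wrapping the save step in try-except so a failure does not propagate to the caller.

```python
import logging
from dataclasses import dataclass, field

log = logging.getLogger(__name__)


@dataclass
class Doc:
    """Hypothetical stand-in for a loaded web document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def fake_save_docs_to_vector_db(docs):
    """Hypothetical stand-in for the real save step.

    It mimics the failure described above: content that is empty
    after cleaning raises an error.
    """
    for doc in docs:
        if not doc.page_content.strip():
            raise ValueError("empty content after preprocessing")
    return True


def save_web_docs(docs, save=fake_save_docs_to_vector_db):
    """Save usable docs; return True on success, False otherwise."""
    # Drop documents whose content is empty or whitespace-only.
    usable = [d for d in docs if d.page_content and d.page_content.strip()]
    if not usable:
        return False
    try:
        return bool(save(usable))
    except Exception as exc:
        # Log and continue so the surrounding web search can still return.
        log.warning("Skipping vector DB save for web results: %s", exc)
        return False
```

With this shape, the caller can still return the search results themselves when the vector DB step is skipped; whether that is the desired behavior depends on how the retrieval router uses the saved collection.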