mirror of
https://github.com/open-webui/open-webui.git
synced 2026-03-11 17:47:44 -05:00
[PR #16209] [CLOSED] feat: Add RAG grounding step (extension to Google embeddings) #10868
📋 Pull Request Information
Original PR: https://github.com/open-webui/open-webui/pull/16209
Author: @ipapapa
Created: 8/1/2025
Status: ❌ Closed
Base: main ← Head: feat/add-grounding-step-clean
📝 Commits (6)
ffff6fa feat: Add Google embeddings support
d7262aa Merge branch 'open-webui:main' into fix-google-embeddings
b21509d feat: Add Google embeddings support with migration guidance
2a793a7 feat: add RAG grounding step as extension to Google embeddings
1c5a0f5 feat: integrate grounding step into retrieval pipeline
050811a feat: add comprehensive tests for grounding step
📊 Changes
5 files changed (+676 additions, -1 deletions)
📝 README.md (+41 -0)
📝 backend/open_webui/config.py (+12 -0)
➕ backend/open_webui/retrieval/grounding.py (+153 -0)
📝 backend/open_webui/retrieval/utils.py (+123 -1)
➕ backend/open_webui/test/retrieval/test_grounding.py (+347 -0)
📄 Description
Summary
This PR adds a lightweight grounding step after retrieval to guard against semantic drift when different embedding models are in use. It addresses a well-documented problem in RAG systems: retrieved content appears relevant but leads to off-topic responses because of embedding model inconsistencies.
This PR extends and builds upon #16022 (Google embeddings support).
Recent academic research has identified significant challenges with embedding model mismatch in RAG systems:
Solution: Post-Retrieval Validation
Our implementation applies grounding techniques: retrieved content is validated after retrieval, before it is used to generate a response.
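To make the idea of post-retrieval validation concrete, here is a minimal sketch. It is not the code in `backend/open_webui/retrieval/grounding.py`, which this page does not show. The names `RetrievedChunk`, `grounding_score`, `ground_chunks`, `min_score`, and `enabled` are hypothetical, and plain token overlap stands in for whatever scoring the PR actually uses.

```python
import re
from dataclasses import dataclass
from typing import List, Set


@dataclass
class RetrievedChunk:
    """Hypothetical container for one retrieval result."""
    text: str
    retrieval_score: float  # score reported by the vector store


def _tokens(text: str) -> Set[str]:
    # Lowercase alphanumeric tokens; a deliberately simple tokenizer.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def grounding_score(query: str, chunk_text: str) -> float:
    """Fraction of query tokens that also occur in the chunk (0.0 to 1.0).

    Assumption: lexical overlap is used here only as an
    embedding-independent stand-in for a real grounding check.
    """
    query_tokens = _tokens(query)
    if not query_tokens:
        return 0.0
    return len(query_tokens & _tokens(chunk_text)) / len(query_tokens)


def ground_chunks(
    query: str,
    chunks: List[RetrievedChunk],
    min_score: float = 0.5,  # hypothetical threshold
    enabled: bool = True,    # hypothetical on/off switch
) -> List[RetrievedChunk]:
    """Drop retrieved chunks whose grounding score falls below min_score."""
    if not enabled:
        # With the step disabled, retrieval results pass through unchanged.
        return list(chunks)
    return [c for c in chunks if grounding_score(query, c.text) >= min_score]
```

In this sketch, a chunk that the vector store ranked highly can still be dropped if it shares too little with the query. That is the kind of mismatch a validation layer is meant to catch. Disabling the step returns the retrieval results untouched.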
Key Features
Configuration
Technical Approach
Research Validation
This approach is supported by recent academic work:
Addresses Community Feedback
This implementation responds to discussion in #16043, specifically addressing concerns about cross-embedding provider semantic alignment and the need for validation layers in multi-provider RAG systems.
Testing
Backward Compatibility
Performance Impact
When enabled:
Contributor License Agreement
By submitting this pull request, I confirm that I have read and fully agree to the Contributor License Agreement (CLA), and I am providing my contributions under its terms.
🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.