Mirror of https://github.com/open-webui/open-webui.git (synced 2026-05-06 10:58:17 -05:00)
[PR #19572] feat: Add smart RAG token threshold for automatic bypass optimization #40879
📋 Pull Request Information
Original PR: https://github.com/open-webui/open-webui/pull/19572
Author: @eliem-ark
Created: 11/28/2025
Status: 🔄 Open
Base: dev ← Head: adding-rag-treshold
📝 Commits (8)
fe6783c Merge pull request #19030 from open-webui/dev
fc05e0a Merge pull request #19405 from open-webui/dev
e3faec6 Merge pull request #19416 from open-webui/dev
9899293 Merge pull request #19448 from open-webui/dev
140605e Merge pull request #19462 from open-webui/dev
e146a5f adding rag treshold logic
86a9d4f front + remove duplicate
4463739 front
📊 Changes
5 files changed (+69 additions, -1 deletions)
backend/open_webui/config.py (+6 -0)
backend/open_webui/main.py (+2 -0)
backend/open_webui/retrieval/utils.py (+6 -0)
backend/open_webui/routers/retrieval.py (+38 -1)
src/lib/components/admin/Settings/Documents.svelte (+17 -0)
📄 Description
Description
Adds a token-count threshold that automatically chooses, per uploaded file, between RAG (chunking + embedding) and full-context mode, speeding up small documents while keeping RAG for large files.
Context: Follows discussion #19177. This cannot be implemented as a filter because of performance requirements.
Problem: Every file is currently chunked and embedded regardless of size, which wastes 3 to 60 s on small documents that would fit in the model's context window.
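Solution: Files whose token count is at or below the threshold skip RAG processing entirely. The check runs per file: with a 50K-token threshold, uploading a 1 KB file and a 100 KB file together bypasses RAG for the 1 KB file only (assuming the 100 KB file's extracted text exceeds 50K tokens).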
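A minimal Python sketch of the per-file rule described above, assuming that a threshold of 0 disables the check and that a file bypasses RAG when its token count is at or below the threshold. The function name bypasses_rag and the token counts in the demo are hypothetical, not taken from the PR's code.

```python
def bypasses_rag(token_count: int, threshold: int) -> bool:
    """Return True if a file should skip chunking and embedding.

    A threshold of 0 (the default) disables the feature, so every file
    goes through normal RAG processing.
    """
    if threshold <= 0:
        return False
    return token_count <= threshold


if __name__ == "__main__":
    threshold = 50_000
    # Hypothetical token counts for two files uploaded in the same message.
    uploads = {"small.txt": 300, "large.pdf": 120_000}
    for name, tokens in uploads.items():
        mode = "full context" if bypasses_rag(tokens, threshold) else "RAG"
        print(f"{name}: {tokens} tokens -> {mode}")
```

Because the comparison is made per file, a single large attachment does not force small attachments in the same message through chunking and embedding.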
Added
Configuration: RAG_TOKEN_THRESHOLD (integer, default: 0 = disabled)
Environment variable: RAG_TOKEN_THRESHOLD
Admin setting: Admin Panel > Settings > Documents
Token counting: Uses the existing tiktoken integration with the configured TIKTOKEN_ENCODING_NAME
Metadata flag: Files that bypass RAG are marked with bypass_rag: true in their metadata
UI field: Number input (step: 1000, min: 0) in the Documents settings, hidden when the global embedding/retrieval bypass is enabled
Logging: Decision logged per file: "File 'doc.pdf': 25000 tokens (<= 50000), bypassing RAG"
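The configuration value and the metadata flag could fit together roughly as in the following sketch. load_threshold and tag_file are hypothetical helpers, not functions from the PR. The token counter is passed in so the example runs without tiktoken; in practice it might be something like len(tiktoken.get_encoding(name).encode(text)). Rejecting negative or non-numeric values is an assumption based on the UI's min of 0. The bypass log line copies the format of the example message above.

```python
import logging
import os
from typing import Callable, Mapping

log = logging.getLogger(__name__)


def load_threshold(env: Mapping[str, str] = os.environ, default: int = 0) -> int:
    """Read RAG_TOKEN_THRESHOLD (hypothetical helper; 0 disables the feature).

    Assumption: negative or non-numeric values fall back to the default.
    """
    raw = env.get("RAG_TOKEN_THRESHOLD", str(default))
    try:
        value = int(raw)
    except ValueError:
        return default
    return value if value >= 0 else default


def tag_file(
    filename: str,
    text: str,
    metadata: dict,
    threshold: int,
    count_tokens: Callable[[str], int],
) -> bool:
    """Count tokens and set bypass_rag in metadata when at/under the threshold.

    Returns True if the file should skip chunking and embedding.
    """
    if threshold <= 0:
        return False  # feature disabled
    tokens = count_tokens(text)
    if tokens <= threshold:
        metadata["bypass_rag"] = True
        log.info(
            "File '%s': %d tokens (<= %d), bypassing RAG", filename, tokens, threshold
        )
        return True
    return False


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    meta: dict = {}
    # Whitespace splitting is a crude stand-in tokenizer for this demo only.
    skipped = tag_file("doc.pdf", "a short note", meta, 50_000, lambda t: len(t.split()))
    print(skipped, meta)
```

Storing the decision as a metadata flag means the retrieval step only has to read the flag later, instead of counting tokens a second time.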
Changed
backend/open_webui/config.py: Added RAG_TOKEN_THRESHOLD PersistentConfig
backend/open_webui/main.py: Imported and initialized RAG_TOKEN_THRESHOLD
backend/open_webui/routers/retrieval.py:
Added RAG_TOKEN_THRESHOLD to ConfigForm and the GET/POST config endpoints
Added token-counting logic to process_file()
Sets the bypass_rag metadata flag when a file falls under the threshold
backend/open_webui/retrieval/utils.py: Checks the bypass_rag flag in get_sources_from_items()
src/lib/components/admin/Settings/Documents.svelte: Added threshold input field
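On the retrieval side, the bypass_rag check in get_sources_from_items() might behave like this simplified sketch. FileItem, get_sources_sketch, and the shape of the returned source dicts are hypothetical; the real function handles many more item types and settings.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class FileItem:
    """Hypothetical stand-in for a file attached to a chat message."""

    name: str
    content: str
    metadata: Dict = field(default_factory=dict)


def get_sources_sketch(
    items: List[FileItem],
    bypass_embedding_and_retrieval: bool,
    query_vector_db: Callable[[FileItem], List[str]],
) -> List[Dict]:
    """Pick full content or vector-search chunks per file (sketch only).

    Files flagged with bypass_rag, or every file when the global bypass is on,
    contribute their full text; the others are queried as usual.
    """
    sources = []
    for item in items:
        if bypass_embedding_and_retrieval or item.metadata.get("bypass_rag"):
            documents = [item.content]
        else:
            documents = query_vector_db(item)
        sources.append({"file": item.name, "documents": documents})
    return sources


if __name__ == "__main__":
    files = [
        FileItem("small.txt", "entire small file", {"bypass_rag": True}),
        FileItem("large.pdf", "very long text ..."),
    ]
    # Fake retriever standing in for the real vector search pipeline.
    fake_search = lambda item: [f"top chunk from {item.name}"]
    for source in get_sources_sketch(files, False, fake_search):
        print(source)
```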
Fixed
Performance: Small files (≤ threshold) upload much faster because they skip chunking and embedding
Implementation Notes
Fallback: If token counting fails, the file goes through normal RAG processing
Respects the global BYPASS_EMBEDDING_AND_RETRIEVAL setting
Compatible with existing RAG settings (hybrid search, reranking, etc.)
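The precedence implied by these notes can be sketched as one decision function: the global bypass is checked first, a threshold of 0 turns the feature off, and any tokenizer error falls back to RAG. choose_mode and its string return values are hypothetical names chosen for illustration.

```python
from typing import Callable


def choose_mode(
    text: str,
    threshold: int,
    global_bypass: bool,
    count_tokens: Callable[[str], int],
) -> str:
    """Return 'global_bypass', 'full_context', or 'rag' for one file (sketch)."""
    if global_bypass:
        # BYPASS_EMBEDDING_AND_RETRIEVAL already skips RAG for every file,
        # so the per-file threshold is never consulted.
        return "global_bypass"
    if threshold <= 0:
        return "rag"  # threshold disabled
    try:
        tokens = count_tokens(text)
    except Exception:
        # Fallback: if counting fails, process the file with normal RAG.
        return "rag"
    return "full_context" if tokens <= threshold else "rag"
```

Checking the global flag first means the threshold can never override it, which is consistent with the UI hiding the field when the global bypass is on. Files that end up in "rag" go through the unchanged pipeline, so hybrid search and reranking settings still apply to them.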
Screenshots
Admin Panel > Settings > Documents
Logs when uploading two files, with the threshold set to 2K tokens for this example (only one file is above it):
Contributor License Agreement
By submitting this pull request, I confirm that I have read and fully agree to the Contributor License Agreement (CLA), and I am providing my contributions under its terms.
🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.