mirror of
https://github.com/open-webui/open-webui.git
synced 2026-05-06 10:58:17 -05:00
[PR #19569] [CLOSED] feat: Add smart RAG token threshold for automatic bypass optimization #64104
📋 Pull Request Information
Original PR: https://github.com/open-webui/open-webui/pull/19569
Author: @eliem-ark
Created: 11/28/2025
Status: ❌ Closed
Base: dev ← Head: rag-treshold-native
📝 Commits (10+)
fe6783c Merge pull request #19030 from open-webui/dev
fc05e0a Merge pull request #19405 from open-webui/dev
e3faec6 Merge pull request #19416 from open-webui/dev
9899293 Merge pull request #19448 from open-webui/dev
ebb52c9 fix: changelog
063c4bd fix: postgres user list issue
e1f4f96 chore: bump
43086b3 chore: bump python-socketio==5.14.0
c74467e Update CHANGELOG.md (#19463)
4333653 final impl
📊 Changes
12 files changed (+71 additions, -10 deletions)
📝 CHANGELOG.md (+7 -1)
📝 backend/open_webui/config.py (+7 -0)
📝 backend/open_webui/main.py (+2 -0)
📝 backend/open_webui/models/users.py (+0 -1)
📝 backend/open_webui/routers/retrieval.py (+32 -1)
📝 backend/requirements-min.txt (+1 -1)
📝 backend/requirements.txt (+1 -1)
📝 package-lock.json (+2 -2)
📝 package.json (+1 -1)
📝 pyproject.toml (+1 -1)
📝 src/lib/components/admin/Settings/Documents.svelte (+16 -0)
📝 src/lib/components/channel/ChannelInfoModal/UserList.svelte (+1 -1)
📄 Description
Description
Adds an intelligent token-based threshold to automatically decide between RAG (chunking + embedding) and full context mode for uploaded files, optimizing performance for small documents while maintaining RAG for large files.
Context: Following discussion #19177. Cannot be implemented as a filter due to performance requirements.
Problem: All files are currently processed through chunking/embedding regardless of size, wasting 3-60s for small documents that fit in context windows.
Solution: Files whose token count is at or below the threshold skip RAG processing entirely. The check is evaluated per file: uploading a 1KB file and a 100KB file with a 50K-token threshold bypasses RAG for the 1KB file only.
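To make the per-file evaluation concrete, here is a minimal sketch of the decision. It is not the code from this PR: `UploadedFile`, `should_bypass_rag`, and `partition_files` are hypothetical names, the token counts are made up, and a threshold of 0 is treated as "disabled" to match the default described under Added.

```python
from dataclasses import dataclass


@dataclass
class UploadedFile:
    """Hypothetical stand-in for an uploaded file record."""
    name: str
    token_count: int  # assumed to be computed elsewhere, e.g. with tiktoken


def should_bypass_rag(token_count: int, threshold: int) -> bool:
    """Return True when a file is small enough to skip chunking/embedding.

    A threshold of 0 means the feature is disabled, so nothing is bypassed.
    """
    return threshold > 0 and token_count <= threshold


def partition_files(files, threshold):
    """Evaluate each file independently and return (bypass_names, rag_names)."""
    bypass, rag = [], []
    for f in files:
        target = bypass if should_bypass_rag(f.token_count, threshold) else rag
        target.append(f.name)
    return bypass, rag


if __name__ == "__main__":
    # Illustrative token counts only; real counts depend on the encoding used.
    uploads = [UploadedFile("small.txt", 250), UploadedFile("large.pdf", 120_000)]
    print(partition_files(uploads, threshold=50_000))
```

Because each file is checked on its own, one large file in a batch does not force the small ones through chunking and embedding.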
Added
Configuration: RAG_TOKEN_THRESHOLD (integer, default: 0 = disabled)
Environment variable: RAG_TOKEN_THRESHOLD
Admin Panel > Settings > Documents
Token counting: Uses existing tiktoken with TIKTOKEN_ENCODING_NAME
Metadata flag: Files bypassing RAG marked with bypass_rag: true in metadata
UI field: Number input (step: 1000, min: 0) in Documents settings, hidden when global bypass enabled
Logging: Decision logged per file: "File 'doc.pdf': 25000 tokens (<= 50000), bypassing RAG"
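The configuration, metadata flag, and log line listed above could fit together roughly as follows. This is a sketch under assumptions, not the PR's implementation: `read_threshold` and `apply_threshold` are hypothetical helpers, treating invalid or negative values as 0 is an assumption, and the wording of the "using RAG" message is invented (only the "bypassing RAG" message appears in the PR description).

```python
import os


def read_threshold(env=os.environ) -> int:
    """Parse RAG_TOKEN_THRESHOLD from the environment.

    Missing, non-integer, or negative values fall back to 0 (disabled).
    That fallback policy is an assumption made for this sketch.
    """
    raw = env.get("RAG_TOKEN_THRESHOLD", "0")
    try:
        value = int(raw)
    except (TypeError, ValueError):
        return 0
    return max(value, 0)


def apply_threshold(filename, token_count, threshold, metadata):
    """Return (updated_metadata, log_message) for a single file.

    The input metadata dict is copied, not mutated. When the threshold
    is disabled (0), no flag is set and there is nothing to log.
    """
    updated = dict(metadata)
    if threshold <= 0:
        return updated, None
    if token_count <= threshold:
        updated["bypass_rag"] = True
        message = (
            f"File '{filename}': {token_count} tokens "
            f"(<= {threshold}), bypassing RAG"
        )
    else:
        # Hypothetical wording; the PR description only shows the bypass message.
        message = (
            f"File '{filename}': {token_count} tokens "
            f"(> {threshold}), using RAG"
        )
    return updated, message
```

With a threshold of 50000, calling `apply_threshold("doc.pdf", 25000, 50000, {})` sets `bypass_rag` in the returned metadata and yields the log line quoted in the list above.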
Changed
backend/open_webui/config.py: Added RAG_TOKEN_THRESHOLD PersistentConfig
backend/open_webui/main.py: Imported and initialized RAG_TOKEN_THRESHOLD
backend/open_webui/routers/retrieval.py:
Added RAG_TOKEN_THRESHOLD to ConfigForm, GET/POST endpoints
Enhanced process_file() with token counting logic
Set bypass_rag metadata flag when threshold bypassed
backend/open_webui/retrieval/utils.py: Check bypass_rag flag in get_sources_from_items()
src/lib/components/admin/Settings/Documents.svelte: Added threshold input field
Fixed
Performance: Small files (≤ threshold) upload noticeably faster by skipping unnecessary chunking/embedding
Implementation Notes
Fallback: If token counting fails, the file proceeds through normal RAG processing
Respects global BYPASS_EMBEDDING_AND_RETRIEVAL
Compatible with existing RAG settings (hybrid search, reranking, etc.)
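The notes above imply an order of checks, sketched here. Several points are assumptions: that "respects" means the global bypass flag takes precedence over the threshold, that any exception from the counter triggers the fallback, and the names `Mode` and `choose_mode`. The token counter is passed in as a callable so the sketch runs without tiktoken; a real counter might be built from `tiktoken.get_encoding(name).encode(text)`.

```python
from enum import Enum
from typing import Callable


class Mode(Enum):
    FULL_CONTEXT = "full_context"  # skip chunking/embedding
    RAG = "rag"                    # normal chunking + embedding


def choose_mode(
    text: str,
    threshold: int,
    count_tokens: Callable[[str], int],
    global_bypass: bool = False,
) -> Mode:
    """Pick a processing mode for one file.

    Order of checks (assumed for this sketch):
    1. Global bypass on: always full context.
    2. Threshold disabled (0): always RAG.
    3. Token counting raises: fall back to RAG.
    4. Otherwise compare the count against the threshold.
    """
    if global_bypass:
        return Mode.FULL_CONTEXT
    if threshold <= 0:
        return Mode.RAG
    try:
        tokens = count_tokens(text)
    except Exception:
        # Fallback from the notes above: counting errors mean normal RAG.
        return Mode.RAG
    return Mode.FULL_CONTEXT if tokens <= threshold else Mode.RAG


def whitespace_counter(text: str) -> int:
    """Crude stand-in counter for illustration; not a real tokenizer."""
    return len(text.split())
```

Falling back to RAG on errors keeps the pre-existing behavior as the safe default, so a tokenizer failure cannot push an oversized file into the context window.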
Screenshot
Admin Panel > Settings > Documents
Logs when uploading 2 files (only one is above the threshold, which is set to 2k tokens for this example):
Contributor License Agreement
By submitting this pull request, I confirm that I have read and fully agree to the Contributor License Agreement (CLA), and I am providing my contributions under its terms.
🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.