[PR #19572] feat: Add smart RAG token threshold for automatic bypass optimization #25249

Open
opened 2026-04-20 05:50:41 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/open-webui/open-webui/pull/19572
Author: @eliem-ark
Created: 11/28/2025
Status: 🔄 Open

Base: dev ← Head: adding-rag-treshold


📝 Commits (8)

📊 Changes

5 files changed (+69 additions, -1 deletions)

📝 backend/open_webui/config.py (+6 -0)
📝 backend/open_webui/main.py (+2 -0)
📝 backend/open_webui/retrieval/utils.py (+6 -0)
📝 backend/open_webui/routers/retrieval.py (+38 -1)
📝 src/lib/components/admin/Settings/Documents.svelte (+17 -0)

📄 Description

Description
Adds an intelligent token-based threshold to automatically decide between RAG (chunking + embedding) and full context mode for uploaded files, optimizing performance for small documents while maintaining RAG for large files.

Context: Follows discussion #19177 (https://github.com/open-webui/open-webui/discussions/19177). This cannot be implemented as a filter because of performance requirements.

Problem: All files are currently processed through chunking/embedding regardless of size, wasting 3-60s for small documents that fit in context windows.

Solution: Files whose token count is ≤ the threshold skip RAG processing entirely. The check is evaluated per file: uploading a 1 KB file together with a 100 KB file whose token count exceeds a 50K-token threshold bypasses RAG for the 1 KB file only.
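The per-file decision described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `should_bypass_rag` and `word_count` are hypothetical names, and the whitespace word counter is only a stand-in for the tiktoken-based counting the PR uses.

```python
from typing import Callable


def should_bypass_rag(text: str, threshold: int, count_tokens: Callable[[str], int]) -> bool:
    """Return True when a file is small enough to skip chunking/embedding."""
    if threshold <= 0:
        # A threshold of 0 means the feature is disabled.
        return False
    return count_tokens(text) <= threshold


def word_count(text: str) -> int:
    """Stand-in tokenizer (assumption): one token per whitespace-separated word."""
    return len(text.split())


# Each file is evaluated on its own, so a mixed upload can go both ways.
files = {
    "small.txt": "hello world " * 10,      # 20 stand-in tokens
    "large.txt": "lorem ipsum " * 50_000,  # 100,000 stand-in tokens
}
decisions = {
    name: should_bypass_rag(body, 50_000, word_count)
    for name, body in files.items()
}
```

With a 50,000-token threshold, only `small.txt` is selected for bypass here; `large.txt` still goes through the normal RAG pipeline.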

Added
Configuration: RAG_TOKEN_THRESHOLD (integer, default: 0 = disabled)
  Environment variable: RAG_TOKEN_THRESHOLD
  Admin Panel > Settings > Documents
Token counting: Uses existing tiktoken with TIKTOKEN_ENCODING_NAME
Metadata flag: Files bypassing RAG marked with bypass_rag: true in metadata
UI field: Number input (step: 1000, min: 0) in Documents settings, hidden when global bypass enabled
Logging: Decision logged per file: "File 'doc.pdf': 25000 tokens (<= 50000), bypassing RAG"
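As a rough sketch of how the items above fit together, the snippet below reads the threshold from the environment, builds the log line quoted above, and sets the `bypass_rag` flag. The helper names (`read_threshold`, `bypass_log_line`, `mark_bypass`) are hypothetical, and treating invalid or negative values as 0 is an assumption; the actual PR stores the setting as a PersistentConfig.

```python
import logging
import os
from typing import Mapping

log = logging.getLogger(__name__)


def read_threshold(env: Mapping[str, str] = os.environ) -> int:
    """Read RAG_TOKEN_THRESHOLD; 0 (the default) disables the feature."""
    raw = env.get("RAG_TOKEN_THRESHOLD", "0")
    try:
        value = int(raw)
    except ValueError:
        # Assumption: an unparsable value is treated as "disabled".
        return 0
    return max(value, 0)


def bypass_log_line(filename: str, tokens: int, threshold: int) -> str:
    """Format the per-file decision message in the shape quoted in the PR."""
    return f"File '{filename}': {tokens} tokens (<= {threshold}), bypassing RAG"


def mark_bypass(metadata: dict, filename: str, tokens: int, threshold: int) -> dict:
    """Return a copy of the metadata, flagged when the file bypasses RAG."""
    if threshold > 0 and tokens <= threshold:
        log.info(bypass_log_line(filename, tokens, threshold))
        return {**metadata, "bypass_rag": True}
    return dict(metadata)
```

For example, `mark_bypass({"name": "doc.pdf"}, "doc.pdf", 25000, 50000)` returns the metadata with `bypass_rag` set to `True` and logs the message shown in the Logging item.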

Changed
backend/open_webui/config.py: Added RAG_TOKEN_THRESHOLD PersistentConfig
backend/open_webui/main.py: Imported and initialized RAG_TOKEN_THRESHOLD
backend/open_webui/routers/retrieval.py:
  Added RAG_TOKEN_THRESHOLD to ConfigForm and the GET/POST endpoints
  Enhanced process_file() with token counting logic
  Set the bypass_rag metadata flag when a file is at or below the threshold
backend/open_webui/retrieval/utils.py: Check bypass_rag flag in get_sources_from_items()
src/lib/components/admin/Settings/Documents.svelte: Added threshold input field
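To illustrate the retrieval-side check, here is a hedged sketch of how a `bypass_rag` flag might be consulted when collecting sources. The data shapes (`meta`, `content`, `id`), the function name `sources_for_file`, and the `vector_search` callable are assumptions made for this example; they are not the actual structures used by get_sources_from_items().

```python
from typing import Callable, List


def sources_for_file(
    file: dict,
    query: str,
    vector_search: Callable[[str, str], List[str]],
) -> List[str]:
    """Return full content for flagged files, retrieved chunks otherwise."""
    meta = file.get("meta", {})
    if meta.get("bypass_rag"):
        # The file was never chunked or embedded, so pass its full text through.
        return [file["content"]]
    # Unflagged files go through the usual retrieval path.
    return vector_search(file["id"], query)


def fake_search(file_id: str, query: str) -> List[str]:
    """Stand-in for a vector store lookup (assumption)."""
    return [f"chunk of {file_id} matching '{query}'"]


small = {"id": "f1", "content": "entire small document", "meta": {"bypass_rag": True}}
large = {"id": "f2", "content": "very long document", "meta": {}}

small_sources = sources_for_file(small, "pricing", fake_search)
large_sources = sources_for_file(large, "pricing", fake_search)
```

The point of the flag is that the decision made at upload time is carried with the file, so retrieval does not need to recount tokens.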

Fixed
Performance: Small files (≤ threshold) upload much faster because unnecessary chunking/embedding is skipped

Implementation Notes
Fallback: If token counting fails, the file proceeds through normal RAG processing
Respects global BYPASS_EMBEDDING_AND_RETRIEVAL
Compatible with existing RAG settings (hybrid search, reranking, etc.)
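The fallback and precedence rules in these notes could look roughly like the sketch below. It assumes the global BYPASS_EMBEDDING_AND_RETRIEVAL setting is checked first and that any counting error falls back to RAG; `choose_mode` and the returned mode strings are hypothetical names for illustration only.

```python
from typing import Callable


def choose_mode(
    text: str,
    threshold: int,
    count_tokens: Callable[[str], int],
    global_bypass: bool = False,
) -> str:
    """Pick a processing mode for one file."""
    if global_bypass:
        # Assumption: the global bypass setting wins over the per-file threshold.
        return "global_bypass"
    if threshold <= 0:
        return "rag"  # threshold disabled
    try:
        tokens = count_tokens(text)
    except Exception:
        # Fallback: a counting failure must not block the upload.
        return "rag"
    return "full_context" if tokens <= threshold else "rag"


def broken_counter(text: str) -> int:
    """Simulates a tokenizer failure."""
    raise RuntimeError("encoding unavailable")


def word_count(text: str) -> int:
    """Stand-in tokenizer (assumption): one token per word."""
    return len(text.split())


on_error = choose_mode("some text", 10, broken_counter)
small_file = choose_mode("some text", 10, word_count)
with_global = choose_mode("some text", 10, word_count, global_bypass=True)
```

Falling back to RAG on error keeps the pre-existing behavior as the safe default, so a tokenizer problem can only cost time, never drop a file.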

Screenshots
Admin Panel > Settings > Documents

[Screenshot: Token Threshold setting in Documents UI] (https://github.com/user-attachments/assets/03ace7d1-4ed8-4ff5-bf90-b1acf7c41ffd)

Logs when uploading two files, only one of which exceeds the threshold (set to 2k tokens for this example):

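[Screenshot: upload logs for the two files] (https://github.com/user-attachments/assets/0cb46658-1c79-4ae9-8d67-471280173d19)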

Contributor License Agreement
By submitting this pull request, I confirm that I have read and fully agree to the Contributor License Agreement (CLA), and I am providing my contributions under its terms.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

### 📝 Commits (8)

- [`fe6783c`](https://github.com/open-webui/open-webui/commit/fe6783c16699911c7be17392596d579333fb110c) Merge pull request #19030 from open-webui/dev
- [`fc05e0a`](https://github.com/open-webui/open-webui/commit/fc05e0a6c5d39da60b603b4d520f800d6e36f748) Merge pull request #19405 from open-webui/dev
- [`e3faec6`](https://github.com/open-webui/open-webui/commit/e3faec62c58e3a83d89aa3df539feacefa125e0c) Merge pull request #19416 from open-webui/dev
- [`9899293`](https://github.com/open-webui/open-webui/commit/9899293f050ad50ae12024cbebee7e018acd851e) Merge pull request #19448 from open-webui/dev
- [`140605e`](https://github.com/open-webui/open-webui/commit/140605e660b8186a7d5c79fb3be6ffb147a2f498) Merge pull request #19462 from open-webui/dev
- [`e146a5f`](https://github.com/open-webui/open-webui/commit/e146a5f613f987712db390df3f777122c7c242a3) adding rag treshold logic
- [`86a9d4f`](https://github.com/open-webui/open-webui/commit/86a9d4fad66c6c6f6ae0452792ff03d7585a0231) front + remove duplicate
- [`4463739`](https://github.com/open-webui/open-webui/commit/44637399c0143fb2a02c65eaede9f418d7ed5c68) front
GiteaMirror added the pull-request label 2026-04-20 05:50:41 -05:00
Reference: github-starred/open-webui#25249