[PR #19569] [CLOSED] feat: Add smart RAG token threshold for automatic bypass optimization #64104

Closed
opened 2026-05-06 09:25:14 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/open-webui/open-webui/pull/19569
Author: @eliem-ark
Created: 11/28/2025
Status: Closed

Base: dev ← Head: rag-treshold-native


📝 Commits (10+)

📊 Changes

12 files changed (+71 additions, -10 deletions)


📝 CHANGELOG.md (+7 -1)
📝 backend/open_webui/config.py (+7 -0)
📝 backend/open_webui/main.py (+2 -0)
📝 backend/open_webui/models/users.py (+0 -1)
📝 backend/open_webui/routers/retrieval.py (+32 -1)
📝 backend/requirements-min.txt (+1 -1)
📝 backend/requirements.txt (+1 -1)
📝 package-lock.json (+2 -2)
📝 package.json (+1 -1)
📝 pyproject.toml (+1 -1)
📝 src/lib/components/admin/Settings/Documents.svelte (+16 -0)
📝 src/lib/components/channel/ChannelInfoModal/UserList.svelte (+1 -1)

📄 Description

Adds an intelligent token-based threshold to automatically decide between RAG (chunking + embedding) and full context mode for uploaded files, optimizing performance for small documents while maintaining RAG for large files.

Context: Following discussion #19177. Cannot be implemented as a filter due to performance requirements.

Problem: All files are currently processed through chunking/embedding regardless of size, wasting 3-60s for small documents that fit in context windows.

Solution: Files whose token count is ≤ the threshold skip RAG processing entirely. The check is evaluated per file: uploading a 1KB file and a 100KB file with a 50K threshold bypasses RAG for the 1KB file only.
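
The per-file evaluation could look roughly like the sketch below. It is not the PR's code: `should_bypass_rag` and `whitespace_token_count` are hypothetical names, and the whitespace counter is an assumed stand-in for the tiktoken-based count so that the example runs without downloading an encoding.

```python
from typing import Callable


def whitespace_token_count(text: str) -> int:
    """Stand-in token counter (assumption): counts whitespace-separated words."""
    return len(text.split())


def should_bypass_rag(
    text: str,
    threshold: int,
    count_tokens: Callable[[str], int] = whitespace_token_count,
) -> bool:
    """Return True when the file is small enough to skip chunking/embedding."""
    if threshold <= 0:
        # 0 means the feature is disabled, so every file goes through RAG.
        return False
    return count_tokens(text) <= threshold


# Each uploaded file is evaluated on its own against the same threshold.
files = {
    "small.txt": "hello world " * 10,      # 20 tokens with the stand-in counter
    "large.txt": "lorem ipsum " * 50_000,  # 100,000 tokens with the stand-in counter
}
decisions = {
    name: should_bypass_rag(body, threshold=50_000) for name, body in files.items()
}
```

With these inputs only `small.txt` is marked for bypass; `large.txt` still goes through chunking and embedding.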

Added
- Configuration: RAG_TOKEN_THRESHOLD (integer, default: 0 = disabled)
  - Environment variable: RAG_TOKEN_THRESHOLD
  - Admin Panel > Settings > Documents
- Token counting: Uses existing tiktoken with TIKTOKEN_ENCODING_NAME
- Metadata flag: Files bypassing RAG are marked with bypass_rag: true in metadata
- UI field: Number input (step: 1000, min: 0) in Documents settings, hidden when global bypass is enabled
- Logging: Decision logged per file: "File 'doc.pdf': 25000 tokens (<= 50000), bypassing RAG"

Changed
- backend/open_webui/config.py: Added RAG_TOKEN_THRESHOLD PersistentConfig
- backend/open_webui/main.py: Imported and initialized RAG_TOKEN_THRESHOLD
- backend/open_webui/routers/retrieval.py:
  - Added RAG_TOKEN_THRESHOLD to ConfigForm, GET/POST endpoints
  - Enhanced process_file() with token counting logic
  - Set bypass_rag metadata flag when the threshold check bypasses RAG
- backend/open_webui/retrieval/utils.py: Check bypass_rag flag in get_sources_from_items()
- src/lib/components/admin/Settings/Documents.svelte: Added threshold input field

Fixed
- Performance: Small files (≤ threshold) upload faster because unnecessary chunking/embedding is skipped

Implementation Notes
- Fallback: If token counting fails, the file proceeds through normal RAG processing
- Respects global BYPASS_EMBEDDING_AND_RETRIEVAL
- Compatible with existing RAG settings (hybrid search, reranking, etc.)
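
The fallback, metadata flag, and log line listed above could fit together roughly as follows. This is a sketch under stated assumptions, not the code from `process_file()`: `apply_token_threshold` and its parameters are hypothetical, the token counter is passed in as a callable, and treating the global bypass as "leave the metadata untouched" is a guess at what "respects" means here.

```python
import logging
from typing import Callable

log = logging.getLogger(__name__)


def apply_token_threshold(
    filename: str,
    text: str,
    metadata: dict,
    threshold: int,
    count_tokens: Callable[[str], int],
    global_bypass: bool = False,
) -> dict:
    """Return a copy of metadata, with bypass_rag=True if the file skips RAG."""
    result = dict(metadata)
    if global_bypass or threshold <= 0:
        # Assumption: the global bypass setting takes precedence, and a
        # threshold of 0 disables the per-file check entirely.
        return result
    try:
        tokens = count_tokens(text)
    except Exception:
        # Fallback: a counting error should not block the upload, so the
        # file continues through normal RAG processing.
        log.exception("Token counting failed for %r; using normal RAG", filename)
        return result
    if tokens <= threshold:
        log.info(
            "File '%s': %d tokens (<= %d), bypassing RAG", filename, tokens, threshold
        )
        result["bypass_rag"] = True
    return result
```

In a real deployment the counter would presumably wrap tiktoken, for example `len(tiktoken.get_encoding(name).encode(text))` with `name` taken from TIKTOKEN_ENCODING_NAME. It is kept injectable here so that the fallback path is easy to exercise.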
Screenshot
Admin Panel > Settings > Documents

[Image: Token Threshold setting in Documents UI]

Logs when uploading two files, only one of which exceeds the threshold (set to 2k tokens for this example):

[Image]

Contributor License Agreement
By submitting this pull request, I confirm that I have read and fully agree to the Contributor License Agreement (CLA), and I am providing my contributions under its terms.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

### 📝 Commits (10+)

- [`fe6783c`](https://github.com/open-webui/open-webui/commit/fe6783c16699911c7be17392596d579333fb110c) Merge pull request #19030 from open-webui/dev
- [`fc05e0a`](https://github.com/open-webui/open-webui/commit/fc05e0a6c5d39da60b603b4d520f800d6e36f748) Merge pull request #19405 from open-webui/dev
- [`e3faec6`](https://github.com/open-webui/open-webui/commit/e3faec62c58e3a83d89aa3df539feacefa125e0c) Merge pull request #19416 from open-webui/dev
- [`9899293`](https://github.com/open-webui/open-webui/commit/9899293f050ad50ae12024cbebee7e018acd851e) Merge pull request #19448 from open-webui/dev
- [`ebb52c9`](https://github.com/open-webui/open-webui/commit/ebb52c9b15c03fe9e1a07725f5321b6005154df4) fix: changelog
- [`063c4bd`](https://github.com/open-webui/open-webui/commit/063c4bdb416e9e93beadeea426de789923082a77) fix: postgres user list issue
- [`e1f4f96`](https://github.com/open-webui/open-webui/commit/e1f4f96279974a4ffb2ba3159f1eda940b06f80c) chore: bump
- [`43086b3`](https://github.com/open-webui/open-webui/commit/43086b3a6cca85f99674bb6e3801094745fc88c8) chore: bump python-socketio==5.14.0
- [`c74467e`](https://github.com/open-webui/open-webui/commit/c74467eaabdfa1fa67fb47867f05eee5acee46c1) Update CHANGELOG.md (#19463)
- [`4333653`](https://github.com/open-webui/open-webui/commit/4333653df6f2ce9250eda2789d4f94998b9e4555) final impl
GiteaMirror added the pull-request label 2026-05-06 09:25:14 -05:00
Reference: github-starred/open-webui#64104