[GH-ISSUE #23730] feat: Add configuration option RAG_RERANKING_BATCH_SIZE for reranker batch size #20055

Closed
opened 2026-04-20 02:38:15 -05:00 by GiteaMirror · 1 comment

Originally created by @oofnikj on GitHub (Apr 14, 2026).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/23730

Check Existing Issues

  • I have searched for all existing open AND closed issues and discussions for similar requests. I have found none that is comparable to my request.

Verify Feature Scope

  • I have read through and understood the scope definition for feature requests in the Issues section. I believe my feature request meets the definition and belongs in the Issues section instead of the Discussions.

Problem Description

Currently, the RAG reranker batch size in ColBERT.predict() is hard-coded as bsize=32 and cannot be changed without editing the code. This makes it difficult for users to tune performance or adapt to different hardware constraints, for example when running in a containerized environment without access to a GPU. Making this value configurable would enable the use of SoTA rerankers in a fully offline RAG pipeline configuration.

Desired Solution you'd like

Allow the internal reranker batch size to be set through a config value (such as RAG_RERANKING_BATCH_SIZE), loaded from environment variables or a config file, similar to how RAG_EMBEDDING_BATCH_SIZE works. If not set, the current default (32) can be retained.
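As a rough illustration of the requested behaviour, the sketch below reads RAG_RERANKING_BATCH_SIZE from the environment and falls back to 32 when the variable is unset or unusable. The helper name get_reranking_batch_size and the fallback-on-invalid-input policy are assumptions made for this example; this is not how Open WebUI's config system actually loads values.

```python
import os

# The value that is currently hard-coded in predict(), per this issue.
DEFAULT_RERANKING_BATCH_SIZE = 32


def get_reranking_batch_size(env=None):
    """Return the reranker batch size from the environment, or the default.

    Hypothetical helper: the name and the fallback policy are assumptions.
    """
    env = os.environ if env is None else env
    raw = env.get("RAG_RERANKING_BATCH_SIZE", "")
    if not raw.strip():
        # Variable unset or empty: keep the existing default.
        return DEFAULT_RERANKING_BATCH_SIZE
    try:
        value = int(raw)
    except ValueError:
        # Non-numeric input: fall back rather than crash at startup.
        return DEFAULT_RERANKING_BATCH_SIZE
    # A batch size must be positive to make progress.
    return value if value > 0 else DEFAULT_RERANKING_BATCH_SIZE
```

With this shape, a CPU-only container could set RAG_RERANKING_BATCH_SIZE=4 while existing deployments that set nothing keep the current value of 32.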

Alternatives Considered

  • Continued use of the hard-coded value
  • Adding a CLI flag instead of config option

Additional Context

  • See backend/open_webui/retrieval/models/colbert.py line: bsize=32 in predict()
  • RAG_EMBEDDING_BATCH_SIZE config already exists
  • Would improve flexibility for deployments on GPUs or CPUs with varying capacities
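To show why the batch size matters for different hardware, here is a minimal, generic sketch of batched reranker scoring where bsize is a parameter instead of a constant. It does not reproduce the code in colbert.py: score_in_batches and the score_batch callable are hypothetical stand-ins for whatever the reranker does internally.

```python
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[str, str]  # (query, document)


def score_in_batches(
    pairs: Sequence[Pair],
    score_batch: Callable[[Sequence[Pair]], List[float]],
    bsize: int = 32,
) -> List[float]:
    """Score (query, document) pairs in chunks of at most `bsize`.

    `score_batch` is a hypothetical callable standing in for the model's
    forward pass; it must return one score per pair it receives.
    """
    if bsize <= 0:
        raise ValueError("bsize must be a positive integer")
    scores: List[float] = []
    for start in range(0, len(pairs), bsize):
        # Only `bsize` pairs are handed to the model at a time, which
        # bounds peak memory per forward pass.
        scores.extend(score_batch(pairs[start:start + bsize]))
    return scores
```

A smaller bsize lowers peak memory at the cost of more forward passes, which is the trade-off a configurable value would let CPU-only or memory-constrained deployments make.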

@Classic298 commented on GitHub (Apr 16, 2026):

https://github.com/open-webui/open-webui/pull/23318/changes/4d2f18981051205016bd24d39521e25a33581225
Reference: github-starred/open-webui#20055