[GH-ISSUE #8177] Optimizing the RAG #15028

Closed
opened 2026-04-19 21:18:56 -05:00 by GiteaMirror · 0 comments
Owner

Originally created by @Schwenn2002 on GitHub (Dec 28, 2024).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/8177

I would like to be able to search the RAG documents in several stages, so that I can choose a smaller context window for the LLM and use less VRAM.

Let's assume there is a context window of 10,000 tokens and a chunk in the RAG has 500 tokens.

Ideally, the first stage would be a RAG search with e.g. Top K = 80, then a reranking of those 80 chunks, and finally returning the best 20 chunks to the LLM (20 × 500 tokens then fits in the context window).

Is that possible?
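The two-stage flow described above (broad Top-K retrieval, then reranking down to a small Top-N) can be sketched as plain Python. This is a minimal illustration, not Open WebUI's implementation: the chunk store, embeddings, and the keyword-overlap "reranker" are all stand-ins (a real setup would use a cross-encoder reranking model).

```python
# Hypothetical two-stage retrieval sketch. Stage 1: cheap vector search
# returns TOP_K candidates. Stage 2: a reranker rescores them and only
# the best TOP_N chunks enter the LLM context. All names/data illustrative.
import math

TOP_K = 80   # candidates from the first-stage vector search
TOP_N = 20   # chunks that actually reach the LLM (20 x 500 tokens)

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def vector_search(query_emb, chunks, k=TOP_K):
    """Stage 1: embedding similarity over the whole chunk store."""
    scored = [(cosine(query_emb, c["emb"]), c) for c in chunks]
    scored.sort(key=lambda t: t[0], reverse=True)
    return [c for _, c in scored[:k]]

def rerank(query, candidates, n=TOP_N):
    """Stage 2: stand-in for a cross-encoder; here just term overlap."""
    q_terms = set(query.lower().split())
    def score(c):
        return len(q_terms & set(c["text"].lower().split()))
    return sorted(candidates, key=score, reverse=True)[:n]
```

Because the reranker only ever sees the 80 retrieved candidates, its cost is bounded regardless of store size, and the LLM prompt is capped at TOP_N × chunk-size tokens.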
