Mirror of https://github.com/open-webui/open-webui.git (synced 2026-03-12 10:04:14 -05:00)
Long context adaptive RAG #714
Originally created by @IIPedro on GitHub (Apr 24, 2024).
Is your feature request related to a problem? Please describe.
Long context models have proven useful for tasks such as code base analysis and other work that requires full comprehension of the context. Upcoming, widely accessible long-context models such as Phi-3 128k also suggest that, in the future, these models may see broad use.
The current RAG solution is based on providing a snippet through similarity search. While useful for short context models, this might not be the optimal solution for models that can handle 100k tokens or more.
Describe the solution you'd like
Preferably, the ability to indicate how long the RAG context should be. Something such as "#64#{document}" could indicate that 64k tokens should be used as the snippet. I still see similarity search as useful, since even 128k tokens might not be enough to fit a dense book or the like. It's important to note that it would be very pleasing to see this feature also available in web search, through "#{context}#{link}".
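To make the proposal concrete, here is a minimal sketch of how the suggested "#64#{document}" syntax could be parsed and combined with similarity search. Everything here is hypothetical: the function names, the whitespace-based token counter, and the directive grammar are all illustrative assumptions, not Open WebUI's actual API (a real implementation would use the model's tokenizer and the existing retrieval pipeline).

```python
import re

def parse_context_directive(text):
    """Parse the proposed '#<k>#<doc>' syntax, e.g. '#64#report'.

    Returns (token_budget, document_name), where the budget is given
    in thousands of tokens, or None if the text does not match.
    """
    m = re.fullmatch(r"#(\d+)#(\S+)", text.strip())
    if not m:
        return None
    return int(m.group(1)) * 1000, m.group(2)

def build_context(ranked_chunks, token_budget,
                  count_tokens=lambda s: len(s.split())):
    """Greedily pack similarity-ranked chunks until the budget is spent.

    ranked_chunks: chunks sorted best-first by similarity score, so
    short-context models still get the most relevant material first.
    count_tokens: stand-in tokenizer (whitespace split here).
    """
    picked, used = [], 0
    for chunk in ranked_chunks:
        cost = count_tokens(chunk)
        if used + cost > token_budget:
            break
        picked.append(chunk)
        used += cost
    return "\n\n".join(picked)
```

The key property is that the same ranking serves both regimes: a 4k budget yields today's top-snippet behaviour, while a 64k budget simply keeps taking ranked chunks until the long-context window is filled.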
Describe alternatives you've considered
Alternatives include setting the context size beforehand in the documents tab. Although that would be the simpler route, changing the context through the tab would be cumbersome and would require readjustment when switching models.
Additional context
I believe the description was enough to explain the issue. I'm still thinking this through, and I'm open to suggestions or opinions. Let's talk about it! Thanks.
@justinh-rahb commented on GitHub (Apr 24, 2024):
Possibly related work (enhances RAG in any case):
@tjbck commented on GitHub (Apr 24, 2024):
Closing in favour of #1293; let's continue our discussion there.