[GH-ISSUE #14164] feat: retrieve RAG context chunks within /api/chat/completions #17162

Closed
opened 2026-04-19 22:54:20 -05:00 by GiteaMirror · 2 comments

Originally created by @vaclcer on GitHub (May 22, 2025).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/14164

Check Existing Issues

  • I have searched the existing issues and discussions.

Problem Description

Hi, this is a recurring topic, see discussion 13304: https://github.com/open-webui/open-webui/discussions/13304

Judging by the reactions, many people would appreciate being able to retrieve, via the API, the context chunks used for the final generation step in RAG. I believe this should be pretty much standard in any production system.

Context chunks are useful not only as references, but also for response evaluations based on RAGAS, DeepEval, or similar frameworks.
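
For instance, with the chunks in hand, an evaluation could look roughly like the sketch below. This assumes the ragas 0.1-style evaluate() API with an LLM judge already configured; the question, answer, and chunk texts are placeholders:

from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy

records = {
    "question": ["What does the handbook say about VPN access?"],
    "answer": ["VPN access is requested through the IT portal ..."],
    # These per-question context chunks are exactly what the API does not return today:
    "contexts": [["chunk 1 text ...", "chunk 2 text ..."]],
}

result = evaluate(Dataset.from_dict(records), metrics=[faithfulness, answer_relevancy])
print(result)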

There was a rejected PR addressing this, 13378: https://github.com/open-webui/open-webui/pull/13378

This might already be possible with streaming responses enabled, but streaming output is not trivial for many of us to process, I presume.

Thank you for the consideration!

Desired Solution you'd like

I would think retrieving the RAG context/chunks could be made configurable via the /api/chat/completions payload object, similar to the following:

payload = {
    'model': model,
    'messages': [{'role': 'user', 'content': query}],
    'files': [{'type': 'collection', 'id': collection_id}],
    'retrieve_context': True,
}
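
For illustration, a fuller sketch of how a client might use this follows. It is hypothetical: the retrieve_context flag and the returned context field are the proposed behaviour, not an existing part of the API, and the server URL, API key, model name, and collection id are placeholders:

import requests

OPEN_WEBUI_URL = "http://localhost:3000"  # placeholder: your Open WebUI instance
API_KEY = "sk-..."                        # placeholder: an Open WebUI API key

payload = {
    "model": "my-rag-model",  # placeholder model name
    "messages": [{"role": "user", "content": "What does the handbook say about VPN access?"}],
    "files": [{"type": "collection", "id": "my-collection-id"}],  # placeholder knowledge collection
    "retrieve_context": True,  # proposed flag, not an existing API field
}

response = requests.post(
    f"{OPEN_WEBUI_URL}/api/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
)
response.raise_for_status()
data = response.json()

# The generated answer (OpenAI-compatible, non-streaming response shape).
print(data["choices"][0]["message"]["content"])

# Proposed behaviour: the chunks used for generation come back alongside the answer,
# e.g. under an illustrative field such as "context", ready to be logged or fed into
# an evaluation pipeline. The field name here is illustrative only.
for chunk in data.get("context", []):
    print(chunk)

A plain non-streaming call like this would keep the chunks easy to consume, without having to parse them out of streamed events.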

Alternatives Considered

No response

Additional Context

No response


@daschuchmann commented on GitHub (May 22, 2025):

This would be so nice to have, making the api finally capable to return the knowledge


@tjbck commented on GitHub (May 26, 2025):

Addressed with 2d5b82df8c.
