[GH-ISSUE #10187] RAG: result duplicates #54467

Closed
opened 2026-05-05 16:17:21 -05:00 by GiteaMirror · 1 comment

Originally created by @mkhludnev on GitHub (Feb 17, 2025).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/10187

Discussed in https://github.com/open-webui/open-webui/discussions/10168

Originally posted by mkhludnev February 17, 2025
It's loosely related to #8379.
When the task generates several queries, we get a result list for each of them. Some documents naturally occur in more than one of the lists, but merge_and_sort_query_results does not deduplicate results, so we get repeated docs in the RAG context. Besides wasting LLM context on useless repetition, this might trigger repetition in the output.
Why not drop duplicates in merge_and_sort_query_results?
May I come up with a PR for it?
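
To make the proposal concrete, here is a minimal sketch of merging with deduplication. It is not the actual merge_and_sort_query_results code: the function name merge_dedupe_and_sort, the "document" and "score" keys, and the convention that a higher score means more relevant are all assumptions made for illustration. Duplicates are detected by exact document text, and the best-scoring copy is kept.

```python
def merge_dedupe_and_sort(result_lists, k):
    """Merge per-query result lists, drop duplicate documents, return the top k.

    Hypothetical shape: each result is a dict with "document" (text) and
    "score" (float, higher is better). This is an assumed format, not the
    one used by the real function.
    """
    best = {}
    for results in result_lists:
        for item in results:
            key = item["document"]  # exact-text match identifies a duplicate
            # Keep only the highest-scoring copy of each document.
            if key not in best or item["score"] > best[key]["score"]:
                best[key] = item
    # Sort the surviving unique documents by score, best first.
    return sorted(best.values(), key=lambda r: r["score"], reverse=True)[:k]


# Two queries that both retrieved document "a".
per_query = [
    [{"document": "a", "score": 0.9}, {"document": "b", "score": 0.5}],
    [{"document": "a", "score": 0.7}, {"document": "c", "score": 0.8}],
]
top = merge_dedupe_and_sort(per_query, k=3)
```

With this input, "a" appears once in the merged list, carrying its higher score. If the vector store returns distances rather than similarities, the comparison and the sort order would need to be flipped.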


@tjbck commented on GitHub (Feb 17, 2025):

PR Welcome!

Reference: github-starred/open-webui#54467