mirror of
https://github.com/open-webui/open-webui.git
synced 2026-05-07 11:28:35 -05:00
[GH-ISSUE #12655] issue: RAG for an entire knowledge collection cites the first source #16672
Originally created by @Elmolesto on GitHub (Apr 9, 2025).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/12655
Check Existing Issues
Installation Method
Git Clone
Open WebUI Version
v0.6.2
Ollama Version (if applicable)
No response
Operating System
macOS Sequoia
Browser (if applicable)
Chrome
Confirmation
Expected Behavior
When using RAG with an entire knowledge collection, the system should accurately cite all relevant sources contributing to the generated answer.
Actual Behavior
When querying using RAG with an entire knowledge collection, the generated response consistently cites only the first source in the collection, regardless of the actual documents retrieved or referenced.
However, when the same prompt is run with individual files selected (instead of the entire collection), the citations are accurate and reflect the true sources of the information. This suggests the issue is specific to how citations are handled when using full collections.
Steps to Reproduce
Logs & Screenshots
RAG on collection: NOT WORKING

RAG on same files: WORKING

Additional Information
After debugging the issue, I found that it might have originated in the get_sources_from_files method in retrieval/utils.py.
Here is the attached JSON dump of the variable relevant_contexts in both scenarios:
relevant_contexts_on_collection.json
relevant_contexts_on_files.json
Specifically, the generation of the sources list for collections leads to incorrect citation behaviour because it only references documents[0], thereby using only the first document.
RELATED: https://github.com/open-webui/open-webui/discussions/10595#discussioncomment-12484708
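To make the suspected failure mode concrete, here is a minimal sketch. It is not the actual Open WebUI code: the function names cite_first_only and cite_per_source, and the "source" metadata key, are hypothetical and chosen only for illustration. It assumes a retrieval result is a list of chunk texts with a parallel list of metadata dicts.

```python
from collections import defaultdict

# Hypothetical retrieval result: chunk texts plus parallel metadata.
# The "source" key is an assumption made for this sketch.
documents = ["chunk about A", "chunk about B", "second chunk about A"]
metadatas = [{"source": "a.pdf"}, {"source": "b.pdf"}, {"source": "a.pdf"}]


def cite_first_only(documents, metadatas):
    """Pattern described in the report: every chunk is attributed
    to the source of the first entry only."""
    first_source = metadatas[0]["source"]
    return [{"source": first_source, "chunks": list(documents)}]


def cite_per_source(documents, metadatas):
    """Alternative: group chunks by the source recorded in their own
    metadata, so each contributing file gets its own citation."""
    grouped = defaultdict(list)
    for text, meta in zip(documents, metadatas):
        grouped[meta["source"]].append(text)
    return [{"source": src, "chunks": chunks} for src, chunks in grouped.items()]


buggy = cite_first_only(documents, metadatas)
fixed = cite_per_source(documents, metadatas)
print([s["source"] for s in buggy])
print([s["source"] for s in fixed])
```

With the first function, b.pdf never appears as a citation even though one of its chunks was retrieved, which matches the symptom reported for collections.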
@almajo commented on GitHub (Apr 9, 2025):
This is being worked on: https://github.com/open-webui/open-webui/pull/12562
@athoik commented on GitHub (Apr 10, 2025):
@almajo thank you! 🥇 It's really a great improvement!
@tjbck please consider having a look at this improvement.
@Elmolesto commented on GitHub (Apr 10, 2025):
Thanks! I've tested the fix, and it's working.
However, this may need its own discussion: the context generated for RAG on a collection is shorter than the context generated for RAG on the same files attached individually. This is because of the topK setting, but here's the question: is this the desired behaviour?
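A rough sketch of why the two contexts could differ in size, assuming (this thread does not spell it out) that topK is applied once per queried target: once for a whole collection, but once per file when files are attached individually. The retrieve function below is a hypothetical stand-in for a similarity search, not an Open WebUI API.

```python
def retrieve(chunks, top_k):
    """Stand-in for a similarity search: keep at most top_k chunks.
    A real retriever would rank by relevance; slicing is enough to
    show how the counts differ."""
    return chunks[:top_k]


# Three hypothetical files with four chunks each.
files = {
    name: [f"{name} chunk {i}" for i in range(4)]
    for name in ("a.pdf", "b.pdf", "c.pdf")
}
top_k = 3

# Files attached individually: top_k is applied to each file.
per_file_context = [
    chunk for chunks in files.values() for chunk in retrieve(chunks, top_k)
]

# Same files as one collection: top_k is applied once overall.
all_chunks = [chunk for chunks in files.values() for chunk in chunks]
collection_context = retrieve(all_chunks, top_k)

print(len(per_file_context), len(collection_context))
```

Under that assumption the per-file context holds up to topK chunks per file, while the collection context is capped at topK chunks in total.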
@tjbck commented on GitHub (Apr 10, 2025):
https://github.com/open-webui/open-webui/pull/12562 Merged!
@Elmolesto commented on GitHub (Apr 10, 2025):
@almajo could we expand on this? Should I open a discussion?
CC @tjbck
@tjbck commented on GitHub (Apr 10, 2025):
Intended behaviour.
@controldev commented on GitHub (May 14, 2025):
This is still broken (at least for web search) in 0.6.9.