[PR #12971] [MERGED] fix: Align backend <source> tag indexing with frontend citation grouping #23067

Closed
opened 2026-04-20 04:36:32 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/open-webui/open-webui/pull/12971
Author: @tth37
Created: 4/17/2025
Status: Merged
Merged: 4/18/2025
Merged by: @tjbck

Base: dev ← Head: fix_source_indexing


📝 Commits (1)

  • 79bde6f fix: Align backend <source> tag indexing with frontend citation grouping

📊 Changes

2 files changed (+11 additions, -6 deletions)


📝 backend/open_webui/utils/middleware.py (+10 -6)
📝 src/lib/components/chat/Messages/Citations.svelte (+1 -0)

📄 Description

Related Issue and PR: #12811 #12562

Problem

The current backend logic for generating citation indices within <source> tags in the context string is inconsistent with how the frontend groups and displays citations:

  1. Web Search Results: Web search results lack a file_id in their metadata. The original code used file_id as the key for indexing, so all web results (where file_id is None) mapped to the same index even though they represent distinct sources. This directly causes issue #12811.
  2. Frontend Grouping Mismatch: Sources that the frontend treats as a single citation group (e.g., potentially multiple uploads of files with the same name, or multiple chunks from the same logical source) could be assigned different sequential indices by the backend based on their distinct file_id. This results in generated text like [1][2][3] where the frontend might only display clickable citations for [1][2], leaving subsequent numbers as plain text because they refer to indices not uniquely represented in the frontend's grouped view.
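
Both failure modes can be reproduced with a simplified sketch. This is not the original middleware code: index_by_file_id and the sample metadata are hypothetical, and the only behavior taken from the description is that indices were keyed on file_id.

```python
from typing import Optional


def index_by_file_id(metadatas: list[dict]) -> list[int]:
    """Assign 1-based citation indices keyed on file_id (old behavior, simplified)."""
    seen: dict[Optional[str], int] = {}
    indices = []
    for meta in metadatas:
        key = meta.get("file_id")  # None for web search results
        if key not in seen:
            seen[key] = len(seen) + 1
        indices.append(seen[key])
    return indices


# Three distinct web pages, none of which carries a file_id.
web_results = [
    {"source": "https://example.com/a"},
    {"source": "https://example.org/b"},
    {"source": "https://example.net/c"},
]
web_indices = index_by_file_id(web_results)

# Two uploads with the same name but different file_id values.
same_name_uploads = [
    {"file_id": "id-1", "source": "report.pdf"},
    {"file_id": "id-2", "source": "report.pdf"},
]
upload_indices = index_by_file_id(same_name_uploads)
```

Under these assumptions, the three web pages all share one index (case 1), while the two uploads named report.pdf get two indices even though the frontend would show them as a single citation (case 2).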

Root Cause

The backend used an internal identifier (file_id) to create unique sequential indices for each document chunk. The frontend, however, groups citation sources by a different key: metadata?.source ?? source?.source?.id ?? 'N/A'.

https://github.com/open-webui/open-webui/blob/c5636ff68c4e9ca095cd6458cc740f4a3aa53831/src/lib/components/chat/Messages/Citations.svelte#L47-L57

This mismatch causes the mapping between the backend-generated <source id="..."> tags and the frontend's citation display to break.
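
For readers more at home on the backend side, the frontend grouping key can be written out in Python. This is a sketch only: frontend_group_key is a hypothetical name, and the arguments are assumed to be plain dicts (or None) shaped like the objects the Svelte component reads.

```python
from typing import Any, Optional


def frontend_group_key(
    metadata: Optional[dict[str, Any]],
    source: Optional[dict[str, Any]],
) -> Any:
    """Mirror `metadata?.source ?? source?.source?.id ?? 'N/A'` with nullish semantics."""
    # metadata?.source
    value = metadata.get("source") if metadata is not None else None
    if value is None:
        # source?.source?.id
        inner = source.get("source") if source is not None else None
        value = inner.get("id") if inner is not None else None
    # ?? 'N/A' only applies when everything above was missing
    return "N/A" if value is None else value
```

One nuance worth keeping in mind: `??` falls back only on null or undefined, whereas a Python `or` chain also falls back on falsy values such as an empty string. The two agree whenever metadata.source is either absent or a non-empty string; whether the empty-string case occurs in practice is not stated in this PR.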

Solution

This PR modifies the backend context generation logic to use the exact same identifier logic as the frontend for creating the citation indices within the <source> tags.

Specifically, the identifier used to group sources and assign the id attribute in <source id="{index}"> is now calculated using:
doc_meta.get("source", None) or source.get("source", {}).get("id", None) or "N/A"

This ensures that:

  • Each unique logical citation source (as identified by the frontend logic) gets a single, consistent index number.
  • Documents belonging to the same logical citation group (e.g., multiple web results grouped under one citation, or multiple chunks from the same file) will now correctly share the same <source id="..."> index in the backend-generated context.
  • The indices ([1], [2], etc.) referenced in the model's generation will directly correspond to the citation blocks rendered by the frontend.

This alignment should make citation references resolve correctly.
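
A minimal sketch of the resulting indexing, for illustration only. The identifier expression is the one quoted above; build_context, the shape of each source dict (source, document and metadata keys), and the sample data are assumptions rather than the actual middleware.py code.

```python
def source_key(doc_meta: dict, source: dict) -> str:
    # Identifier expression quoted from the PR description.
    return doc_meta.get("source", None) or source.get("source", {}).get("id", None) or "N/A"


def build_context(sources: list[dict]) -> tuple[str, dict[str, int]]:
    """Hypothetical helper: one <source> tag per chunk, indexed by logical source."""
    citation_idx: dict[str, int] = {}
    parts = []
    for source in sources:
        docs = source.get("document", [])
        metas = source.get("metadata", [])
        for doc, meta in zip(docs, metas):
            key = source_key(meta, source)
            if key not in citation_idx:
                # First time this logical source appears: give it the next index.
                citation_idx[key] = len(citation_idx) + 1
            parts.append(f'<source id="{citation_idx[key]}">{doc}</source>')
    return "\n".join(parts), citation_idx


sources = [
    {   # web search: distinct URLs, no file_id anywhere
        "source": {"id": "web-search"},
        "document": ["page A text", "page B text"],
        "metadata": [
            {"source": "https://example.com/a"},
            {"source": "https://example.org/b"},
        ],
    },
    {   # two chunks of the same uploaded file
        "source": {"id": "file-123"},
        "document": ["chunk 1", "chunk 2"],
        "metadata": [{"source": "report.pdf"}, {"source": "report.pdf"}],
    },
]
context, citation_idx = build_context(sources)
```

With this sample input the two web pages receive indices 1 and 2, and both report.pdf chunks share index 3, which matches the grouping the frontend would display.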


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-20 04:36:32 -05:00

Reference: github-starred/open-webui#23067