mirror of
https://github.com/open-webui/open-webui.git
synced 2026-05-07 19:38:46 -05:00
[GH-ISSUE #22154] issue: RAG template duplication for web search tool & disabling citations ability not working #58310
Reference in New Issue
Block a user
Delete Branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Originally created by @Master-Pr0grammer on GitHub (Mar 2, 2026).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/22154
Check Existing Issues
Installation Method
Pip Install
Open WebUI Version
v0.8.7 (latest)
Ollama Version (if applicable)
No response
Operating System
Ubuntu 22.04
Browser (if applicable)
Chrome
Confirmation
README.md.Expected Behavior
When the model decides to use the native web search tools in open webui, the RAG template for citations should be pasted into they system prompt ONCE. and if citations are turned off for the model, this should not happen at all.
Actual Behavior
For every single web tool call, the RAG template is pasted into the system prompt. so if the model calls the tool 5 times, the RAG prompt will be pasted and completely duplicated 5 times into the system prompt.
This can confuse the model, degrade performance, and also makes inference much more costly to run since the prompt cache needs to be fully reprocessed every time the model calls a web search tool.
Steps to Reproduce
launch open webui
Give a model access to the web search tools
ask it to perform deep research and invoke web search tools several times
look at the openai API logs and see that the system prompt contains duplicates of the RAG template
Repeat the above but with
citationsdisabled for the model, and see that it still pastes the citations/RAG prompt into the system prompt, and still duplicates it.Logs & Screenshots
Here is the system prompt after a few tool calls in the logs of my OpenAI API backend:
Additional Information
fixing this issue, and adding the ability to prevent the system prompt from being dynamically modified in such a way is extremely important because doing so drastically slows down inference, the same process in open webui takes 10x as long as it does in LM Studio on the same hardware.
At minimum, having the ability to turn off citations, and it actually turning off citations for web search, which would enable functional prompt caching should work properly as it is intended.
As a suggestion, to keep the citations feature and have better prompt caching, you could append the instructions to the system prompt for citations ONCE before any chat, then show the source ID for citations in the tool output. this would allow citations to function as intended while massively speeding up inference and making it much cheaper to operate by utilizing prompt caching.
@Classic298 commented on GitHub (Mar 2, 2026):
Thanks for the detailed report! I looked into this and here's what I found:
RAG Template Duplication
The duplication you're seeing happens specifically because you have the RAG_SYSTEM_CONTEXT environment variable set to True (the default is False). When this is set, the RAG template gets appended to the system message on each tool call iteration in the native function calling loop. For the user message path (the default), the message is properly restored before each re-application, so duplication doesn't happen.
There's a fix coming for this. The system message will be properly saved and restored between tool call iterations, same as we already do for the user message.
Citations Toggle
The "citations" capability toggle in the model editor is a frontend-only display control. It hides the citation UI elements (the source bubbles and clickable markers) but doesn't suppress the backend RAG template injection. This is by design: the RAG context helps the model ground its answers in the provided sources, which is useful regardless of whether the citation markers are rendered in the UI.
If you want to fully prevent the RAG template from being injected, you would need to customize the RAG_TEMPLATE config to be minimal/empty or disable the RAG prompt from being sent altogether, I believe in the admin interface settings, rather than toggling citations (which again is only for the frontend citation rendering, by design).
Prompt Caching
Your observation about prompt caching is valid for your setup. With RAG_SYSTEM_CONTEXT=True enabled AND the duplication bug, the system prompt keeps changing and growing on each tool call, which does invalidate the prompt cache. After the duplication fix, the system message will be stable across iterations (just updated with the accumulated sources rather than re-appended), which should help with caching.
@Classic298 commented on GitHub (Mar 2, 2026):
testing wanted https://github.com/open-webui/open-webui/pull/22157
@Classic298 commented on GitHub (Mar 2, 2026):
e0d4c3ec92