mirror of
https://github.com/open-webui/open-webui.git
synced 2026-05-07 03:18:23 -05:00
[GH-ISSUE #21780] issue: RAG template double injection #35096
Reference in New Issue
Block a user
Delete Branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Originally created by @relic664 on GitHub (Feb 23, 2026).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/21780
Check Existing Issues
Installation Method
Docker
Open WebUI Version
v0.8.5
Ollama Version (if applicable)
No response
Operating System
Linux Alpine
Browser (if applicable)
No response
Confirmation
README.md.Expected Behavior
RAG_TEMPLATE is being injected multiple times causing models to hallucinate and improperly call tools. This is a continuation of #21663 which still persists after v0.8.5.
Actual Behavior
RAG_TEMPLATE is injected at most once, preferably at the system message, not the user message. This is
Steps to Reproduce
Logs & Screenshots
Call 1:
2026-02-23 07:11:13.545 | DEBUG | open_webui.utils.chat:generate_chat_completion:165 - generate_chat_completion: {'stream': True, 'model': 'zai-org/glm-5:thinking', 'messages': [{'role': 'system', 'content': '# Assistant\n\nYou are a concise, candid, and intellectually rigorous assistant.\nYou do not flatter the user or hedge unnecessarily. You value accuracy, clarity, and critical thinking over politeness or empathy. If the user’s idea is weak, flawed, or ambiguous, you say so directly and explain why. Avoid filler like “Great question!” or “Of course!”, or "You're absolutely right!" — go straight to substance. Maintain a professional, calm tone, but never defer or soften your reasoning. Your priority is to analyze, not appease. \n\n\n## Safety & etiquette\n- If the user says “don’t browse,” don’t browse.\n- Be transparent about uncertainty. Don’t speculate beyond evidence.\n\n## Response style\n- Start with the answer. Be concise and actionable.\n- Prioritize correctness and accuracy. Push back on user errors if present, but strive to be polite about the push back. \n- Do not praise the user for their points or assertions\n\n\n'}, {'role': 'user', 'content': '### Task:\nRespond to the user query using the provided context, incorporating inline citations in the format [id] only when the tag includes an explicit id attribute (e.g., ).\n\n### Guidelines:\n- If you don't know the answer, clearly state that.\n- If uncertain, ask the user for clarification.\n- Respond in the same language as the user's query.\n- If the context is unreadable or of poor quality, inform the user and provide the best possible answer.\n- If the answer isn't present in the context but you possess the knowledge, explain this to the user and provide the answer using your own understanding.\n- Only include inline citations using [id] (e.g., [1], [2]) when the tag includes an id attribute.\n- Do not cite if the tag does not contain an id attribute.\n- Do not use XML tags in your response.\n- Ensure citations are concise and directly related to the information provided.\n\n### Example of Citation:\nIf the user asks about a specific topic and the information is found in a source with a provided id attribute, the response should include the citation like in the following example:\n* "According to the study, the proposed method increases efficiency by 20% [1]."\n\n### Output:\nProvide a clear and direct response to the user's query, including inline citations in the format [id] only when the tag with id attribute is present in the context.\n\n\n\n\n\nIn unraid, for the app backup and restore plugin, I have an app I want to back up at
/mnt/user/appdata/appbut I want to exclude/mnt/user/appdata/app/.cachehow can I do so?\nIn unraid, for the app backup and restore plugin, I have an app I want to back up at/mnt/user/appdata/appbut I want to exclude/mnt/user/appdata/app/.cachehow can I do so?'}, {'role': 'assistant', 'content': 'The user is asking about the Unraid App Backup and Restore plugin and wants to exclude a specific subdirectory from a backup. This is a specific technical question about Unraid configuration. Let me search for information about this plugin and how to exclude directories.', 'tool_calls': [{'id': 'call_jf21ilhw', 'type': 'function', 'function': {'name': 'search_web', 'arguments': '{"query": "Unraid App Backup Restore plugin exclude directory configuration"}'}}]}, {'role': 'tool', 'tool_call_id': 'call_jf21ilhw', 'content': '[{"title": "[Plugin] Appdata.Backup - Plugin Support - Unraid Forums", "link": "https://forums.unraid.net/topic/137710-plugin-appdatabackup/", "snippet": "Appdata.Backup Support Thread This is the support thread for appdata.backup (formerly known as ca.backup2). This plugin primary takes care of your appdata backup! It allows you to configure backup settings for each of your docker containers. Flash and VM meta backup is integrated as well! If you encounter any issues, post it here with the debug log file attached! For your beta feedback, please ..."}]'}],Call 2:
2026-02-23 07:11:22.438 | DEBUG | open_webui.utils.chat:generate_chat_completion:165 - generate_chat_completion: {'stream': True, 'model': 'zai-org/glm-5:thinking', 'messages': [{'role': 'system', 'content': '# Assistant\n\nYou are a concise, candid, and intellectually rigorous assistant.\nYou do not flatter the user or hedge unnecessarily. You value accuracy, clarity, and critical thinking over politeness or empathy. If the user’s idea is weak, flawed, or ambiguous, you say so directly and explain why. Avoid filler like “Great question!” or “Of course!”, or "You're absolutely right!" — go straight to substance. Maintain a professional, calm tone, but never defer or soften your reasoning. Your priority is to analyze, not appease. \n\n\n## Safety & etiquette\n- If the user says “don’t browse,” don’t browse.\n- Be transparent about uncertainty. Don’t speculate beyond evidence.\n\n## Response style\n- Start with the answer. Be concise and actionable.\n- Prioritize correctness and accuracy. Push back on user errors if present, but strive to be polite about the push back. \n- Do not praise the user for their points or assertions\n\n\n'}, {'role': 'user', 'content': '### Task:\nRespond to the user query using the provided context, incorporating inline citations in the format [id] only when the tag includes an explicit id attribute (e.g., ).\n\n### Guidelines:\n- If you don't know the answer, clearly state that.\n- If uncertain, ask the user for clarification.\n- Respond in the same language as the user's query.\n- If the context is unreadable or of poor quality, inform the user and provide the best possible answer.\n- If the answer isn't present in the context but you possess the knowledge, explain this to the user and provide the answer using your own understanding.\n- Only include inline citations using [id] (e.g., [1], [2]) when the tag includes an id attribute.\n- Do not cite if the tag does not contain an id attribute.\n- Do not use XML tags in your response.\n- Ensure citations are concise and directly related to the information provided.\n\n### Example of Citation:\nIf the user asks about a specific topic and the information is found in a source with a provided id attribute, the response should include the citation like in the following example:\n* "According to the study, the proposed method increases efficiency by 20% [1]."\n\n### Output:\nProvide a clear and direct response to the user's query, including inline citations in the format [id] only when the tag with id attribute is present in the context.\n\n\n\n\n\n\n\nIn unraid, for the app backup and restore plugin, I have an app I want to back up at
/mnt/user/appdata/appbut I want to exclude/mnt/user/appdata/app/.cachehow can I do so?\n### Task:\nRespond to the user query using the provided context, incorporating inline citations in the format [id] only when the tag includes an explicit id attribute (e.g., ).\n\n### Guidelines:\n- If you don't know the answer, clearly state that.\n- If uncertain, ask the user for clarification.\n- Respond in the same language as the user's query.\n- If the context is unreadable or of poor quality, inform the user and provide the best possible answer.\n- If the answer isn't present in the context but you possess the knowledge, explain this to the user and provide the answer using your own understanding.\n- Only include inline citations using [id] (e.g., [1], [2]) when the tag includes an id attribute.\n- Do not cite if the tag does not contain an id attribute.\n- Do not use XML tags in your response.\n- Ensure citations are concise and directly related to the information provided.\n\n### Example of Citation:\nIf the user asks about a specific topic and the information is found in a source with a provided id attribute, the response should include the citation like in the following example:\n* "According to the study, the proposed method increases efficiency by 20% [1]."\n\n### Output:\nProvide a clear and direct response to the user's query, including inline citations in the format [id] only when the tag with id attribute is present in the context.\n\n\n\n\n\nIn unraid, for the app backup and restore plugin, I have an app I want to back up at/mnt/user/appdata/appbut I want to exclude/mnt/user/appdata/app/.cachehow can I do so?\nIn unraid, for the app backup and restore plugin, I have an app I want to back up at/mnt/user/appdata/appbut I want to exclude/mnt/user/appdata/app/.cachehow can I do so?'}, {'role': 'assistant', 'content': 'The user is asking about the Unraid App Backup and Restore plugin and wants to exclude a specific subdirectory from a backup. This is a specific technical question about Unraid configuration. Let me search for information about this plugin and how to exclude directories.', 'tool_calls': [{'id': 'call_jf21ilhw', 'type': 'function', 'function': {'name': 'search_web', 'arguments': '{"query": "Unraid App Backup Restore plugin exclude directory configuration"}'}}]}, {'role': 'tool', 'tool_call_id': 'call_jf21ilhw', 'content': '[{"title": "[Plugin] Appdata.Backup - Plugin Support - Unraid Forums", "link": "https://forums.unraid.net/topic/137710-plugin-appdatabackup/", "snippet": "Appdata.Backup Support Thread This is the support thread for appdata.backup (formerly known as ca.backup2). This plugin primary takes care of your appdata backup! It allows you to configure backup settings for each of your docker containers. Flash and VM meta backup is integrated as well! If you encounter any issues, post it here with the debug log file attached! For your beta feedback, please ..."}]'}, {'role': 'assistant', 'content': 'Good, I found the Unraid forums thread about the Appdata.Backup plugin. Let me fetch this page for more details on how to exclude directories.', 'tool_calls': [{'id': 'call_ksgs2j9s', 'type': 'function', 'function': {'name': 'fetch_url', 'arguments': '{"url": "https://forums.unraid.net/topic/137710-plugin-appdatabackup/"}'}}, {'id': 'call_gtfkzogf', 'type': 'function', 'function': {'name': 'search_web', 'arguments': '{"query": "### Task:\nRespond to the user query using the provided context, incorporating inline citations in the format [id] only when the tag includes an explicit id attribute (e.g., <source id=\"1\">).\n\n### Guidelines:\n- If you don't know the answer, clearly state that.\n- If uncertain, ask the user for clarification.\n- Respond in the same language as the user's query.\n- If the context is unreadable or of poor quality, inform the user and provide the best possible answer.\n- If the answer isn't present in the context but you possess the knowledge, explain this to the user and provide the answer using your own understanding.\n- Only include inline citations using [id] (e.g., [1], [2]) when the tag includes an id attribute.\n- Do not cite if the tag does not contain an id attribute.\n- Do not use XML tags in your response.\n- Ensure citations are concise and directly related to the information provided.\n\n### Example of Citation:\nIf the user asks about a specific topic and the information is found in a source with a provided id attribute, the response should include the citation like in the following example:\n* \"According to the study, the proposed method increases efficiency by 20% [1].\"\n\n### Output:\nProvide a clear and direct response to the user's query, including inline citations in the format [id] only when the tag with id attribute is present in the context.\n\n\n<source id=\"1\" name=\"search_web\">\n\n\nIn unraid, for the app backup and restore plugin, I have an app I want to back up at/mnt/user/appdata/appbut I want to exclude/mnt/user/appdata/app/.cachehow can I do so?\nIn unraid, for the app backup and restore plugin, I have an app I want to back up at/mnt/user/appdata/appbut I want to exclude/mnt/user/appdata/app/.cachehow can I do so?"}'}}, {'id': 'call_nix2ac48', 'type': 'function', 'function': {'name': 'calculate_timestamp', 'arguments': '{}'}}]}, {'role': 'tool', 'tool_call_id': 'call_ksgs2j9s', 'content': 'Jump to content\nView in the app\nA better way to browse. Learn more.\nLearn more ×Dismiss\nClose\nUnraid\nA full-screen app on your home screen with push notifications, badges and more.\nTo install this app on iOS and iPadOS\n 1. Tap the \n 2. Scroll the menu and tap Add to Home Screen.\n 3. Tap Add in the top-right corner.\n\n\nTo install this app on Android\n 1. Tap the 3-dot menu (⋮) in the top-right corner of the browser.\n 2. Tap Add to Home screen or Install app.\n 3. Confirm by tapping Install.\n\n\n Unraid Unleash Your Hardware \n * Sign In\n * Search\n * Menu\n\n\nMessage added by KluthR , July 1, 2025Jul 1\n## Feature freeze\nPlease read: https://forums.unraid.net/topic/137710-plugin-appdatabackup/page/77/#findComment-1564584\n * Reply to this topic\n\n\n * Page 1 of 85 \n\n\n * Page 1 of 85 \n\n\n## Join the conversation\nYou can post now and register later. If you have an account, sign in now to post with your account. Note: Your post will require moderator approval before it will be visible.\n Followers \n Go to topic listing \n * Existing user? Sign In \n * Sign Up \n\n\nSearch...\nClose\n#### Configure browser push notifications\nclose\n##### Chrome (Android)\n 1. Tap the lock icon next to the address bar.\n 2. Tap Permissions → Notifications.\n 3. Adjust your preference.\n\n\n##### Chrome (Desktop)\n 1. Click the padlock icon in the address bar.\n 2. Select Site settings.\n 3. Find Notifications and adjust your preference.\n\n\n##### Safari (iOS 16.4+)\n 1. Ensure the site is installed via Add to Home Screen.\n 2. Open Settings App → Notifications.\n 3. Find your app name and adjust your preference.\n\n\n##### Safari (macOS)\n 1. Go to Safari → Preferences.\n 2. Click the Websites tab.\n 3. Select Notifications in the sidebar.\n 4. Find this website and adjust your preference.\n\n\n##### Edge (Android)\n 1. Tap the lock icon next to the address bar.\n 2. Tap Permissions.\n 3. Find Notifications and adjust your preference.\n\n\n##### Edge (Desktop)\n 1. Click the padlock icon in the address bar.\n 2. Click Permissions for this site.\n 3. Find Notifications and adjust your preference.\n\n\n##### Firefox (Android)\n 1. Go to Settings → Site permissions.\n 2. Tap Notifications.\n 3. Find this site in the list and adjust your preference.\n\n\n##### Firefox (Desktop)\n 1. Open Firefox Settings.\n 2. Search for Notifications.\n 3. Find this site in the list and adjust your preference.\n\n\n'}, {'role': 'tool', 'tool_call_id': 'call_gtfkzogf', 'content': '[{"title": "Citations in old format will break with \"Oops! No text generated from Ollama ...", "link": "https://github.com/open-webui/open-webui/discussions/7210", "snippet": "Nov 22, 2024 ... ### Task: Respond to the user query using the provided context, incorporating inline ... only when the <source_id> tag is explicitly provided**\xa0..."}]'}, {'role': 'tool', 'tool_call_id': 'call_nix2ac48', 'content': '{"current_timestamp": 1771848682, "current_iso": "2026-02-23T12:11:22.375892+00:00", "calculated_timestamp": 1771848682, "calculated_iso": "2026-02-23T12:11:22.375892+00:00"}'}],Additional Information
Tool calls end up processed using
add_or_update_user_message:But the flag
append=Falsedoesn't update, it prepends:Result is that the template is added multiple times to the context, since this is done for each tool call.
@relic664 commented on GitHub (Feb 23, 2026):
My working patch has been to just force the RAG_SYSTEM_CONTEXT into the system context, instead of user, to stop the hallucination and bad tool calls, but a real fix would be to fix the behavior where
add_or_update_user_messageis always adding to the context, regardless ofappendvalue.@tjbck commented on GitHub (Feb 23, 2026):
@relic664 could you pull the latest dev and confirm the issue has been resolved?
@relic664 commented on GitHub (Feb 23, 2026):
I tested docker tag
git-a52e6c2(which isdevas of writing) and the multiple template injection is fixed.However, I still have issues where models hallucinate the RAG template into tool calls. You can blame the model in that instance and it's not strictly a Open-WebUI problem, but I do think how this is handled during native tool calling warrants some thought (i.e, should the context be injected in a non-user channel like system?).
I can continue to patch my system locally (https://github.com/open-webui/open-webui/issues/21663#issuecomment-3942078156), but I'm sure others will have a similar issue. Especially considering I've observed this with larger models like GLM-5 (thinking).
@Classic298 commented on GitHub (Feb 24, 2026):
Thanks for confirming it fixed. The bug you are still experiencing might be model dependent behaviour
If you are self-hosting: check the context size
@relic664 commented on GitHub (Feb 24, 2026):
It happens with multiple open source models, self hosted, nano gpt, openrouter, any number of providers. Its simply a byproduct of sticking the template into the user chanel -- models can and will hallucinate the template into tool calls.
If anyone else has this issue, a simple patch (https://github.com/open-webui/open-webui/issues/21663#issuecomment-3942078156) will force the template into the system channel instead, eliminating the issue while still retaining citations.
@Classic298 commented on GitHub (Feb 24, 2026):
@relic664 this has never been reported in ... years despite the RAG prompt always being in the user message. Nothing changed about the prompt's placement.
I'll look into it again, but it would also help if you could tell me what models you use and with what configs. I really try to reproduce, but i can't.
@relic664 commented on GitHub (Feb 24, 2026):
I will grab some performance data for you in the coming days. And the issue is around native tool calling specifically, and the "native" mode was only added about a month ago in 0.7.0.
@Classic298 commented on GitHub (Feb 24, 2026):
Yes. I use native mode exclusively. I just don't see the rag prompt ever get injected by the model into the query
@relic664 commented on GitHub (Feb 25, 2026):
Here's what I have so far. Models denoted (-) mean they completed the task without hallucinating. The patched column is where I'm instead sticking the rag template into system context instead of user context. In both cases, I modified web search to only return one result.
Prompt:
It's worth noting that the system and user context actually do have different meaning to the model. Post-training trains models to attend to the system prompt to shape behavior, but user context is to be operated on/processed. The models obviously perform differently, with GLM 5 being the worst, likely due to differences in the post-training regime.
The fact that this hasn't been reported before is related to native tool calling and how models are trained to use tools. Typically user context/data is used by models to perform tool calls, hence it leaking into the tool call.
For example, models are fine tuned with supervised datasets that pair user context with tool call specs/args.
@Classic298 commented on GitHub (Feb 25, 2026):
Hmm mhm
Maybe it's worth considering moving the rag prompt to the system prompt
Thanks for these detailed investigation this is extraordinary
@Classic298 commented on GitHub (Feb 25, 2026):
@relic664 testing wanted
https://github.com/open-webui/open-webui/pull/21855
@relic664 commented on GitHub (Feb 25, 2026):
@Classic298 gave it a test just now, looks good. All the models passed w/o hallucination under your PR.
@Classic298 commented on GitHub (Feb 25, 2026):
awesomeness @tjbck
@Classic298 commented on GitHub (Feb 25, 2026):
thanks so much for testing and producing top notch benchmark results. Rare.
@relic664 commented on GitHub (Feb 27, 2026):
@Classic298 what happened with the PR? I saw that it was closed, but unclear what the resolution is/was.
@Classic298 commented on GitHub (Feb 27, 2026):
@relic664 moving it to system message not wanted
Why?
I am not in the know, but i suspect because it breaks the kv cache
And because at the end of the day IT IS model dependent behavior, and I also still cannot reproduce these hallucinations and have never seen them.
@relic664 commented on GitHub (Feb 27, 2026):
@tjbck
Would you mind elaborating? I can see how the kv cache does break under the proposed PR and I'm happy to contribute to fixing the issue without breaking the kv cache. There's other solutions that could be proposed. Are you strictly against the template being in the system message under any circumstance?
As long as the template remains in the user channel, the possibility to hallucinate the template into a tool call remains. Yes there is some model dependency, but I think it would be best to have a model-agnostic native tool calling pipeline.
@tjbck commented on GitHub (Mar 8, 2026):
RAG_SYSTEM_CONTEXT=True should be used.
@relic664 commented on GitHub (Mar 9, 2026):
Okay looking at this more, yes
RAG_SYSTEM_CONTEXT=Truedoes resolve this issue, but at the expense of KV-cache as was mentioned above.This solves the immediate problem, but a more pragmatic approach for the future would be to split
RAG_TEMPLATEup so the context goes to user, instructions to system. I will give this some thought and propose a change as it's much less straightforward than sending everything to the system context.Edit: I made some passes at splitting the template up from the context - so the sources go into the user channel, but instructions into the system channel. While this works, I found models performed worse in terms of citing the correct source - likely due to the disconnect between the system instruction (system context) and sources (user context), even when experimenting with playing with the default rag template. Generally this approach seems to be empirically more brittle, particularly for smaller models.
So, in short, I think if models are hallucinating the guidance needs to be
RAG_SYSTEM_CONTEXT=Trueas forcing everything into system context still retains citation accuracy while significantly reducing the chance of hallucinating the template into a tool call, at the expense of breaking the KV cache.