[GH-ISSUE #19098] issue: Prompt & context duplication when RAG template is used #34301

Closed
opened 2026-04-25 08:14:29 -05:00 by GiteaMirror · 1 comment
Owner

Originally created by @matiboux on GitHub (Nov 11, 2025).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/19098

Check Existing Issues

  • I have searched for any existing and/or related issues.
  • I have searched for any existing and/or related discussions.
  • I have also searched in the CLOSED issues AND CLOSED discussions and found no related items (your issue might already be addressed on the development branch!).
  • I am using the latest version of Open WebUI.

Installation Method

Git Clone

Open WebUI Version

v0.6.36

Ollama Version (if applicable)

No response

Operating System

Ubuntu 24.04 LTS (WSL2)

Browser (if applicable)

No response

Confirmation

  • I have read and followed all instructions in README.md.
  • I am using the latest version of both Open WebUI and Ollama.
  • I have included the browser console logs.
  • I have included the Docker container logs.
  • I have provided every relevant configuration, setting, and environment variable used in my setup.
  • I have clearly listed every relevant configuration, custom setting, environment variable, and command-line option that influences my setup (such as Docker Compose overrides, .env values, browser settings, authentication configurations, etc).
  • I have documented step-by-step reproduction instructions that are precise, sequential, and leave nothing to interpretation. My steps:
  • Start with the initial platform/version/OS and dependencies used,
  • Specify exact install/launch/configure commands,
  • List URLs visited, user input (incl. example values/emails/passwords if needed),
  • Describe all options and toggles enabled or changed,
  • Include any files or environmental changes,
  • Identify the expected and actual result at each stage,
  • Ensure any reasonably skilled user can follow and hit the same issue.

Expected Behavior

When a tool is selected and invoked by the model, the prompt sent to the model should be the message formatted from the RAG template, which contains the context (tool results) and the user prompt.

Actual Behavior

When a tool is selected and used by the model, the prompt sent to the model is the message formatted from the RAG template prepended to the existing user message. The RAG template includes both the context (tool results) and the user prompt, but the existing user message also contains these, resulting in duplicated context and instructions.

Steps to Reproduce

  1. Start Ubuntu 24.04 LTS in WSL2.
  2. Set up Open WebUI per documentation:
    • In one terminal: npm run dev to start the frontend.
    • In another terminal: run sh dev.sh from the backend/ directory to start the backend.
  3. In parallel, start a custom API that proxies OpenAI-compatible requests from the /chat/completions endpoint to a real OpenAI-compatible API, and whose only other role is to print out:
    • All request parameters
    • System and user messages
  4. In the Open WebUI interface, create the admin user.
  5. Go to admin settings and enter a connection setting pointing to the custom OpenAI-compatible proxy API.
  6. In Workspace, create a new tool using the default provided code.
  7. Create a new chat with any model and enable the tool just created.
  8. Type (or ask): "What time is it?"
  9. Inspect the printout in the custom API terminal window: observe that both system/context and prompt are duplicated in the API request.

Note: Alternatively, you could add logs in the backend to print the request parameters, but using a custom API ensures backend code is untouched for this test.

Logs & Screenshots

Below are actual outputs from the custom API proxy used during testing. These show message flow and context duplication during tool invocation.

Tool Selection Request:

Messages since the last assistant response:
- system
```
Available Tools: [{"name": "calculator", "description": "\n        Calculate the result of an equation.\n        ", "parameters": {"properties": {"equation": {"description": "The mathematical equation to calculate.", "type": "string"}}, "required": ["equation"], "type": "object"}}, {"name": "get_current_time", "description": "\n        Get the current time in a more human-readable format.\n        ", "parameters": {"properties": {}, "type": "object"}}, {"name": "get_current_weather", "description": "\n        Get the current weather for a given city.\n        ", "parameters": {"properties": {"city": {"default": "New York, NY", "description": "Get the current weather for a given city.", "type": "string"}}, "type": "object"}}, {"name": "get_user_name_and_email_and_id", "description": "\n        Get the user name, Email and ID from the user object.\n        ", "parameters": {"properties": {}, "type": "object"}}]

Your task is to choose and return the correct tool(s) from the list of available tools based on the query. Follow these guidelines:

- Return only the JSON object, without any additional text or explanation.

- If no tools match the query, return an empty array: 
   {
     "tool_calls": []
   }

- If one or more tools match the query, construct a JSON response containing a "tool_calls" array with objects that include:
   - "name": The tool's name.
   - "parameters": A dictionary of required parameters and their corresponding values.

The format for the JSON response is strictly:
{
  "tool_calls": [
    {"name": "toolName1", "parameters": {"key1": "value1"}},
    {"name": "toolName2", "parameters": {"key2": "value2"}}
  ]
}
```
- user
```
Query: History:
USER: """what time is it?"""
Query: what time is it?
```

LLM Chat Query:

Messages since the last assistant response:
- user
```
### Task:
Respond to the user query using the provided context, incorporating inline citations in the format [id] **only when the <source> tag includes an explicit id attribute** (e.g., <source id="1">).

### Guidelines:
- If you don't know the answer, clearly state that.
- If uncertain, ask the user for clarification.
- Respond in the same language as the user's query.
- If the context is unreadable or of poor quality, inform the user and provide the best possible answer.
- If the answer isn't present in the context but you possess the knowledge, explain this to the user and provide the answer using your own understanding.
- **Only include inline citations using [id] (e.g., [1], [2]) when the <source> tag includes an id attribute.**
- Do not cite if the <source> tag does not contain an id attribute.
- Do not use XML tags in your response.
- Ensure citations are concise and directly related to the information provided.

### Example of Citation:
If the user asks about a specific topic and the information is found in a source with a provided id attribute, the response should include the citation like in the following example:
* "According to the study, the proposed method increases efficiency by 20% [1]."

### Output:
Provide a clear and direct response to the user's query, including inline citations in the format [id] only when the <source> tag with id attribute is present in the context.

<context>
<source id="1" name="test/get_current_time">Current Date and Time = Tuesday, November 11, 2025, 12:56:59 AM</source>
</context>

<user_query>
what time is it?
</user_query>

what time is it?

Tool `test/get_current_time` Output: Current Date and Time = Tuesday, November 11, 2025, 12:56:59 AM
```

Additional Information

Related code responsible for the duplication:

I will create a PR to attempt to resolve this issue.

Note: My testing focused on RAG template use in conjunction with tools. I have not evaluated RAG template behavior with documents.

Originally created by @matiboux on GitHub (Nov 11, 2025). Original GitHub issue: https://github.com/open-webui/open-webui/issues/19098 ### Check Existing Issues - [x] I have searched for any existing and/or related issues. - [x] I have searched for any existing and/or related discussions. - [x] I have also searched in the CLOSED issues AND CLOSED discussions and found no related items (your issue might already be addressed on the development branch!). - [x] I am using the latest version of Open WebUI. ### Installation Method Git Clone ### Open WebUI Version v0.6.36 ### Ollama Version (if applicable) _No response_ ### Operating System Ubuntu 24.04 LTS (WSL2) ### Browser (if applicable) _No response_ ### Confirmation - [x] I have read and followed all instructions in `README.md`. - [x] I am using the latest version of **both** Open WebUI and Ollama. - [x] I have included the browser console logs. - [x] I have included the Docker container logs. - [x] I have **provided every relevant configuration, setting, and environment variable used in my setup.** - [x] I have clearly **listed every relevant configuration, custom setting, environment variable, and command-line option that influences my setup** (such as Docker Compose overrides, .env values, browser settings, authentication configurations, etc). - [x] I have documented **step-by-step reproduction instructions that are precise, sequential, and leave nothing to interpretation**. My steps: - Start with the initial platform/version/OS and dependencies used, - Specify exact install/launch/configure commands, - List URLs visited, user input (incl. example values/emails/passwords if needed), - Describe all options and toggles enabled or changed, - Include any files or environmental changes, - Identify the expected and actual result at each stage, - Ensure any reasonably skilled user can follow and hit the same issue. ### Expected Behavior When a tool is selected and invoked by the model, the prompt sent to the model should be the message formatted from the RAG template, which contains the context (tool results) and the user prompt. ### Actual Behavior When a tool is selected and used by the model, the prompt sent to the model is the message formatted from the RAG template prepended to the existing user message. The RAG template includes both the context (tool results) and the user prompt, but the existing user message also contains these, resulting in duplicated context and instructions. ### Steps to Reproduce 1. Start Ubuntu 24.04 LTS in WSL2. 2. Set up Open WebUI per documentation: - In one terminal: `npm run dev` to start the frontend. - In another terminal: run `sh dev.sh` from the `backend/` directory to start the backend. 3. In parallel, start a custom API that proxies OpenAI-compatible requests from the `/chat/completions` endpoint to a real OpenAI-compatible API, and whose only other role is to print out: - All request parameters - System and user messages 4. In the Open WebUI interface, create the admin user. 5. Go to admin settings and enter a connection setting pointing to the custom OpenAI-compatible proxy API. 6. In Workspace, create a new tool using the default provided code. 7. Create a new chat with any model and enable the tool just created. 8. Type (or ask): "What time is it?" 9. Inspect the printout in the custom API terminal window: observe that both system/context and prompt are duplicated in the API request. Note: Alternatively, you could add logs in the backend to print the request parameters, but using a custom API ensures backend code is untouched for this test. ### Logs & Screenshots Below are actual outputs from the custom API proxy used during testing. These show message flow and context duplication during tool invocation. **Tool Selection Request:** ````` Messages since the last assistant response: - system ``` Available Tools: [{"name": "calculator", "description": "\n Calculate the result of an equation.\n ", "parameters": {"properties": {"equation": {"description": "The mathematical equation to calculate.", "type": "string"}}, "required": ["equation"], "type": "object"}}, {"name": "get_current_time", "description": "\n Get the current time in a more human-readable format.\n ", "parameters": {"properties": {}, "type": "object"}}, {"name": "get_current_weather", "description": "\n Get the current weather for a given city.\n ", "parameters": {"properties": {"city": {"default": "New York, NY", "description": "Get the current weather for a given city.", "type": "string"}}, "type": "object"}}, {"name": "get_user_name_and_email_and_id", "description": "\n Get the user name, Email and ID from the user object.\n ", "parameters": {"properties": {}, "type": "object"}}] Your task is to choose and return the correct tool(s) from the list of available tools based on the query. Follow these guidelines: - Return only the JSON object, without any additional text or explanation. - If no tools match the query, return an empty array: { "tool_calls": [] } - If one or more tools match the query, construct a JSON response containing a "tool_calls" array with objects that include: - "name": The tool's name. - "parameters": A dictionary of required parameters and their corresponding values. The format for the JSON response is strictly: { "tool_calls": [ {"name": "toolName1", "parameters": {"key1": "value1"}}, {"name": "toolName2", "parameters": {"key2": "value2"}} ] } ``` - user ``` Query: History: USER: """what time is it?""" Query: what time is it? ``` ````` **LLM Chat Query:** ````` Messages since the last assistant response: - user ``` ### Task: Respond to the user query using the provided context, incorporating inline citations in the format [id] **only when the <source> tag includes an explicit id attribute** (e.g., <source id="1">). ### Guidelines: - If you don't know the answer, clearly state that. - If uncertain, ask the user for clarification. - Respond in the same language as the user's query. - If the context is unreadable or of poor quality, inform the user and provide the best possible answer. - If the answer isn't present in the context but you possess the knowledge, explain this to the user and provide the answer using your own understanding. - **Only include inline citations using [id] (e.g., [1], [2]) when the <source> tag includes an id attribute.** - Do not cite if the <source> tag does not contain an id attribute. - Do not use XML tags in your response. - Ensure citations are concise and directly related to the information provided. ### Example of Citation: If the user asks about a specific topic and the information is found in a source with a provided id attribute, the response should include the citation like in the following example: * "According to the study, the proposed method increases efficiency by 20% [1]." ### Output: Provide a clear and direct response to the user's query, including inline citations in the format [id] only when the <source> tag with id attribute is present in the context. <context> <source id="1" name="test/get_current_time">Current Date and Time = Tuesday, November 11, 2025, 12:56:59 AM</source> </context> <user_query> what time is it? </user_query> what time is it? Tool `test/get_current_time` Output: Current Date and Time = Tuesday, November 11, 2025, 12:56:59 AM ``` ````` ### Additional Information Related code responsible for the duplication: - Prepends the RAG template formatted text before the prompt in the user message: [backend/open_webui/utils/middleware.py#L1493-L1502](https://github.com/open-webui/open-webui/blob/e0d5de16978786b8a7538adf1efcde5258f38faf/backend/open_webui/utils/middleware.py#L1493-L1502) - Appends the tool result context to the user message, duplicating it as well: [backend/open_webui/utils/middleware.py#L484-L488](https://github.com/open-webui/open-webui/blob/e0d5de16978786b8a7538adf1efcde5258f38faf/backend/open_webui/utils/middleware.py#L484-L488) I will create a PR to attempt to resolve this issue. Note: My testing focused on RAG template use in conjunction with tools. I have not evaluated RAG template behavior with documents.
GiteaMirror added the bug label 2026-04-25 08:14:29 -05:00
Author
Owner

@matiboux commented on GitHub (Nov 11, 2025):

The PR #19099 attempts to resolve the issue without modifying this code block:

e0d5de1697/backend/open_webui/utils/middleware.py (L484-L488)

This code block seems out of place, given the comment above it stating that it is for the case when "citation is not enabled for this tool". However, it is executed regardless of that condition, at the same time as the neighboring code for when "citation is enabled for this tool".

After the changes made in PR #19099, the user message is now overwritten using the RAG template, so this code block does not have an effect in that context.

It is recommended that this code block be reviewed to determine if it is still necessary or if it could potentially introduce issues in other scenarios.

<!-- gh-comment-id:3514647997 --> @matiboux commented on GitHub (Nov 11, 2025): The PR #19099 attempts to resolve the issue without modifying this code block: https://github.com/open-webui/open-webui/blob/e0d5de16978786b8a7538adf1efcde5258f38faf/backend/open_webui/utils/middleware.py#L484-L488 This code block seems out of place, given the comment above it stating that it is for the case when "citation is not enabled for this tool". However, it is executed regardless of that condition, at the same time as the neighboring code for when "citation is enabled for this tool". After the changes made in PR #19099, the user message is now overwritten using the RAG template, so this code block does not have an effect in that context. It is recommended that this code block be reviewed to determine if it is still necessary or if it could potentially introduce issues in other scenarios.
Sign in to join this conversation.
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: github-starred/open-webui#34301