[GH-ISSUE #21726] issue: RAG duplication when view_knowledge_file tool is used #58220

New Issue

GiteaMirror · 2026-05-05T22:35:53-05:00

GiteaMirror commented

2026-05-05 22:35:53 -05:00

Originally created by @susmitds on GitHub (Feb 22, 2026).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/21726

Check Existing Issues

I have searched for any existing and/or related issues.
I have searched for any existing and/or related discussions.
I have also searched in the CLOSED issues AND CLOSED discussions and found no related items (your issue might already be addressed on the development branch!).
I am using the latest version of Open WebUI.

Installation Method

Git Clone

Open WebUI Version

v0.8.3

Ollama Version (if applicable)

No response

Operating System

Windows 11

Browser (if applicable)

No response

Confirmation

I have read and followed all instructions in README.md.
I am using the latest version of both Open WebUI and Ollama.
I have included the browser console logs.
I have included the Docker container logs.
I have provided every relevant configuration, setting, and environment variable used in my setup.
I have clearly listed every relevant configuration, custom setting, environment variable, and command-line option that influences my setup (such as Docker Compose overrides, .env values, browser settings, authentication configurations, etc).
I have documented step-by-step reproduction instructions that are precise, sequential, and leave nothing to interpretation. My steps:
Start with the initial platform/version/OS and dependencies used,
Specify exact install/launch/configure commands,
List URLs visited, user input (incl. example values/emails/passwords if needed),
Describe all options and toggles enabled or changed,
Include any files or environmental changes,
Identify the expected and actual result at each stage,
Ensure any reasonably skilled user can follow and hit the same issue.

Expected Behavior

When File Context is disabled, file contents should only be included in the model context if explicitly retrieved via a tool (e.g., view_knowledge_file).

If the model loads a file using view_knowledge_file (tested with both knowledge base files and uploaded files), the file content should appear only once in the context — via the tool output.

The system should not inject the same file content again through a RAG attachment when File Context is turned off.

Actual Behavior

Even with File Context disabled, when the model loads a file using view_knowledge_file, the file content appears twice in the model context:

Once in the tool output (view_knowledge_file response)
Once again through a RAG attachment injection

This results in duplicated file content in the prompt context.

Steps to Reproduce

Upload a .txt file in the chat.
Ask the model to load the file.
Inspect the model-serving logs.

You will observe that the file content appears twice in the assembled context:

In the tool output
In an additional RAG-injected attachment

Logs & Screenshots

srv log_server_r: request: {"stream": true, "model": "glm-4.5", "messages": [{"role": "system", "content": "You are ChatLM."}, {"role": "user", "content": "<attached_files>\n<file type="file" url="cd642bb7-abaf-4aa2-b8a3-bedef513c184" content_type="text/plain" name="test.txt"/>\n</attached_files>\n\ndo you see any file"}, {"role": "assistant", "content": "Yes, I can see that there's a file attached to your message! Here are the details:\n\nAttached File:\n- Name: test.txt\n- Type: text/plain\n- File ID: cd642bb7-abaf-4aa2-b8a3-bedef513c184\n\nThe file appears to be a plain text file. Would you like me to view the contents of this file? I can retrieve and display its content for you if you'd like to see what's inside."}, {"role": "user", "content": "### Task:\nRespond to the user query using the provided context, incorporating inline citations in the format [id] only when the tag includes an explicit id attribute (e.g., <source id="1">).\n\n### Guidelines:\n- If you don't know the answer, clearly state that.\n- If uncertain, ask the user for clarification.\n- Respond in the same language as the user's query.\n- If the context is unreadable or of poor quality, inform the user and provide the best possible answer.\n- If the answer isn't present in the context but you possess the knowledge, explain this to the user and provide the answer using your own understanding.\n- Only include inline citations using [id] (e.g., [1], [2]) when the tag includes an id attribute.\n- Do not cite if the tag does not contain an id attribute.\n- Do not use XML tags in your response.\n- Ensure citations are concise and directly related to the information provided.\n\n### Example of Citation:\nIf the user asks about a specific topic and the information is found in a source with a provided id attribute, the response should include the citation like in the following example:\n* "According to the study, the proposed method increases efficiency by 20% [1]."\n\n### Output:\nProvide a clear and direct response to the user's query, including inline citations in the format [id] only when the tag with id attribute is present in the context.\n\n\n<source id="1" name="test.txt">i am testing\n\n\nsure.\n /nothink"}, {"role": "assistant", "content": "", "tool_calls": [{"id": "JoLoMs5G3Um86efTT1P4pyzHJKCVPjKt", "type": "function", "function": {"name": "view_knowledge_file", "arguments": "{"file_id": "cd642bb7-abaf-4aa2-b8a3-bedef513c184"}"}}]}, {"role": "tool", "tool_call_id": "JoLoMs5G3Um86efTT1P4pyzHJKCVPjKt", "content": "{"id": "cd642bb7-abaf-4aa2-b8a3-bedef513c184", "filename": "test.txt", "content": "i am testing", "updated_at": 1771737304, "created_at": 1771737304}"}], "stream_options": {"include_usage": true}, "_think_mode": "disabled", "chat_template_kwargs": {"enable_thinking": false}, "tools": [{"type": "function", "function": {"name": "run_parallel_sub_agents", "description": "\n Run multiple independent sub-agent tasks in parallel (concurrently).\n\n Use this instead of calling run_sub_agent multiple times when you have\n 2 or more tasks that do NOT depend on each other's results.\n All tasks share the same model and tools but run in isolated contexts,\n so they execute simultaneously and finish much faster than sequential calls.\n\n ", "parameters": {"properties": {"tasks": {"description": "List of task objects. Each must have "description" and "prompt".", "items": {}, "type": "array"}}, "required": ["tasks"], "type": "object"}}}, {"type": "function", "function": {"name": "run_sub_agent", "description": "\n Delegate a task to a sub-agent for autonomous completion.\n\n MANDATORY: If a task requires more than 2 steps of investigation or complex analysis,\n you MUST NOT perform it yourself. Delegate to this tool immediately.\n Only handle simple 1-2 tool call tasks yourself. When in doubt, delegate.\n\n The sub-agent runs in an isolated context with access to the same tools.\n It executes tools in a loop until completion, returning only the final result\n to keep the main conversation context clean.\n\n ", "parameters": {"properties": {"description": {"description": "Brief task summary (shown to user as status)", "type": "string"}, "prompt": {"description": "Detailed instructions for the sub-agent", "type": "string"}, "max_iterations": {"anyOf": [{"type": "integer"}, {"type": "null"}], "default": null, "description": "Optional override for maximum tool call iterations (uses Valves.MAX_ITERATIONS if omitted)"}}, "required": ["description", "prompt"], "type": "object"}}}, {"type": "function", "function": {"name": "get_current_timestamp", "description": "Get the current Unix timestamp in seconds.", "parameters": {"properties": {}, "type": "object"}}}, {"type": "function", "function": {"name": "calculate_timestamp", "description": "Get the current Unix timestamp, optionally adjusted by days, weeks, months, or years.\nUse this to calculate timestamps for date filtering in search functions.\nExamples: "last week" = weeks_ago=1, "3 days ago" = days_ago=3, "a year ago" = years_ago=1", "parameters": {"properties": {"days_ago": {"default": 0, "description": "Number of days to subtract from current time (default: 0)", "type": "integer"}, "weeks_ago": {"default": 0, "description": "Number of weeks to subtract from current time (default: 0)", "type": "integer"}, "months_ago": {"default": 0, "description": "Number of months to subtract from current time (default: 0)", "type": "integer"}, "years_ago": {"default": 0, "description": "Number of years to subtract from current time (default: 0)", "type": "integer"}}, "type": "object"}}}, {"type": "function", "function": {"name": "list_knowledge_bases", "description": "List the user's accessible knowledge bases.", "parameters": {"properties": {"count": {"default": 10, "description": "Maximum number of KBs to return (default: 10)", "type": "integer"}, "skip": {"default": 0, "description": "Number of results to skip for pagination (default: 0)", "type": "integer"}}, "type": "object"}}}, {"type": "function", "function": {"name": "search_knowledge_bases", "description": "Search the user's accessible knowledge bases by name and description.", "parameters": {"properties": {"query": {"description": "The search query to find matching knowledge bases", "type": "string"}, "count": {"default": 5, "description": "Maximum number of results to return (default: 5)", "type": "integer"}, "skip": {"default": 0, "description": "Number of results to skip for pagination (default: 0)", "type": "integer"}}, "required": ["query"], "type": "object"}}}, {"type": "function", "function": {"name": "query_knowledge_bases", "description": "Search knowledge bases by semantic similarity to query.\nFinds KBs whose name/description match the meaning of your query.\nUse this to discover relevant knowledge bases before querying their files.", "parameters": {"properties": {"query": {"description": "Natural language query describing what you're looking for", "type": "string"}, "count": {"default": 5, "description": "Maximum results (default: 5)", "type": "integer"}}, "required": ["query"], "type": "object"}}}, {"type": "function", "function": {"name": "search_knowledge_files", "description": "Search files across knowledge bases the user has access to.", "parameters": {"properties": {"query": {"description": "The search query to find matching files by filename", "type": "string"}, "knowledge_id": {"anyOf": [{"type": "string"}, {"type": "null"}], "default": null, "description": "Optional KB id to limit search to a specific knowledge base"}, "count": {"default": 5, "description": "Maximum number of results to return (default: 5)", "type": "integer"}, "skip": {"default": 0, "description": "Number of results to skip for pagination (default: 0)", "type": "integer"}}, "required": ["query"], "type": "object"}}}, {"type": "function", "function": {"name": "query_knowledge_files", "description": "Search knowledge base files using semantic/vector search. Searches across collections (KBs),\nindividual files, and notes that the user has access to.", "parameters": {"properties": {"query": {"description": "The search query to find semantically relevant content", "type": "string"}, "knowledge_ids": {"anyOf": [{"items": {"type": "string"}, "type": "array"}, {"type": "null"}], "default": null, "description": "Optional list of KB ids to limit search to specific knowledge bases"}, "count": {"default": 5, "description": "Maximum number of results to return (default: 5)", "type": "integer"}}, "required": ["query"], "type": "object"}}}, {"type": "function", "function": {"name": "view_knowledge_file", "description": "Get the full content of a file from a knowledge base.", "parameters": {"properties": {"file_id": {"description": "The ID of the file to retrieve", "type": "string"}}, "required": ["file_id"], "type": "object"}}}, {"type": "function", "function": {"name": "search_chats", "description": "Search the user's previous chat conversations by title and message content.", "parameters": {"properties": {"query": {"description": "The search query to find matching chats", "type": "string"}, "count": {"default": 5, "description": "Maximum number of results to return (default: 5)", "type": "integer"}, "start_timestamp": {"anyOf": [{"type": "integer"}, {"type": "null"}], "default": null, "description": "Only include chats updated after this Unix timestamp (seconds)"}, "end_timestamp": {"anyOf": [{"type": "integer"}, {"type": "null"}], "default": null, "description": "Only include chats updated before this Unix timestamp (seconds)"}}, "required": ["query"], "type": "object"}}}, {"type": "function", "function": {"name": "view_chat", "description": "Get the full conversation history of a chat by its ID.", "parameters": {"properties": {"chat_id": {"description": "The ID of the chat to retrieve", "type": "string"}}, "required": ["chat_id"], "type": "object"}}}, {"type": "function", "function": {"name": "search_web", "description": "Search the public web for information. Best for current events, external references,\nor topics not covered in internal documents. If knowledge base tools are available,\nconsider checking those first for internal information.", "parameters": {"properties": {"query": {"description": "The search query to look up", "type": "string"}, "count": {"default": 5, "description": "Number of results to return (default: 5)", "type": "integer"}}, "required": ["query"], "type": "object"}}}, {"type": "function", "function": {"name": "fetch_url", "description": "Fetch and extract the main text content from a web page URL.", "parameters": {"properties": {"url": {"description": "The URL to fetch content from", "type": "string"}}, "required": ["url"], "type": "object"}}}, {"type": "function", "function": {"name": "execute_code", "description": "Execute Python code in a sandboxed environment and return the output.\nUse this to perform calculations, data analysis, generate visualizations,\nor run any Python code that would help answer the user's question.", "parameters": {"properties": {"code": {"description": "The Python code to execute", "type": "string"}}, "required": ["code"], "type": "object"}}}, {"type": "function", "function": {"name": "search_notes", "description": "Search the user's notes by title and content.", "parameters": {"properties": {"query": {"description": "The search query to find matching notes", "type": "string"}, "count": {"default": 5, "description": "Maximum number of results to return (default: 5)", "type": "integer"}, "start_timestamp": {"anyOf": [{"type": "integer"}, {"type": "null"}], "default": null, "description": "Only include notes updated after this Unix timestamp (seconds)"}, "end_timestamp": {"anyOf": [{"type": "integer"}, {"type": "null"}], "default": null, "description": "Only include notes updated before this Unix timestamp (seconds)"}}, "required": ["query"], "type": "object"}}}, {"type": "function", "function": {"name": "view_note", "description": "Get the full content of a note by its ID.", "parameters": {"properties": {"note_id": {"description": "The ID of the note to retrieve", "type": "string"}}, "required": ["note_id"], "type": "object"}}}, {"type": "function", "function": {"name": "write_note", "description": "Create a new note with the given title and content.", "parameters": {"properties": {"title": {"description": "The title of the new note", "type": "string"}, "content": {"description": "The markdown content for the note", "type": "string"}}, "required": ["title", "content"], "type": "object"}}}, {"type": "function", "function": {"name": "replace_note_content", "description": "Update the content of a note. Use this to modify task lists, add notes, or update content.", "parameters": {"properties": {"note_id": {"description": "The ID of the note to update", "type": "string"}, "content": {"description": "The new markdown content for the note", "type": "string"}, "title": {"anyOf": [{"type": "string"}, {"type": "null"}], "default": null, "description": "Optional new title for the note"}}, "required": ["note_id", "content"], "type": "object"}}}], "temperature": 0.65, "max_tokens": 70000, "top_p": 0.95}

Additional Information

This issue occurs even when File Context is explicitly disabled in the settings.
The duplication happens for both:
- Files stored in the knowledge base
- Files uploaded directly within the chat session
The duplicated content appears in two distinct parts of the assembled prompt:
1. The view_knowledge_file tool response
2. An automatically injected RAG attachment block
The duplication can be verified by inspecting the final prompt payload or model-serving logs before inference.
This behavior increases token usage and may impact:
- Latency
- Context window utilization
- Model attention allocation
The issue appears to stem from parallel file-handling mechanisms (tool-based retrieval and RAG-based injection) operating simultaneously without arbitration, even when File Context is disabled.
The expected behavior is that when File Context is disabled, automatic RAG injection of file contents should not occur, and file content should only enter the context via explicit tool calls.

Originally created by @susmitds on GitHub (Feb 22, 2026). Original GitHub issue: https://github.com/open-webui/open-webui/issues/21726 ### Check Existing Issues - [x] I have searched for any existing and/or related issues. - [x] I have searched for any existing and/or related discussions. - [x] I have also searched in the CLOSED issues AND CLOSED discussions and found no related items (your issue might already be addressed on the development branch!). - [x] I am using the latest version of Open WebUI. ### Installation Method Git Clone ### Open WebUI Version v0.8.3 ### Ollama Version (if applicable) _No response_ ### Operating System Windows 11 ### Browser (if applicable) _No response_ ### Confirmation - [x] I have read and followed all instructions in `README.md`. - [x] I am using the latest version of **both** Open WebUI and Ollama. - [x] I have included the browser console logs. - [x] I have included the Docker container logs. - [x] I have **provided every relevant configuration, setting, and environment variable used in my setup.** - [x] I have clearly **listed every relevant configuration, custom setting, environment variable, and command-line option that influences my setup** (such as Docker Compose overrides, .env values, browser settings, authentication configurations, etc). - [x] I have documented **step-by-step reproduction instructions that are precise, sequential, and leave nothing to interpretation**. My steps: - Start with the initial platform/version/OS and dependencies used, - Specify exact install/launch/configure commands, - List URLs visited, user input (incl. example values/emails/passwords if needed), - Describe all options and toggles enabled or changed, - Include any files or environmental changes, - Identify the expected and actual result at each stage, - Ensure any reasonably skilled user can follow and hit the same issue. ### Expected Behavior ### Expected Behavior When **File Context is disabled**, file contents should only be included in the model context if explicitly retrieved via a tool (e.g., `view_knowledge_file`). If the model loads a file using `view_knowledge_file` (tested with both knowledge base files and uploaded files), the file content should appear **only once** in the context — via the tool output. The system should not inject the same file content again through a RAG attachment when File Context is turned off. ### Actual Behavior ### Actual Behavior Even with **File Context disabled**, when the model loads a file using `view_knowledge_file`, the file content appears **twice** in the model context: 1. Once in the tool output (`view_knowledge_file` response) 2. Once again through a RAG attachment injection This results in duplicated file content in the prompt context. ### Steps to Reproduce ### Steps to Reproduce 1. Upload a `.txt` file in the chat. 2. Ask the model to load the file. 3. Inspect the model-serving logs. You will observe that the file content appears twice in the assembled context: * In the tool output * In an additional RAG-injected attachment ### Logs & Screenshots srv log_server_r: request: {"stream": true, "model": "glm-4.5", "messages": [{"role": "system", "content": "You are ChatLM."}, {"role": "user", "content": "<attached_files>\n<file type=\"file\" url=\"cd642bb7-abaf-4aa2-b8a3-bedef513c184\" content_type=\"text/plain\" name=\"test.txt\"/>\n</attached_files>\n\ndo you see any file"}, {"role": "assistant", "content": "Yes, I can see that there's a file attached to your message! Here are the details:\n\n**Attached File:**\n- **Name:** test.txt\n- **Type:** text/plain\n- **File ID:** cd642bb7-abaf-4aa2-b8a3-bedef513c184\n\nThe file appears to be a plain text file. Would you like me to view the contents of this file? I can retrieve and display its content for you if you'd like to see what's inside."}, {"role": "user", "content": "### Task:\nRespond to the user query using the provided context, incorporating inline citations in the format [id] **only when the <source> tag includes an explicit id attribute** (e.g., <source id=\"1\">).\n\n### Guidelines:\n- If you don't know the answer, clearly state that.\n- If uncertain, ask the user for clarification.\n- Respond in the same language as the user's query.\n- If the context is unreadable or of poor quality, inform the user and provide the best possible answer.\n- If the answer isn't present in the context but you possess the knowledge, explain this to the user and provide the answer using your own understanding.\n- **Only include inline citations using [id] (e.g., [1], [2]) when the <source> tag includes an id attribute.**\n- Do not cite if the <source> tag does not contain an id attribute.\n- Do not use XML tags in your response.\n- Ensure citations are concise and directly related to the information provided.\n\n### Example of Citation:\nIf the user asks about a specific topic and the information is found in a source with a provided id attribute, the response should include the citation like in the following example:\n* \"According to the study, the proposed method increases efficiency by 20% [1].\"\n\n### Output:\nProvide a clear and direct response to the user's query, including inline citations in the format [id] only when the <source> tag with id attribute is present in the context.\n\n<context>\n<source id=\"1\" name=\"test.txt\">i am testing</source>\n</context>\n\nsure.\n /nothink"}, {"role": "assistant", "content": "", "tool_calls": [{"id": "JoLoMs5G3Um86efTT1P4pyzHJKCVPjKt", "type": "function", "function": {"name": "view_knowledge_file", "arguments": "{\"file_id\": \"cd642bb7-abaf-4aa2-b8a3-bedef513c184\"}"}}]}, {"role": "tool", "tool_call_id": "JoLoMs5G3Um86efTT1P4pyzHJKCVPjKt", "content": "{\"id\": \"cd642bb7-abaf-4aa2-b8a3-bedef513c184\", \"filename\": \"test.txt\", \"content\": \"i am testing\", \"updated_at\": 1771737304, \"created_at\": 1771737304}"}], "stream_options": {"include_usage": true}, "_think_mode": "disabled", "chat_template_kwargs": {"enable_thinking": false}, "tools": [{"type": "function", "function": {"name": "run_parallel_sub_agents", "description": "\n Run multiple independent sub-agent tasks in parallel (concurrently).\n\n Use this instead of calling run_sub_agent multiple times when you have\n 2 or more tasks that do NOT depend on each other's results.\n All tasks share the same model and tools but run in isolated contexts,\n so they execute simultaneously and finish much faster than sequential calls.\n\n ", "parameters": {"properties": {"tasks": {"description": "List of task objects. Each must have \"description\" and \"prompt\".", "items": {}, "type": "array"}}, "required": ["tasks"], "type": "object"}}}, {"type": "function", "function": {"name": "run_sub_agent", "description": "\n Delegate a task to a sub-agent for autonomous completion.\n\n MANDATORY: If a task requires more than 2 steps of investigation or complex analysis,\n you MUST NOT perform it yourself. Delegate to this tool immediately.\n Only handle simple 1-2 tool call tasks yourself. When in doubt, delegate.\n\n The sub-agent runs in an isolated context with access to the same tools.\n It executes tools in a loop until completion, returning only the final result\n to keep the main conversation context clean.\n\n ", "parameters": {"properties": {"description": {"description": "Brief task summary (shown to user as status)", "type": "string"}, "prompt": {"description": "Detailed instructions for the sub-agent", "type": "string"}, "max_iterations": {"anyOf": [{"type": "integer"}, {"type": "null"}], "default": null, "description": "Optional override for maximum tool call iterations (uses Valves.MAX_ITERATIONS if omitted)"}}, "required": ["description", "prompt"], "type": "object"}}}, {"type": "function", "function": {"name": "get_current_timestamp", "description": "Get the current Unix timestamp in seconds.", "parameters": {"properties": {}, "type": "object"}}}, {"type": "function", "function": {"name": "calculate_timestamp", "description": "Get the current Unix timestamp, optionally adjusted by days, weeks, months, or years.\nUse this to calculate timestamps for date filtering in search functions.\nExamples: \"last week\" = weeks_ago=1, \"3 days ago\" = days_ago=3, \"a year ago\" = years_ago=1", "parameters": {"properties": {"days_ago": {"default": 0, "description": "Number of days to subtract from current time (default: 0)", "type": "integer"}, "weeks_ago": {"default": 0, "description": "Number of weeks to subtract from current time (default: 0)", "type": "integer"}, "months_ago": {"default": 0, "description": "Number of months to subtract from current time (default: 0)", "type": "integer"}, "years_ago": {"default": 0, "description": "Number of years to subtract from current time (default: 0)", "type": "integer"}}, "type": "object"}}}, {"type": "function", "function": {"name": "list_knowledge_bases", "description": "List the user's accessible knowledge bases.", "parameters": {"properties": {"count": {"default": 10, "description": "Maximum number of KBs to return (default: 10)", "type": "integer"}, "skip": {"default": 0, "description": "Number of results to skip for pagination (default: 0)", "type": "integer"}}, "type": "object"}}}, {"type": "function", "function": {"name": "search_knowledge_bases", "description": "Search the user's accessible knowledge bases by name and description.", "parameters": {"properties": {"query": {"description": "The search query to find matching knowledge bases", "type": "string"}, "count": {"default": 5, "description": "Maximum number of results to return (default: 5)", "type": "integer"}, "skip": {"default": 0, "description": "Number of results to skip for pagination (default: 0)", "type": "integer"}}, "required": ["query"], "type": "object"}}}, {"type": "function", "function": {"name": "query_knowledge_bases", "description": "Search knowledge bases by semantic similarity to query.\nFinds KBs whose name/description match the meaning of your query.\nUse this to discover relevant knowledge bases before querying their files.", "parameters": {"properties": {"query": {"description": "Natural language query describing what you're looking for", "type": "string"}, "count": {"default": 5, "description": "Maximum results (default: 5)", "type": "integer"}}, "required": ["query"], "type": "object"}}}, {"type": "function", "function": {"name": "search_knowledge_files", "description": "Search files across knowledge bases the user has access to.", "parameters": {"properties": {"query": {"description": "The search query to find matching files by filename", "type": "string"}, "knowledge_id": {"anyOf": [{"type": "string"}, {"type": "null"}], "default": null, "description": "Optional KB id to limit search to a specific knowledge base"}, "count": {"default": 5, "description": "Maximum number of results to return (default: 5)", "type": "integer"}, "skip": {"default": 0, "description": "Number of results to skip for pagination (default: 0)", "type": "integer"}}, "required": ["query"], "type": "object"}}}, {"type": "function", "function": {"name": "query_knowledge_files", "description": "Search knowledge base files using semantic/vector search. Searches across collections (KBs),\nindividual files, and notes that the user has access to.", "parameters": {"properties": {"query": {"description": "The search query to find semantically relevant content", "type": "string"}, "knowledge_ids": {"anyOf": [{"items": {"type": "string"}, "type": "array"}, {"type": "null"}], "default": null, "description": "Optional list of KB ids to limit search to specific knowledge bases"}, "count": {"default": 5, "description": "Maximum number of results to return (default: 5)", "type": "integer"}}, "required": ["query"], "type": "object"}}}, {"type": "function", "function": {"name": "view_knowledge_file", "description": "Get the full content of a file from a knowledge base.", "parameters": {"properties": {"file_id": {"description": "The ID of the file to retrieve", "type": "string"}}, "required": ["file_id"], "type": "object"}}}, {"type": "function", "function": {"name": "search_chats", "description": "Search the user's previous chat conversations by title and message content.", "parameters": {"properties": {"query": {"description": "The search query to find matching chats", "type": "string"}, "count": {"default": 5, "description": "Maximum number of results to return (default: 5)", "type": "integer"}, "start_timestamp": {"anyOf": [{"type": "integer"}, {"type": "null"}], "default": null, "description": "Only include chats updated after this Unix timestamp (seconds)"}, "end_timestamp": {"anyOf": [{"type": "integer"}, {"type": "null"}], "default": null, "description": "Only include chats updated before this Unix timestamp (seconds)"}}, "required": ["query"], "type": "object"}}}, {"type": "function", "function": {"name": "view_chat", "description": "Get the full conversation history of a chat by its ID.", "parameters": {"properties": {"chat_id": {"description": "The ID of the chat to retrieve", "type": "string"}}, "required": ["chat_id"], "type": "object"}}}, {"type": "function", "function": {"name": "search_web", "description": "Search the public web for information. Best for current events, external references,\nor topics not covered in internal documents. If knowledge base tools are available,\nconsider checking those first for internal information.", "parameters": {"properties": {"query": {"description": "The search query to look up", "type": "string"}, "count": {"default": 5, "description": "Number of results to return (default: 5)", "type": "integer"}}, "required": ["query"], "type": "object"}}}, {"type": "function", "function": {"name": "fetch_url", "description": "Fetch and extract the main text content from a web page URL.", "parameters": {"properties": {"url": {"description": "The URL to fetch content from", "type": "string"}}, "required": ["url"], "type": "object"}}}, {"type": "function", "function": {"name": "execute_code", "description": "Execute Python code in a sandboxed environment and return the output.\nUse this to perform calculations, data analysis, generate visualizations,\nor run any Python code that would help answer the user's question.", "parameters": {"properties": {"code": {"description": "The Python code to execute", "type": "string"}}, "required": ["code"], "type": "object"}}}, {"type": "function", "function": {"name": "search_notes", "description": "Search the user's notes by title and content.", "parameters": {"properties": {"query": {"description": "The search query to find matching notes", "type": "string"}, "count": {"default": 5, "description": "Maximum number of results to return (default: 5)", "type": "integer"}, "start_timestamp": {"anyOf": [{"type": "integer"}, {"type": "null"}], "default": null, "description": "Only include notes updated after this Unix timestamp (seconds)"}, "end_timestamp": {"anyOf": [{"type": "integer"}, {"type": "null"}], "default": null, "description": "Only include notes updated before this Unix timestamp (seconds)"}}, "required": ["query"], "type": "object"}}}, {"type": "function", "function": {"name": "view_note", "description": "Get the full content of a note by its ID.", "parameters": {"properties": {"note_id": {"description": "The ID of the note to retrieve", "type": "string"}}, "required": ["note_id"], "type": "object"}}}, {"type": "function", "function": {"name": "write_note", "description": "Create a new note with the given title and content.", "parameters": {"properties": {"title": {"description": "The title of the new note", "type": "string"}, "content": {"description": "The markdown content for the note", "type": "string"}}, "required": ["title", "content"], "type": "object"}}}, {"type": "function", "function": {"name": "replace_note_content", "description": "Update the content of a note. Use this to modify task lists, add notes, or update content.", "parameters": {"properties": {"note_id": {"description": "The ID of the note to update", "type": "string"}, "content": {"description": "The new markdown content for the note", "type": "string"}, "title": {"anyOf": [{"type": "string"}, {"type": "null"}], "default": null, "description": "Optional new title for the note"}}, "required": ["note_id", "content"], "type": "object"}}}], "temperature": 0.65, "max_tokens": 70000, "top_p": 0.95} ### Additional Information ### Additional Information * This issue occurs even when **File Context is explicitly disabled** in the settings. * The duplication happens for both: * Files stored in the knowledge base * Files uploaded directly within the chat session * The duplicated content appears in two distinct parts of the assembled prompt: 1. The `view_knowledge_file` tool response 2. An automatically injected RAG attachment block * The duplication can be verified by inspecting the final prompt payload or model-serving logs before inference. * This behavior increases token usage and may impact: * Latency * Context window utilization * Model attention allocation * The issue appears to stem from parallel file-handling mechanisms (tool-based retrieval and RAG-based injection) operating simultaneously without arbitration, even when File Context is disabled. * The expected behavior is that when File Context is disabled, automatic RAG injection of file contents should not occur, and file content should only enter the context via explicit tool calls.

GiteaMirror added the bug label 2026-05-05 22:35:53 -05:00

GiteaMirror closed this issue

2026-05-05 22:35:55 -05:00

GiteaMirror commented

2026-05-05 22:35:56 -05:00

@Classic298 commented on GitHub (Feb 22, 2026):

If you mean the rag system prompt, then why isn't this intended? The model needs the rag system prompt including source to cite it properly

@Classic298 commented on GitHub (Feb 22, 2026): If you mean the rag system prompt, then why isn't this intended? The model needs the rag system prompt including source to cite it properly

GiteaMirror commented

2026-05-05 22:35:56 -05:00

@susmitds commented on GitHub (Feb 22, 2026):

@Classic298

To clarify the issue precisely, here is the full duplication visible in the logs.

The file content appears once inside an injected RAG block:

<source id="1" name="test.txt">i am testing

And then appears again in the tool response:

{
  "id": "cd642bb7-abaf-4aa2-b8a3-bedef513c184",
  "filename": "test.txt",
  "content": "i am testing",
  "updated_at": 1771737304,
  "created_at": 1771737304
}

So the model receives the file content twice:

As a RAG <source> block
As "content" in the tool output

The important detail is this:

The RAG <source> block is not injected during the user message, but is instead added after the tool call — even though File Context is disabled.

This means the system is still performing an automatic RAG injection step after the model retrieves the file explicitly via view_knowledge_file, creating two separate content pathways:

Automatic post-tool RAG injection
Explicit tool-based retrieval

When File Context is disabled, the RAG injection step should never occur.

For small test files this looks harmless, but for real files (20k–50k tokens), this effectively doubles the token load:

One full copy from RAG
One full copy from the tool
Plus overhead from XML and JSON encoding

A 25k-token file becomes ~50k tokens of prompt bloat.

Consequences:

Doubled context window consumption
Higher inference cost
Reduced space for conversation
Higher likelihood of truncation
Longer latency

Again: this has nothing to do with citations or intended RAG behavior.
This is the File Context setting being ignored, causing redundant injection of large file content after the tool call.

@susmitds commented on GitHub (Feb 22, 2026): @Classic298 To clarify the issue precisely, here is the full duplication visible in the logs. The file content appears once inside an injected RAG block: ```xml <source id="1" name="test.txt">i am testing ``` And then appears again in the tool response: ```json { "id": "cd642bb7-abaf-4aa2-b8a3-bedef513c184", "filename": "test.txt", "content": "i am testing", "updated_at": 1771737304, "created_at": 1771737304 } ``` So the model receives the file content twice: 1. As a RAG `<source>` block 2. As `"content"` in the tool output The important detail is this: **The RAG `<source>` block is not injected during the user message, but is instead added *after the tool call* — even though File Context is disabled.** This means the system is still performing an automatic RAG injection step after the model retrieves the file explicitly via `view_knowledge_file`, creating two separate content pathways: * Automatic post-tool RAG injection * Explicit tool-based retrieval When File Context is disabled, the RAG injection step should never occur. For small test files this looks harmless, but for real files (20k–50k tokens), this effectively doubles the token load: * One full copy from RAG * One full copy from the tool * Plus overhead from XML and JSON encoding A 25k-token file becomes ~50k tokens of prompt bloat. Consequences: * Doubled context window consumption * Higher inference cost * Reduced space for conversation * Higher likelihood of truncation * Longer latency Again: this has nothing to do with citations or intended RAG behavior. This is the File Context setting being ignored, causing redundant injection of large file content **after** the tool call.

GiteaMirror commented

2026-05-05 22:35:56 -05:00

@Classic298 commented on GitHub (Feb 22, 2026):

https://github.com/open-webui/open-webui/pull/21733

@Classic298 commented on GitHub (Feb 22, 2026): https://github.com/open-webui/open-webui/pull/21733

GiteaMirror commented

2026-05-05 22:35:57 -05:00

@tjbck commented on GitHub (Feb 23, 2026):

RAG template injection is INTENDED and is REQUIRED for inline citations.

424dba443c

@tjbck commented on GitHub (Feb 23, 2026): RAG template injection is INTENDED and is REQUIRED for inline citations. 424dba443c578008a1dfa0dd999069a12ab74a3a

Sign in to join this conversation.

Branches Tags

1 Participants

Notifications

Due Date

No due date set.

Dependencies

No dependencies set.

Reference: github-starred/open-webui#58220