issue: Issue with using a tool call and knowledge documents. #6631

Closed
opened 2025-11-11 17:01:49 -06:00 by GiteaMirror · 1 comment
Owner

Originally created by @rahepler2 on GitHub (Oct 8, 2025).

Check Existing Issues

  • I have searched for any existing and/or related issues.
  • I have searched for any existing and/or related discussions.
  • I am using the latest version of Open WebUI.

Installation Method

Git Clone

Open WebUI Version

0.11.2

Ollama Version (if applicable)

No response

Operating System

RHEL

Browser (if applicable)

Chrome

Confirmation

  • I have read and followed all instructions in README.md.
  • I am using the latest version of both Open WebUI and Ollama.
  • I have included the browser console logs.
  • I have included the Docker container logs.
  • I have provided every relevant configuration, setting, and environment variable used in my setup.
  • I have clearly listed every relevant configuration, custom setting, environment variable, and command-line option that influences my setup (such as Docker Compose overrides, .env values, browser settings, authentication configurations, etc).
  • I have documented step-by-step reproduction instructions that are precise, sequential, and leave nothing to interpretation. My steps:
  • Start with the initial platform/version/OS and dependencies used,
  • Specify exact install/launch/configure commands,
  • List URLs visited, user input (incl. example values/emails/passwords if needed),
  • Describe all options and toggles enabled or changed,
  • Include any files or environmental changes,
  • Identify the expected and actual result at each stage,
  • Ensure any reasonably skilled user can follow and hit the same issue.

Expected Behavior

The expected behavior is that the tool call would first check whether it returns anything useful for the user's prompt; if not, the model would fall back to the attached knowledge base for RAG.

Actual Behavior

Instead, the tool call and the RAG chunk retrieval run at the same time, producing a token-count error stating that the request uses over 3M tokens and must be reduced.

Steps to Reproduce

  1. Start docker container for 0.11.2
  2. Create MCP server tool and add it as an external tool connection
  3. Add the knowledge collection and tool to the model
  4. Run the model with a question
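For context, step 1 might look like the following. This is a hypothetical sketch, not taken from the report (the reporter's install method is listed as Git Clone); the image name, tag, and port mapping are assumptions based on the official Open WebUI container image and should be adjusted to match the actual deployment.

```shell
# Pull and run the Open WebUI container for the reported version.
# Image tag "v0.11.2" is an assumption; check the registry for exact tags.
docker run -d \
  -p 3000:8080 \
  -v open-webui:/app/backend/data \
  --name open-webui \
  ghcr.io/open-webui/open-webui:v0.11.2
```

Steps 2–4 are then performed in the web UI: register the MCP server as an external tool connection, attach both the tool and the knowledge collection to a model, and send a prompt.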

Logs & Screenshots

This model's maximum context length is 128000 tokens. However, your messages resulted in 306326 tokens. Please reduce the length of the messages.

This is down from the ~3M tokens reported earlier; the figure dropped after I reduced the chunk count in the RAG retrieval settings to fewer than 10.
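A rough back-of-the-envelope sketch of why this overflows: if tool output and RAG chunks are injected into the same request instead of sequentially, their sizes add up. The numbers below are hypothetical illustrations (not measured from this setup), and the ~4 characters-per-token ratio is only a common heuristic.

```python
# Illustrative only: estimate prompt tokens when RAG chunks and a large
# tool response are combined into one request. All sizes are hypothetical.

def estimate_tokens(num_chunks: int, chars_per_chunk: int,
                    tool_output_chars: int = 0) -> int:
    """Estimate tokens using the rough ~4 characters per token heuristic."""
    total_chars = num_chunks * chars_per_chunk + tool_output_chars
    return total_chars // 4

CONTEXT_LIMIT = 128_000  # the model's context length from the error message

# e.g. 30 chunks of ~8,000 chars each plus a ~1 MB tool response
tokens = estimate_tokens(num_chunks=30, chars_per_chunk=8_000,
                         tool_output_chars=1_000_000)
print(tokens, tokens > CONTEXT_LIMIT)  # prints 310000 True
```

Even after cutting the chunk count, a single oversized tool response can keep the combined payload well past the 128k limit, which matches the error in the logs above.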

Additional Information

No response

GiteaMirror added the bug label 2025-11-11 17:01:49 -06:00
Author
Owner

@ayylmaonade commented on GitHub (Oct 8, 2025):

I believe this is related to #18133 - downgrading resolves it. Just have to wait for a fix.


Reference: github-starred/open-webui#6631