mirror of
https://github.com/open-webui/open-webui.git
synced 2026-05-07 11:28:35 -05:00
[GH-ISSUE #20896] issue: Generation stops after tool call when routing Ollama through WebUI (GLM-4.7-Flash in OpenCode) #34855
Originally created by @HuysArthur on GitHub (Jan 23, 2026).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/20896
Check Existing Issues
Installation Method
Docker
Open WebUI Version
v0.7.2
Ollama Version (if applicable)
v0.14.3
Operating System
Linux Mint 19.1
Browser (if applicable)
No response
Confirmation
Expected Behavior
When I ask the model to do something in opencode, it calls a tool and then returns the answer I asked for. This works when I use the model directly through the Ollama API, as shown below:
user@host dir % opencode run "Create the file 'README' with the contents 'hello world', let me know if it succeeded" --model ollama/glm-4.7-flash:bf16-80k
I'll create the README file with the specified content.
| Write Users/user/dir/README
README created successfully with content "hello world".
Actual Behavior
But when I use the same model from Ollama routed through the Open WebUI API, it always stops execution after a tool call, as shown below. It correctly creates the file, but then it stops. This happens every time opencode calls a tool with this configuration, and I have to manually type "continue" after each tool call.
user@host dir % opencode run "Create the file 'README' with the contents 'hello world', let me know if it succeeded" --model webui/glm-4.7-flash:bf16-80k
| Write Users/user/dir/README
Steps to Reproduce
example opencode config:
{
  "$schema": "https://opencode.ai/config.json",
  "permission": {
    "edit": "allow",
    "bash": "ask",
    "webfetch": "ask",
    "doom_loop": "ask",
    "external_directory": "ask"
  },
  "provider": {
    "webui": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "WebUI",
      "options": {
        "baseURL": "http://{{ip_webui}}:{{port_webui}}/api/v1",
        "apiKey": "{{key}}"
      },
      "models": {
        "glm-4.7-flash:bf16-80k": {
          "name": "GLM 4.7 Flash"
        }
      }
    }
  }
}
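The failing request can also be reproduced without opencode. A hedged sketch, reusing the placeholders from the config above and the standard chat-completions path that the @ai-sdk/openai-compatible provider derives from the baseURL (the `write_file` tool definition here is hypothetical, for illustration only):

```shell
# Hypothetical reproduction against the Open WebUI OpenAI-compatible API.
# {{ip_webui}}, {{port_webui}}, {{key}} are the same placeholders as in
# the opencode config above.
curl "http://{{ip_webui}}:{{port_webui}}/api/v1/chat/completions" \
  -H "Authorization: Bearer {{key}}" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "glm-4.7-flash:bf16-80k",
    "messages": [
      {"role": "user", "content": "Create the file README with the contents hello world"}
    ],
    "tools": [{
      "type": "function",
      "function": {
        "name": "write_file",
        "description": "Write a file to disk (illustrative tool, not from opencode)",
        "parameters": {
          "type": "object",
          "properties": {
            "path": {"type": "string"},
            "content": {"type": "string"}
          },
          "required": ["path", "content"]
        }
      }
    }]
  }'
```

If the bug is in the proxy layer, the interesting part of the response is the first choice's finish_reason and whether the tool_calls array survives the round trip.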
Logs & Screenshots
Nothing unusual in the logs.
Additional Information
Config for the model in Open WebUI is all default: glm-4.7-flash_bf16-80k-1769159953751.json
@owui-terminator[bot] commented on GitHub (Jan 23, 2026):
🔍 Similar Issues Found
I found some existing issues that might be related to this one. Please check if any of these are duplicates or contain helpful solutions:
#19864: Ollama Parameters get overriden after native tool calls • by Haervwe • Dec 10, 2025 • bug
#17058: Response cannot be stopped after the tool is called • by EntropyYue • Aug 30, 2025 • bug, confirmed
#20775: calls to tools in gpt-oss • by chdid • Jan 18, 2026 • bug
#17729: generation does not continue after tool call if a client is not connected to the UI • by johnnyasantoss • Sep 25, 2025 • bug
#20600: Tool call results not decoded from HTML entities before sending to LLM • by Koumi460 • Jan 12, 2026 • bug
This comment was generated automatically by a bot. Please react with a 👍 if this comment was helpful, or a 👎 if it was not.
@Classic298 commented on GitHub (Jan 25, 2026):
Testing wanted https://github.com/open-webui/open-webui/pull/20933
@HuysArthur commented on GitHub (Feb 2, 2026):
I still have the same issue. I built a new Docker image from your code at Classic298:tool-call-early-stop. I tried to replicate my issue and the output still stops after a tool call.
@Classic298 commented on GitHub (Feb 2, 2026):
What am I seeing here? This does not look like Open WebUI.
How did you test my PR?
@HuysArthur commented on GitHub (Feb 2, 2026):
I route my models through OpenWebUI. So this is a test run with opencode (cli coding agent), but it communicates with the OpenWebUI API.
@Classic298 commented on GitHub (Feb 2, 2026):
This introduces additional failure points and is also outside of scope: here the conversation handling is done by opencode, not by Open WebUI anymore.
@HuysArthur commented on GitHub (Feb 2, 2026):
Okay, so this issue appears to be unrelated to PR #20933. But what is OpenWebUI doing that causes tool calling to halt through the API? Everything works as expected when I route OpenCode directly to Ollama instead. Is there something specific that OpenWebUI handles or modifies between these calls that might be causing the breakage?
@HuysArthur commented on GitHub (Feb 2, 2026):
I was able to resolve the early stopping / halting after tool calls by configuring my Ollama endpoint in Open WebUI as an OpenAI API compatible connection (instead of the native Ollama API one). Tool calling now flows correctly without manual "continue" prompts.
For me this fixes the problem described — feel free to close the issue
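The workaround above is consistent with how an OpenAI-compatible agent client decides whether to keep going. A minimal sketch, assuming opencode's provider follows the standard OpenAI chat-completions tool-calling protocol (the dicts below are illustrative, not taken from real responses): the loop continues only while finish_reason is "tool_calls", so a proxy that rewrites it to "stop" after a tool call would halt the agent exactly as reported.

```python
# Sketch of the continue/stop decision in an OpenAI-compatible agent loop.
# Assumption: the client re-calls the model only when the last choice's
# finish_reason is "tool_calls" (standard OpenAI chat-completions behavior).

def should_continue(choice: dict) -> bool:
    """Keep the agent loop alive only while the model is requesting tools."""
    return choice.get("finish_reason") == "tool_calls"

# Response forwarded correctly (e.g. straight from Ollama's
# OpenAI-compatible endpoint): the agent executes the tool and re-calls.
ok = {
    "finish_reason": "tool_calls",
    "message": {"role": "assistant", "tool_calls": [{"id": "call_1"}]},
}

# Response with finish_reason rewritten to "stop" somewhere in the proxy:
# the agent halts even though a tool call is present in the message.
broken = {
    "finish_reason": "stop",
    "message": {"role": "assistant", "tool_calls": [{"id": "call_1"}]},
}

print(should_continue(ok))      # True  -> loop continues after the tool runs
print(should_continue(broken))  # False -> generation appears to stop
```

This would explain why the behavior depends on which connection type Open WebUI uses upstream: the two code paths may normalize the response differently before it reaches the client.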
@Classic298 commented on GitHub (Feb 2, 2026):
huh
@bannert1337 commented on GitHub (Feb 3, 2026):
I still have the issue on Open WebUI directly. After GLM-4.7-Flash runs native knowledge retrieval it ends generation.
It ends with
\n<summary>Tool Executed</summary>\n</details>
I am running the model through llama.cpp and have it connected as an OpenAI-compatible endpoint.
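The fragment quoted above suggests the tool-status markup that Open WebUI embeds in the assistant message is leaking to the client. A hedged sketch, assuming the status is wrapped in an HTML details block (the sample content below is illustrative, not from real logs): stripping that block recovers the plain model text.

```python
import re

# Assumption: Open WebUI embeds tool execution status as an HTML
# <details>...</details> block inside the assistant message content,
# matching the "<summary>Tool Executed</summary></details>" fragment
# quoted in the comment above.
DETAILS_RE = re.compile(r"<details\b[^>]*>.*?</details>", re.DOTALL)

def strip_tool_status(content: str) -> str:
    """Remove embedded tool-status blocks from an assistant message."""
    return DETAILS_RE.sub("", content).strip()

# Illustrative message as a downstream client might receive it:
content = (
    '<details type="tool_calls">\n'
    "<summary>Tool Executed</summary>\n"
    "</details>\n"
    "The file was created successfully."
)
print(strip_tool_status(content))  # -> The file was created successfully.
```

If generation truly ends right after the closing </details>, the problem is upstream of any such client-side cleanup, but the fragment at least shows which markup is crossing the API boundary.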
@JTHesse commented on GitHub (Feb 10, 2026):
I see the same issue with qwen3-coder-next.
@J-T1 commented on GitHub (Feb 19, 2026):
Issue persists with v0.8.3, whether the endpoint is configured as Ollama or OpenAI API.
@moritzderallerechte commented on GitHub (Mar 18, 2026):
I have a similar issue.
I am accessing OpenRouter through a manifold pipe and use native tool calling.
When logged in with an admin account, everything works as expected.
However when using a normal user account, the response just stops after a tool call. That looks like this:
There are no errors in the OpenWebUI logs and there are no permission settings that correlate to this behaviour.
The tool calling works fine, as the correct response from querying the knowledge base is shown in the UI, but no subsequent call is made to the model.
Does anyone have an idea why this happens? I see no reason for this and am, quite frankly, frustrated...