[GH-ISSUE #20600] issue: Tool call results not decoded from HTML entities before sending to LLM #57897
Originally created by @Koumi460 on GitHub (Jan 12, 2026).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/20600
Check Existing Issues
Installation Method
Git Clone
Open WebUI Version
v0.7.2
Ollama Version (if applicable)
0.13.5
Operating System
Debian, Windows
Browser (if applicable)
N/A
Confirmation
Expected Behavior
Messages sent to the LLM should contain clean, properly formatted JSON without HTML entities:
\"",&,'should be present in the message content sent to LLMActual Behavior
In multi-turn conversations with tool calls:
the processDetails() function inserts HTML-escaped content directly into messages, i.e. &quot; entities instead of proper quotes
Steps to Reproduce
""{\\n \\"results\\": ...instead of clean JSON:"{\\n \\"results\\": ...Logs & Screenshots
Additional Information
AFAIK - When tool calls are executed in multi-turn conversations, the tool call results are stored in the conversation history database with HTML-escaped entities (e.g., &quot;, &amp;). When these messages are loaded from the database and sent back to the LLM in subsequent conversation turns, the HTML entities in tool call results are not properly decoded, causing the model to receive malformed JSON with escaped entities instead of proper quotation marks. The issue is not visible in the chat window, but it does have an impact on the model, degrading its performance, especially if the chat history is tool-call heavy. I am not confident that the fix is correct, but I have observed this bug across different deployment instances and with both Ollama and llama.cpp.
Suggested solution (?) tested working on my instance (main branch):
File: src/lib/utils/index.ts
Function: processDetails()
Line: ~875
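A minimal sketch of the kind of decoding step being suggested (the helper name and exact call site are illustrative, not the actual patch):

```typescript
// Illustrative helper: decode the HTML entities that end up in stored tool results
// before the content is re-inserted into messages sent back to the model.
// The entity list covers the characters mentioned in this issue.
const decodeHtmlEntities = (text: string): string =>
	text
		.replace(/&quot;/g, '"')
		.replace(/&#39;/g, "'")
		.replace(/&lt;/g, '<')
		.replace(/&gt;/g, '>')
		.replace(/&amp;/g, '&'); // decode &amp; last so other entities are not decoded twice

// Example: an escaped tool result as it is read back from the conversation history.
const stored = '{ &quot;results&quot;: [1, 2, 3] }';
console.log(decodeHtmlEntities(stored)); // -> { "results": [1, 2, 3] }
```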
@owui-terminator[bot] commented on GitHub (Jan 12, 2026):
🔍 Similar Issues Found
I found some existing issues that might be related to this one. Please check if any of these are duplicates or contain helpful solutions:
#19864 issue: Ollama Parameters get overriden after native tool calls
by Haervwe • Dec 10, 2025 • bug
#20595 issue: "search_web" tool executed even when "Web Search" control disabled
by SlavikCA • Jan 11, 2026 • bug
#18743 issue: Tool call results intermittently fail to display in UI when result data is large
by kjpoccia • Oct 30, 2025 • bug
💡 Tips:
This comment was generated automatically by a bot. Please react with a 👍 if this comment was helpful, or a 👎 if it was not.
@jegranado commented on GitHub (Jan 13, 2026):
same here - thank you for providing all the details!
we're using OpenAI compatible endpoints, however.
@silentoplayz commented on GitHub (Jan 18, 2026):
I am able to confirm this issue on the latest dev commit by looking at the backend logs. &quot; is spotted several times throughout tool-call-related debug logging statements.
@Bickio commented on GitHub (Jan 28, 2026):
Are we certain that tool calls should be injected into the assistant messages at all?
I'm experiencing issues where the LLM hallucinates tool call results in long conversations, and I believe it's because previous tool calls were injected into its own messages.
Early message in conversation:
Later message in conversation (hallucinates a tool call):
IMO assistant messages should be left exactly as-is (unless explicitly edited by the user), since they act as a reference for the LLM about how to respond
@Classic298 commented on GitHub (Jan 28, 2026):
@Bickio
YES, tool calls HAVE TO BE on the assistant's side.
what you're experiencing is not hallucinated tool calls. It looks like you are not on the latest version, or maybe... wait, let me check... there definitely was an issue where tool calls were incorrectly modified in the next request, i.e. quotes in tool calls became HTML-encoded
@Classic298 commented on GitHub (Jan 28, 2026):
oh yeah thats.. exactly this.. sorry i am tired and it's late
https://github.com/open-webui/open-webui/pull/20755
yeah anyways what you are experiencing is exactly because of this
@Classic298 commented on GitHub (Jan 28, 2026):
@Bickio
This is the Open AI API standard.
Tool calls MUST BE in the assistant's message
So yes we are very sure this is intended and should not be changed
No, that's just because the tool calls get incorrectly formatted, as we found out here, and that's what we want to fix with the PR
@Bickio commented on GitHub (Jan 28, 2026):
Hi @Classic298 I'm not very familiar with this codebase, but my understanding is that:
role: "assistant"assistant messagestool_callsis where the LLM requests the tool to be calledrole: "tool"messageScreenshot from the OpenAI docs to support this:
However, as you can see from my litellm logs, OpenWebUI is injecting the actual tool output into the role: "assistant" message:
@Classic298 commented on GitHub (Jan 28, 2026):
@Bickio
Ok you are switching up the conversation here.
Earlier you claimed tool calls should not be in assistant message. This is wrong, as tool calls have to be in assistant message per OpenAI standard.
Now you are talking about tool results, not tool calls.
Yes, tool results per the OpenAI standard belong in the role: "tool" message and not in the assistant message.
And yes, to be fully standard-conformant this would need to be changed, but in reality it's barely an issue, even a non-issue.
I have never had issues with tool calls, even WITHOUT the fix of PR #20755.
That's because LLMs got fine-tuned so well that they always create the tool call correctly, with the correct quotes and not the HTML-encoded ones. Only weaker LLMs, or those not fine-tuned well for tool calling, would fail here - but oh well - the PR is coming, and then the quotes will be correctly returned to the model again, which will repair tool calls for those models.
And even though the tool results being directly embedded in the assistant message is not standard-conformant, it works - it works perfectly well.
If this were an issue that prevents tool calling from properly working, someone would have flagged it long ago, or at all.
And from the screenshot you shared it looks like the AI you use is outputting html encoded chars instead of the actual char, which means the LLM is affected by the wrong formatting which PR #20755 fixes - and not by the fact that the tool results get added to the assistant message instead of the tool role.
If this needs fixing, then this would require a refactor on the frontend and backend, mostly frontend, to handle the format changes.
@Classic298 commented on GitHub (Jan 28, 2026):
If you want you can open a discussion for this - to add tool results to the tool role
But someone will have to implement it
Is it worth implementing it? Nobody has issues with the current behaviour, otherwise someone would have flagged it, and it requires significant work for... no performance gain, and it wouldn't fix a bug either.
Anyways, your tool result report should be tracked in a discussion and separated from this specific issue here, which is strictly about the wrong formatting of the quotes, which prevents weaker/less-well-fine-tuned models from correctly executing tools repeatedly.
@Bickio commented on GitHub (Jan 28, 2026):
@Classic298 I think you're misrepresenting my intention here
No, I claimed that OpenWebUI should leave assistant messages exactly as they are, which I believe to be correct. Tool calls in assistant messages are clearly fine, since they're added by the LLM and not by OpenWebUI. Apologies for not specifying "tool call outputs" at every point, clearly this caused some confusion.
The model in my screenshot was Claude Opus 4.5, the strongest tool calling model available.
However, I do agree that unescaping the HTML will have some effect on the LLM behaviour. I'm not as confident as you are that it will entirely resolve the issue, or even that the effect will be positive. I guess we'll find out.
Either way, I think pushing my point here is not productive, and if my issue is not entirely resolved by your PR I will raise a new discussion.
Thanks for your time!
@Bickio commented on GitHub (Feb 2, 2026):
In my testing, the hallucinations/performance degradation mentioned by myself and others above, are not resolved by the HTML unescaping changes. Claude models are still being confused by the injected tool output in previous messages.
@Classic298 As requested, I've created a new discussion to discuss the underlying issues and hopefully work towards a solution: https://github.com/open-webui/open-webui/discussions/21098
@Koumi460 commented on GitHub (Feb 2, 2026):
@Classic298
Thank you for creating the PR and commit and fixing this issue. Unfortunately I have not had time to run your code and test it, but I saw other people did and I'll test it when I can. But thank you.
@Bickio
I think the issue you are bringing up is separate from the issue I logged. That being said, I am also observing the same behaviour you are describing - tool call results hallucinations when the conversation reaches about 80-100k tokens on qwen3-vl:30b-a3b. I am not sure why, it is a weaker model and I am not on the latest owui version, so take it with a pinch of salt. I think opening a separate discussion was a good call.
@Classic298 commented on GitHub (Feb 2, 2026):
@Bickio
how did you test it?
I tested it with a weaker model locally, and it was affected by the HTML stuff, when the fix is applied, it then did the tool calls properly.
@Bickio commented on GitHub (Feb 2, 2026):
@Classic298
My steps were roughly:
As I've said before: What happens when you inject tool output into the assistant message, is that you're effectively telling the LLM "this is how you successfully called the tool previously". Of course if you give it enough examples it will try to call the tool "the same way it did before" or so it thinks.
I would expect some models to be more likely to adhere to their fine-tuning and resistant to context-window examples than others. Is that a sign of a "better" or more aligned model? I don't think so - IMO it is generally desirable for models to follow examples provided in context. It will also vary prompt-by-prompt. Regardless, poisoning the context with incorrect examples will always degrade model performance and increase the chance of hallucinations to some degree.
@Classic298 commented on GitHub (Feb 2, 2026):
You should not use a conversation where the LLM already hallucinated a tool call 5x in a row to test the fix.
You should get the LLM to hallucinate the tool call without the fix
and then you should deploy the fix
and then you should try again to get the LLM to hallucinate the tool call in a new conversation.
If WITH the fix the LLM still hallucinates the tool call then it didnt work - if it now works - then it works.
Using an already broken conversation (where the LLM failed previously) to test this fix is not proper testing etiquette.
That's not how you test LLM related fixes.
If a conversation is already in a state where the input will make the LLM see broken tool calls, then it will repeat the broken tool calls.
Exactly, so why are you testing it with poisoned context that could no longer appear in any future conversation, post-fix-deployment?
@Classic298 commented on GitHub (Feb 2, 2026):
Testing it on a conversation that is already broken is like testing an antibiotic on a dead patient and saying "see, the antibiotic you invented didn't work".
The broken conversation is broken. We cannot - and it doesn't make sense to - have a database migration or something to fix all broken conversations. Besides the fact that you cannot even reliably do this: what are you gonna do? Replace all escaped HTML quotes with normal quotes? What if a conversation had a legitimate HTML quote that shouldn't get replaced?
Anyways, please test - but this time on a fresh conversation. If tool calls work reliably, then the fix works.
@Koumi460 commented on GitHub (Feb 2, 2026):
I've tested the PR that @Classic298 made and it fixes the issue with quotes being incorrectly escaped. I am happy with that and confirm it is working for me.
But even in a completely new environment, on this PR, I was still able to replicate the model hallucinating tool calls after maybe 10 turns and 70k tokens with heavy tool calling. Again, I am using weaker model (qwen2-vl:30b q8 on llama.cpp), but I think it is still happening because of how the tool call results get injected into the assistant's response as @Bickio is saying.
Some models will handle this better and some worse, I guess. But it appears to me that it would be better to solve this. One side is the confusion of the model, and the other side is prompt caching - when the format of the tool response changes in the conversation history, the cache will miss and force the tool response to be re-processed once it is injected into the assistant's message. If the tool call response is very long, this could have a meaningful impact on responsiveness and potentially costs. At least that is my speculation / current understanding.
I have created a fork of the main branch and started playing around with how to fix this, eventually finding out about and getting inspiration from PR#19578. I managed to get it to a good working condition, and I will keep testing it like this to see if the issues are fixed. Looking at the calls and preliminary testing, it all looks good to me. Feel free to test it out:
Fork with fixed tool call history and parallel call handling
@Bickio commented on GitHub (Feb 2, 2026):
@Classic298 Obviously I was not using a conversation with hallucinations in context... My "5x repeat" methodology was performed by repeatedly rerunning the conversation step where the LLM first hallucinated - all previous messages contained only successful/correct tool calls
@Classic298 commented on GitHub (Feb 2, 2026):
Report back to us if this makes a meaningful difference to https://github.com/open-webui/open-webui/discussions/21098
and it would be better if you posted this comment to https://github.com/open-webui/open-webui/discussions/21098 @Koumi460
@Classic298 commented on GitHub (Feb 2, 2026):
@Bickio Can you please explain how you tested it then? From your explanation it reads like you used a conversation which contained previously faulty tool calls in the context.
Did you or did you not test it on a brand new, fresh conversation?
I can use Claude Sonnet 4.5 reliably even without the fix and it does not hallucinate tool calls.
What is your precise setup and testing setup?
@Classic298 commented on GitHub (Feb 2, 2026):
The following conversation is WITHOUT the fix in place - and I even used Sonnet's weaker/dumber brother:
@Classic298 commented on GitHub (Feb 2, 2026):
The only models I ever had issues with tool calling on were weaker ones - which then started working again after the fix was applied locally for development and testing.
Therefore I struggle to see how you can have failed tool calls with sonnet, even with the fix.
@Classic298 commented on GitHub (Feb 2, 2026):
So if Claude Opus 4.5 cannot do tool calling for you, neither without the fix, nor with the fix, then something is fundamentally wrong in your setup.
If Claude Haiku can do it, then Opus can too.
I tried to replicate ANY tool calling failure as you can see - without the fix applied - but it just worked. With Haiku.
@Bickio commented on GitHub (Feb 2, 2026):
No
If you rerun the step with the first hallucination, there are, by definition, no hallucinations in context
A completely fresh conversation? I did try with fresh conversations too, but I often just reran the conversation from the first hallucination as stated above. Why does this matter? There were no hallucinations in previous messages.
I've been pretty thorough, so you'll need to be more specific about the information you want. I can't give you access to my tools, as they retrieve sensitive data.
What are you trying to achieve with this angle? Just because I can reproduce the bug and you can't means my setup is wrong?
You have at least 3 users who are affected by, and can reproduce this bug. You've admitted that Open WebUI is not using the OpenAI API according to its spec. I've given clear theoretical arguments why Open WebUI's misuse would cause this hallucination, and shown that they occur in practice regardless of the HTML escaping red herring.
No, something is NOT wrong in my setup, something is wrong in Open WebUI, and it's not just this bug. The hostility towards community members trying to engage in good faith is baffling, and makes me very concerned for the future of this software.
@Classic298 commented on GitHub (Feb 2, 2026):
Honestly, yes exactly. That is what I am saying here.
At this point it is likely you have some reverse proxy issues, faulty setup of websocket, browser issue, anything really. If you use a middle layer like LiteLLM it may even be that it incorrectly formats the response (as it has done in the past with Gemini 3 outputs in some versions until it was fixed).
The narrative is simple: Is it reproducible with a clean, fresh, setup? Yes? Then it's a bug.
No? Then it's an issue with your specific setup, configuration, network, firewall, reverse proxy, config or anything else.
This is how it is. If we cannot reproduce it even after heavily trying (see images above) then how can we fix it?
If the other things you mentioned throughout the conversation were actually an issue, then I could reproduce the issue.
I saw users who complained for weeks that something does not work.
Only for them to eventually come back and say "woops it was the firewall sorry for the noise" or "yeah i enabled proxy buffering in nginx, that's the culprit" or "yeah it seems g2cli was faulty" or "litellm had an issue".
So often.
And if you cannot provide any way for me to reproduce it, even without the fix of the open PR applied, even on a weaker model, then absolutely, and I mean absolutely everything points to this being a you-problem.
Me trying to help troubleshoot your issue with MY free time, as an unpaid volunteer contributor who works 40+ hrs a week in a full-time job, ALSO has full-time studies to attend, AND helps improve Open WebUI, is worth a lot. You interpreting my messages as "hostility", when actually I am taking my little time trying to figure out what is wrong with your setup to help you, by literally asking you to explain what your setup is, and you responding with "No", is what is actually hostile.
Tell us! What is your setup! Configs, version, how did you connect the LLMs? Reverse proxy? Firewall? Do all tools fail? What tool fails? Does it fail in new chats? Did you clear browser cache? What browser? What extensions? Did you enable the Microsoft Edge extra security setting which is also known to cause issues? How did you configure the models inside Open WebUI? What advanced parameters? What capabilities? Do you use any custom filters which might interfere by modifying user messages or the model's output in the inlet() or outlet()? What OS do you run it on? How did you install it? You have only provided a single message so far as a screenshot; even being generous, it was only evident that Claude Opus was used and a tool was called, but nothing else was visible in these screenshots.
Yes and all of them confirmed the PR working and the tool calls to then work again fully or much better.
Only you, out of the 3 others, are claiming it is not fixed. So AGAIN it points to something being wrong in your setup when 3 others are saying it works (see PR comments)
Silentoplayz tested it and it works
Koumi tested it and it works
rbsn-cpu tested it and it works
and so did I
you are the only one for whom the fix allegedly doesn't work
Where? How? You have not explained a single spec of your setup.
@Classic298 commented on GitHub (Feb 2, 2026):
And this one single screenshot you shared is suspicious too: I do not even see a tool call here.
Even if we fix the HTML formatting, there is no tool being called. Where is the tool name? The tool name must be inside quotes and be an actual function name.
So the first &quot; followed by "Rows 1-2" is not a tool call. A tool call cannot be "Rows 1-2()"; that is not a valid tool function name.
But the fact that you also see raw \n\n's BESIDES the undecoded &quot; and some text is telling that something is wrong with your setup.
Whatever you are experiencing: nobody else is experiencing it and we cannot reproduce it and it also has little if not nothing to do with this issue and the PR.
@Bickio commented on GitHub (Feb 2, 2026):
@Classic298 it will take me some time to reply fully, as you've asked for a lot of details.
With regards to that screenshot, as I noted in my discussion post, Open WebUI only injects the tool output, the tool call itself is completely lost. My tool outputs a multi line string, which is what you're seeing there.
@Classic298 commented on GitHub (Feb 2, 2026):
@Bickio so we can get any confusion out of the way because I have an urgent question: are you using native tool calling via admin panel > settings > model > opus > advanced parameters > tool calling: native?
Because as you can see from my screenshots, the tool call does not get lost, and is injected in the assistant message (visibly).
The tool call is not lost.
And if you are using native tool calling, and the tool call IS LOST, then this is ... most interesting. Again: I cannot reproduce it - but then this would mean that whatever you are experiencing is not related to this issue and demands standalone investigation - my money is on middleware issues, LLM translation issues or perhaps reverse proxy or other network related causes.
@Bickio commented on GitHub (Feb 2, 2026):
@Classic298 yes, native tool calling is enabled.
To be clear, the tool call still appears in the UI so yes, not completely lost in that sense. What I meant is that it's not injected into the LLM call. The LLM only sees the tool outputs in previous messages. The screenshot in question is the LLM hallucinating an example of what it sees, which is just the tool output.
@Classic298 commented on GitHub (Feb 2, 2026):
@Bickio commented on GitHub (Feb 2, 2026):
Correct. "Monkey see, monkey do". Because its previous messages show itself producing the tool output, that's what it tries to do.
It outputs data in a format carefully engineered to reduce the issues you mention. The data is presented in something similar to CSV format, which is much more compact than JSON due to lack of key repetition. It has a strict limit on the volume of data (500 lines), and strategically places fresh column headers every 100 lines to ensure the LLM retains that info in its working memory. In practice, the majority of queries will produce a much smaller set of lines (1-10).
Volume of data is not the culprit here - it always works fine in the first message, with as many as 20-30 large tool calls. It's only once Open WebUI starts collapsing the previous messages that the hallucinations appear. I also use the same tool output format in other systems (e.g. mastra) without any hallucinations.
TL;DR yes the output is somewhat large, but the tool is carefully designed to account for this, and I'm well within the limits of what the LLM can handle
@Classic298 commented on GitHub (Feb 2, 2026):
@Bickio
Thanks for confirming. Though I must say, while this does point to your discussion's direction of "tool outputs should be in the tool role", you also confirmed that in your screenshot the LLM didn't even attempt a tool call?
Therefore, you could not have tested the PR in the way it was meant to be tested - and more importantly, you also weren't experiencing this issue here (#20600) but something entirely different. Hence it is good you opened the discussion. But also I think and hope we can finally conclude the PR works 🤣
@Bickio commented on GitHub (Feb 2, 2026):
@Classic298 I fear, once again we are misunderstanding each other.
I was able to confirm by looking at the LiteLLM logs, that with your PR applied, the previous messages in the conversation sent to the LLM contained correctly HTML-unescaped tool outputs. In other words, your PR works as intended.
However, it does not prevent the hallucinations, since the LLM is still being shown examples of itself producing the tool outputs. My initial hypothesis was that unescaping the HTML would actually make these poisoned examples more potent to the LLM, however that is difficult to empirically prove. Naturally, some examples will be resolved (e.g. your tests), some will not (e.g. my tests), while others that were working previously may break without us even noticing. This is the nature of non-deterministic systems.
@Classic298 commented on GitHub (Feb 2, 2026):
@Bickio
Ok then we misunderstood each other - but in a good way
PR works - but the PR was never intended to fix the issue you are experiencing
The issue you are experiencing MIGHT actually be related to the discussion you opened, but not to this issue here, and hence also not fixed by my PR
@Bickio commented on GitHub (Feb 2, 2026):
We can agree on that. I'm just not totally sure what it was meant to fix in that case. The HTML escaping is not visible to the user (except in hallucinations), and the PR hasn't resolved the "degraded LLM performance" mentioned in the original issue.
@Classic298 commented on GitHub (Feb 2, 2026):
@Bickio
On weaker LLMs, when doing a second or third turn - all turns having tool calls - the probability of the LLM attempting to do a tool call, but failing, rises without the fix.
Why?
Because the LLM gets its own messages sent back (as it has to be), but in its own prior messages the tool calls are formatted using &quot; instead of the proper " quotes. Therefore it thinks a tool call needs HTML-encoded &quot; elements instead of ", which is wrong. Then the LLM will attempt to make a tool call in the current turn which looks like &quot;get_weather&quot;, for example. This is fixed by the PR, which ensures quotes and other HTML elements are not sent back to the model in HTML-encoded form, but in their normal form as they were actually generated by the model, stored in the database and shown in the UI.
So the LLMs affected by this are mostly weaker LLMs without very strong fine-tuning (through fine-tuning, most modern LLMs have very strong function calling, even small models like gpt-oss-20b).
Any model with good fine-tuning will ignore the HTML-encoded characters in its previous responses and generate correct tool calls anyway.
But weaker models, or models with not-so-perfect fine-tuning, will see their previous answers, where they seemingly used &quot; instead of ", and then repeat that, because their fine-tuning wasn't strong enough to teach them how tool calls have to work. This is what was reported here - and this is what the PR fixes.
@Bickio commented on GitHub (Feb 2, 2026):
I wasn't aware that this was a potential failure mode - I guess it seems plausible, but wouldn't an actual malformed tool call be prevented at the LLM API level, by the tool call schemas?
Where? The original issue only includes a vague mention of "degraded performance" which could equally refer to the same tool output hallucinations I see.
@Classic298 commented on GitHub (Feb 2, 2026):
Yes and Yes.
But we do not have a tool call in this case.
The LLM just outputs &quot;get_weather&quot;. This is not a tool call.
This is just text that almost could be a tool call.
Therefore, no tool is called and no tool is executed.
And this happens because Open WebUI wrongfully sent back encoded HTML quotes instead of just normal quotes in prior messages of the same conversation - leading weaker LLMs to believe that you call tools by writing &quot;get_weather&quot; instead of "get_weather".
Fair - the original issue does not explicitly state "primarily happens with weaker or less-well-fine-tuned LLMs".
Here's what I did:
Later on, the original issue reporter also said they used qwen2-30b - clearly a smaller model, and not the latest model available from Qwen either, so besides being small, it is also potentially not well fine-tuned.
So TLDR: This issue is about degraded performance with small or not-well-fine-tuned models.
A monster like Claude Opus, and even small but well-fine-tuned models like Haiku, power through with the tool calls even if you send them corrupted prior messages with broken prior tool calls because of wrongfully encoded HTML quotes.
So through debugging, testing and focusing on fixing what was reported (we indeed should not send accidentally modified LLM turns back to the LLM in any case anyways - and to me this was the core issue I was focusing on primarily), the issue was fixed, and the reporter also later confirmed which model they used, confirming my suspicion, along with my own tests, that this primarily affects weaker models.
TLDR for the TLDR:
PS:
This issue is about tool calls, not tool responses. I should have noticed earlier that your screenshots did not even show a tool call to begin with, but the &quot; at the very beginning led me to believe the screenshot was meant to show a failed tool call - but that's not what the screenshot shows. As you said, it only shows the model hallucinating some tool call result that isn't there.
@Classic298 commented on GitHub (Feb 2, 2026):
hope that explains it
@Bickio commented on GitHub (Feb 2, 2026):
Thanks, I was not aware that you'd successfully reproduced the LLM failing to use tool calls. If there's a concrete case where the PR reduces cases of hallucinated tool use, then I agree the PR is valuable on its own.
I do find it interesting that the LLM was hallucinating tool calls in your testing rather than tool outputs. Was your small model where you reproduced the malformed tool call connected via an OpenAI compatible API, or via Ollama? If it's Ollama, I suspect that there may be a difference in how the tool injection is formatted between the two systems - perhaps the Ollama code injects the tool call as well as the output, whereas the OpenAI code only injects the output?
@Classic298 commented on GitHub (Feb 3, 2026):
Yes, I was able to reproduce faulty tool calls - the original report was exclusively about tool calls and their degraded performance
Model connected via OpenAI, but Ollama should handle tool calls just as well.
Well, hallucinated tool calls is the wrong word here, if we are being honest. The model knows the tool is available and is trying its best to call it.
Equally as much as you find it interesting that users have issues with... well, hallucinated is the wrong word here - malformed tool calls by the model due to poisoned context input, I find it interesting that you struggle with totally hallucinated model outputs.
@Bickio commented on GitHub (Feb 3, 2026):
Correct me if I'm wrong here, but they're not just malformed, the model is putting the tool call in the assistant content instead of the tool_calls array. So even if it was outputting correct JSON (no HTML escaping) the tool call wouldn't actually be processed, right?
@Classic298 commented on GitHub (Feb 3, 2026):
If we wanna be precise, yes. That's another fault the model makes here. But some models are also (seemingly) trained to output it to the normal model output, like DeepSeek V3.2 (full) - which then struggle even more if they see &quot; in their previous messages.
This was also one of the models I tested the PR on. The PR didn't fully resolve it for this model though, because DeepSeek V3.2 is also trained on DSML (DeepSeek markup language), which calls tools very differently from plain OpenAI tool calls - but it still improved tool calling performance.
@pfn commented on GitHub (Feb 3, 2026):
This bug is making tool-calling nigh unusable. My chats are unpredictably getting corrupted by mis-quoted tool output showing up. Especially when it should be in a tool element rather than output by the assistant itself. Assistant role should summarize tool results possibly, but not embed tool results directly (the LLM could embed results, but it's not something the chat host should be doing). Assistant role should only emit tool_call which the chat host links to the actual function call. Once a tool role message is received, the model decides what to do with it in the resulting assistant message, whether embed or not.
@Classic298 commented on GitHub (Feb 3, 2026):
@pfn did you test the PR? It's a very minimal one-liner fix
@pfn commented on GitHub (Feb 3, 2026):
@Classic298 does it apply cleanly to the release branch? I currently run out of the docker release image and it may or may not be convenient to apply. If so, I'll make a new local image that incorporates the PR
@Classic298 commented on GitHub (Feb 3, 2026):
@pfn https://github.com/open-webui/open-webui/pull/20755
@pfn commented on GitHub (Feb 4, 2026):
@Classic298 preliminary testing looks positive. It lasted a lot longer before eventually getting confused; for some reason, it starts embedding the JSON tool result/template into the assistant response. Not sure of the source yet, however the escaping issue is gone:
regenerating the response can make it go back to normal again, so I'm not sure where the issue lies for this. The underlying model is GLM 4.7 Flash w/ thinking enabled
@Bickio commented on GitHub (Feb 4, 2026):
This was always the expected outcome, at least to me. Glad to have confirmation from another user that the HTML escaping is indeed a red herring, and that at best #20755 just delays the tool output hallucinations on some models. On the other hand, at least the hallucinations are slightly more readable now... 😉
@pfn If you're curious about what's going on in your example, I wrote this discussion post to explain: #21098
@pfn commented on GitHub (Feb 4, 2026):
@Bickio indeed, your description of what's going on in #21098 sounds very apt. I don't know how to look at the underlying chat interaction in openwebui, so I can't confirm what you're describing, but I do completely agree that tool role messages must not be merged into assistant role messages
@Classic298 commented on GitHub (Feb 4, 2026):
How is it a red herring if now 6 users confirmed the PR helps with tool calling on subsequent responses?.......................
@pfn commented on GitHub (Feb 4, 2026):
it doesn't seem to address the underlying root cause. It just makes the &quot; disappear
@Classic298 commented on GitHub (Feb 4, 2026):
@pfn Please see what issue you are commenting on ^
@pfn commented on GitHub (Feb 4, 2026):
I am commenting on this issue, my point being that it isn't the right issue to address. The effort shouldn't be in making the HTML entities go away, it should be in making the tool and assistant role messages discrete rather than a merged assistant message. That breaks the model's context when they get merged. I applied the one-liner you mentioned, it still results in a broken chat, just without HTML entities getting in the way.
@Bickio commented on GitHub (Feb 4, 2026):
@Classic298 The only actual user facing issue here is the AI hallucination and degraded performance. The original author of this ticket made the (not unreasonable) assumption that the hallucination was caused by the HTML escaping, which we now know is not true.
@pfn commented on GitHub (Feb 4, 2026):
re this @Bickio -- I'm not reading further in the backlog of messages here, but there is some confusion in what you asked chatgpt:
I've mentioned previously that I am new to using openwebui, so I don't really know how to confirm whether this is actually what is happening, because I do not know how to see the chat stream that is sent to the LLM.
@Bickio commented on GitHub (Feb 4, 2026):
@pfn agreed, and I pointed this out previously too
@Classic298 then agreed that Open WebUI was using the API wrong, but denied that the misuse impacts the user experience, citing the lack of users reporting issues, and suggested that the issue was with my infrastructure or configuration.
It's been a frustrating journey, but I hope we're on the home stretch now
@Classic298 commented on GitHub (Feb 4, 2026):
then how come it fixes it for me and other users?
The answer is simple: because you are experiencing a different issue.
And I stand by it unless someone can give us a reproducible example.
Claude Haiku, Sonnet, Opus - all do tool calling perfectly fine even without very strict adherence to the tool role standard.
Besides, you created a separate discussion for this, where exactly that can be discussed. This issue is, as I have said again and again, limited to the scope of what was reported.
Besides, Tim said in the PR comments that a tool_output section may be introduced in dev. And yet we are discussing the wrong topic on the wrong issue, for something that might be addressed soon anyway.
@pfn @Bickio opened a discussion for exactly what you want to discuss
and @Bickio, you calling it a red herring when multiple users confirmed this fixes it for them simply proves once more that you have a different issue than what this issue is about
@Bickio commented on GitHub (Feb 4, 2026):
Fixes what exactly?
The original poster said this:
@pfn said this:
In both cases, after applying your PR:
Both of these cases match both my own original predictions and my personal experience too.
@Classic298 As far as I can tell, the only person who's reported that the PR completely fixes the hallucinations is you
I opened a separate discussion at your request, not because I thought it was a distinct issue
See above. Multiple users (including me) have confirmed that the HTML is now unescaped, yes. And also all of those users have reported that this has not fixed the hallucinations
@Bickio commented on GitHub (Feb 4, 2026):
I can't find the comment you're referring to. Would it be possible to get some clarity on what is planned?
@Classic298 commented on GitHub (Feb 4, 2026):
See there #21135
@Bickio commented on GitHub (Feb 4, 2026):
These comments?
Are you suggesting these users reproduced the hallucinations? Seems more likely to me that they're just confirming that the HTML unescaping correctly unescapes the HTML, which I already agree with.
In fact, one of those users, who appears to be another Open WebUI contributor, posted their testing methodology at the top of this issue:
Indeed, this testing methodology is enough to check the HTML unescaping, but doesn't do anything to reproduce the real issue of LLM hallucinations.
The comment in question is extremely brief:
I assumed there must have been more detail hiding somewhere, hence my request for clarity. Nevertheless, it sounds like a step generally in the right direction, so I'm hopeful we'll see some progress towards fixing this issue at last.
@Classic298 commented on GitHub (Feb 4, 2026):
I am certain that in a few days we will see a commit re: this on dev
@espen96 commented on GitHub (Feb 7, 2026):
I wanted to note that I've seen the same behavior. And at times, rarely, I've seen GLM 4.7 Flash visibly confused in its reasoning, stating that it now has a tool result that it sees after a web search, alongside user-given context.
Later, when debugging another tool that resulted in it following up with a search, I questioned it about the tool call. It was confused; for some reason it was no longer able to recognize that it had made a tool call.
It has also fallen for the problem of attempting to write a tool output itself, as it is effectively being shown a multi-shot example of how retrieval works.
Looking at search especially, it does appear that all the searching is merged together, and then sent in the user message using the normal search setup? If so, no wonder the LLM sees two sources for the same information and doesn't understand what happened. It searched. It sees the results, but it always had the results?
@jrodguez commented on GitHub (Feb 10, 2026):
I've been experiencing similar issues with gpt-oss:20b served over vLLM and connected to openwebui via an OpenAI-compatible endpoint. Similar to #20926, tool calls will work great for a bit but after a few turns, it stops generating responses. Checked backend logs and I indeed see the &quot; everywhere. Noticed the same thing happens in opencode like #20896 (but I don't route through openwebui in this instance?). Going to try out the solution reported here.
@Classic298 commented on GitHub (Feb 15, 2026):
Fixed by f2aca781c8