mirror of
https://github.com/open-webui/open-webui.git
synced 2026-05-23 18:25:10 -05:00
[GH-ISSUE #1740] feat: Response rendering performance for long conversations, especially with <code/> blocks
#28148
Originally created by @domsleee on GitHub (Apr 25, 2024).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/1740
For long conversations, the response from ollama lags in chat, especially when the response contains <code/> blocks that use katex to render.
Reproduction
Start with the phrase "solve for x, x^2 = 4", and then keep prompting with "do it again".
The first time (good):

After about 18 "do it again", the slow rendering can be seen:

Suggested solutions
An easy improvement is to run katex only on the children of ResponseMessage; I will make a PR for that.
The reason it causes problems is that ResponseMessage is rendered for every response, every time a new token is received. So if you have 100 responses and ollama returns 5 words per second, katex is called 500 times per second. In this example, other expensive methods are also called 500 times per second, such as marked.parse and marked.render.
Further improvements could be:
Avoid re-rendering every ResponseMessage when only one of the messages has changed. Something like Angular's ChangeDetectionStrategy#OnPush. Maybe svelte-options immutable, some good discussion here
(message.id, message.content.length)
Also, the autoscroll sometimes fails to scroll. A lot of these issues have become more apparent since llama3 is so fast 😁
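One way to read the (message.id, message.content.length) pair is as a cache key for the expensive rendering work. The sketch below is only an illustration under that assumption, not the project's code: expensiveRender is a hypothetical stand-in for marked.parse plus katex, and renderMessage and the cache object are made-up names. It counts how often the expensive path runs when a full pass over the message list is triggered for every incoming token.

```typescript
// Hypothetical stand-in for the expensive work (markdown parsing + katex).
// It only counts how often it runs.
let renderCalls = 0;
function expensiveRender(content: string): string {
  renderCalls += 1;
  return "<p>" + content + "</p>";
}

interface Message {
  id: string;
  content: string;
}

// Cache keyed by message id. The stored content length tells us whether
// the message has grown since it was last rendered.
const cache: Record<string, { length: number; html: string }> = {};

function renderMessage(message: Message): string {
  const hit = cache[message.id];
  if (hit !== undefined && hit.length === message.content.length) {
    return hit.html; // unchanged message: skip the expensive path
  }
  const html = expensiveRender(message.content);
  cache[message.id] = { length: message.content.length, html: html };
  return html;
}

// 100 finished messages plus one that is still streaming.
const messages: Message[] = [];
for (let i = 0; i < 100; i++) {
  messages.push({ id: "m" + i, content: "answer " + i });
}
const streaming: Message = { id: "m100", content: "" };
messages.push(streaming);

// Simulate 5 incoming tokens; each one triggers a pass over the whole list.
const tokens = ["x", " =", " 2", " or", " -2"];
for (let t = 0; t < tokens.length; t++) {
  streaming.content += tokens[t];
  for (let m = 0; m < messages.length; m++) {
    renderMessage(messages[m]);
  }
}

// 101 renders on the first pass, then 1 per token for the streaming message.
console.log(renderCalls); // 105, versus 505 without the cache
```

Using the content length as the change signal works for append-only streaming, but it would miss an edit that keeps the length the same, so a real implementation would probably need a stronger key.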
@gaby commented on GitHub (Mar 20, 2025):
@domsleee @tjbck This is actually still a major issue. I can see from the logs that with both ollama and vllm the response was generated but the UI takes a loooooong time to render the code in the chat.
Seems like it's trying to apply syntax highlighting on every change.
On a code base of ~800 lines, it took almost a minute to render after the GPU response.
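If highlighting really does run on every change, the total work grows roughly quadratically with the size of the block. The sketch below is a back-of-the-envelope model, not a measurement: it assumes one UI update per received line (real streaming updates per token, so more often), and linesHighlighted is a hypothetical helper that just counts how many lines get passed through a highlighter.

```typescript
// Counts how many lines are passed through a highlighter while a code
// block of `totalLines` lines streams in one line at a time.
function linesHighlighted(totalLines: number, everyChange: boolean): number {
  let work = 0;
  for (let received = 1; received <= totalLines; received++) {
    // Either re-highlight the whole block on each update,
    // or highlight once when the block is complete.
    if (everyChange || received === totalLines) {
      work += received;
    }
  }
  return work;
}

console.log(linesHighlighted(800, true));  // 320400 (= 800 * 801 / 2)
console.log(linesHighlighted(800, false)); // 800
```

Under this model, highlighting only once the block is complete does about 400 times less work for an 800-line block.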