[GH-ISSUE #1740] feat: Response rendering performance for long conversations, especially with <code/> blocks #28148


GiteaMirror · 2026-04-25T02:53:39-05:00

GiteaMirror commented

2026-04-25 02:53:39 -05:00

Originally created by @domsleee on GitHub (Apr 25, 2024).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/1740

In long conversations, responses streamed from Ollama lag a bit in the chat, especially when the response contains `<code/>` blocks that are rendered with KaTeX.

Reproduction

Start with the prompt `solve for x, x^2 = 4`, then keep prompting with `do it again`:

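The first time (good):

![open-webui-katex-good-small](https://github.com/open-webui/open-webui/assets/14891742/9b7a98e9-4ba9-4045-8877-e74f61259410)

After about 18 "do it again" prompts, the slow rendering can be seen:

![open-webui-katex-slow-small](https://github.com/open-webui/open-webui/assets/14891742/80218c7c-3367-418c-a15a-0661de144e93)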

Suggested solutions

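An easy improvement is to run KaTeX only on the children of [`ResponseMessage`](https://github.com/open-webui/open-webui/blob/1092ee9c1c130519be4f17712dda4850165ee2b9/src/lib/components/chat/Messages/ResponseMessage.svelte); I will make a PR for that.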

This causes problems because `ResponseMessage` is re-rendered for every response each time a new token is received. With 100 responses and Ollama returning 5 words per second, KaTeX is called 500 times per second. In this example, other expensive methods such as `marked.parse` and `marked.render` are also called 500 times per second.
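The short TypeScript simulation below makes the arithmetic above concrete. It is a sketch under assumptions, not Open WebUI code. The `Message` shape, `countRenders`, and the two strategy names are hypothetical, and the counter stands in for an expensive step such as KaTeX or `marked.parse`. Each streamed token replaces only the last message object (an immutable update). The render step then runs either for every message or only for messages whose object reference changed. The second strategy is roughly the reference check that Svelte's `immutable` option relies on.

```typescript
// Hypothetical model of one chat: messageCount messages, the last one streaming.
interface Message {
  id: string;
  content: string;
}

type Strategy = "rerender-all" | "rerender-changed";

// Counts how often an expensive render step (a stand-in for KaTeX or
// marked.parse) would run while `tokens` tokens are streamed.
function countRenders(messageCount: number, tokens: number, strategy: Strategy): number {
  if (messageCount < 1) return 0;

  let messages: Message[] = Array.from({ length: messageCount }, (_, i) => ({
    id: `m${i}`,
    content: "",
  }));
  // Everything has already been rendered once before streaming starts.
  let previous: Message[] = messages;
  let renders = 0;

  for (let t = 0; t < tokens; t++) {
    // Immutable update: copy the array and replace only the streaming message.
    const last = messages[messages.length - 1];
    messages = [...messages.slice(0, -1), { ...last, content: last.content + "tok " }];

    for (let i = 0; i < messages.length; i++) {
      // Reference check: untouched messages are the same objects as before.
      const changed = messages[i] !== previous[i];
      if (strategy === "rerender-all" || changed) {
        renders++; // the expensive render would happen here
      }
    }
    previous = messages;
  }
  return renders;
}

console.log(countRenders(100, 5, "rerender-all")); // 500
console.log(countRenders(100, 5, "rerender-changed")); // 5
```

The reference-based strategy only helps if the streaming update replaces the changed message with a new object instead of mutating it in place. All other message objects must be left untouched so their identity is preserved.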

Further improvements could be:

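1. Find a way for Svelte not to re-render every `ResponseMessage` when only one of the messages has changed, something like Angular's [ChangeDetectionStrategy#OnPush](https://angular.io/api/core/ChangeDetectionStrategy). Maybe [svelte-options immutable](https://svelte.dev/tutorial/svelte-options); there is some good discussion [here](https://github.com/urql-graphql/urql/issues/704#issuecomment-614770126).
2. Cache some of these expensive methods, e.g. use [KaTeX `renderToString`](https://katex.org/docs/api) and store the result in a hashmap keyed by `(message.id, message.content.length)`.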
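The second suggestion could look roughly like the TypeScript sketch below. `RenderCache` and its members are hypothetical names. The renderer is injected as a plain function so the sketch runs without KaTeX installed. In the app one might pass something like `(tex) => katex.renderToString(tex, { throwOnError: false })`. The key is the `(message.id, message.content.length)` pair from the issue, joined into a string because a `Map` compares array keys by reference.

```typescript
// Hypothetical cache for an expensive string renderer (e.g. KaTeX).
type Renderer = (source: string) => string;

class RenderCache {
  private readonly render: Renderer;
  // Keyed by "<message.id>:<message.content.length>".
  private readonly entries = new Map<string, string>();
  private misses = 0;

  constructor(render: Renderer) {
    this.render = render;
  }

  get(messageId: string, content: string): string {
    const key = `${messageId}:${content.length}`;
    const cached = this.entries.get(key);
    if (cached !== undefined) return cached;

    // Cache miss: render once and remember the result.
    this.misses++;
    const html = this.render(content);
    this.entries.set(key, html);
    return html;
  }

  get missCount(): number {
    return this.misses;
  }
}

// Usage with a fake renderer that counts how often it is called.
let calls = 0;
const cache = new RenderCache((s) => {
  calls++;
  return `<span>${s}</span>`;
});
cache.get("m1", "x^2 = 4"); // renders
cache.get("m1", "x^2 = 4"); // served from the cache
cache.get("m1", "x^2 = 4, x = 2"); // content grew, so it renders again
console.log(calls); // 2
```

Using the length as a version number works because a streamed message only grows, but it has two caveats. First, an edit that keeps the same length would return stale output. Second, every intermediate length of a streaming message leaves an entry behind, so a real version would probably keep only the latest entry per message id.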

Also, autoscroll sometimes fails to scroll. A lot of these issues have become more apparent since llama3 is so fast 😁


GiteaMirror closed this issue

2026-04-25 02:53:40 -05:00

GiteaMirror commented

2026-04-25 02:53:41 -05:00

@gaby commented on GitHub (Mar 20, 2025):

@domsleee @tjbck This is actually still a major issue. I can see from the logs that with both Ollama and vLLM the response had already been generated, but the UI takes a loooooong time to render the code in the chat.

Seems like it's trying to apply syntax highlighting on every change.

For roughly 800 lines of code, it took almost a minute to render after the GPU had finished generating the response.
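A back-of-envelope model suggests why re-highlighting on every change hurts so much on long outputs. The sketch makes a hypothetical assumption: the whole code block is re-highlighted each time one more line arrives. Real streaming updates are per token, so the actual number of re-runs would be even higher. `linesHighlighted` is a made-up helper, not part of Open WebUI.

```typescript
// Total lines fed to the highlighter while a code block of `totalLines`
// lines streams in one line at a time (a simplifying assumption).
function linesHighlighted(totalLines: number, rerunOnEveryChange: boolean): number {
  if (!rerunOnEveryChange) {
    return totalLines; // highlight the finished block once
  }
  let work = 0;
  for (let visible = 1; visible <= totalLines; visible++) {
    work += visible; // everything streamed so far is highlighted again
  }
  return work; // equals totalLines * (totalLines + 1) / 2
}

console.log(linesHighlighted(800, false)); // 800
console.log(linesHighlighted(800, true)); // 320400
```

For about 800 lines, that is roughly 400 times the work of a single highlighting pass, and the cost grows quadratically with the length of the block.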



Reference: github-starred/open-webui#28148
