enh/issue: performance optimisation #1431

Closed
opened 2025-11-11 14:44:58 -06:00 by GiteaMirror · 31 comments

Originally created by @vertago1 on GitHub (Jul 4, 2024).

Originally assigned to: @tjbck on GitHub.

**Is your feature request related to a problem? Please describe.**
The performance of long chat threads is really bad. I don't know if it is a client-side browser issue or a server-side issue.

**Describe the solution you'd like**
A performance fix would be great, but even the ability to do a shallow clone where only the last context window or so of the chat is cloned would help.

**Describe alternatives you've considered**
I am resorting to starting a fresh chat with memory items added and trying to use the prompt to pick up where the last chat left off.

**Additional context**
I have tried a few different browsers, but the best fix so far has been to start a new chat and try my best to get it back on roughly the same track.

GiteaMirror added the enhancementgood first issuehelp wanted labels 2025-11-11 14:44:58 -06:00

@tjbck commented on GitHub (Jul 4, 2024):

If you could share an export of your chat so that we can diagnose which part of the UI is being the bottleneck, that would be sublime!

@vertago1 commented on GitHub (Jul 4, 2024):

I generated a chat large enough to trigger the same behavior, since it would be a pain to organically create one big enough that I'd be fine with sharing. I've included it along with the generator script (as a .txt, since uploading .py isn't allowed) in case you want to play around with different generated content and sizes.

[generated.json](https://github.com/user-attachments/files/16102826/generated.json)
[generate.py.txt](https://github.com/user-attachments/files/16102827/generate.py.txt)

To reproduce the behavior, I imported the chat and sent a message through the UI. The browser locked up for a bit on the client side, but after waiting it eventually continued, displayed the new message UI, and filled in the assistant message.

The performance is much, much better with a shorter chat. Let me know if that doesn't work for you, or you need more information.

@GrayXu commented on GitHub (Aug 22, 2024):

I encountered similar issues. I restricted token generation for each dialogue to come only from the cache, and cut the chat context length to 0, to rule out speed issues on the LLM serving side.
However, in long threads the interface lag was still very noticeable: while output was streaming to the page, the page became unresponsive.

![1724336557070](https://github.com/user-attachments/assets/51866922-08f6-4f5e-b13d-da2d8194fa2c)
I'm not very knowledgeable about front-end work, so I'm unsure whether this performance recording is helpful.

@tjbck commented on GitHub (Aug 22, 2024):

PR Welcome

@george-elphick-talieisin commented on GitHub (Sep 4, 2024):

Just to add to this: I don't know if this is browser-specific, but I see horrible lockups and slowdowns in Safari on macOS and not in Edge on the same machine.

![image](https://github.com/user-attachments/assets/9ef5b459-a61f-4908-8b39-cb396ca5aeb8)

Not sure if this really helps?

@ingmferrer commented on GitHub (Sep 4, 2024):

I'm experiencing performance issues in Edge, Chrome, and Arc Browser on macOS, specifically when the UI has long conversations. These issues occur particularly when the LLM is responding, which might be related to how the front end handles streaming chunks of data. Additionally, when using Arc Browser on iOS, the browser frequently crashes under similar circumstances, and I have to manually force close the app for it to work again.

@george-elphick-talieisin commented on GitHub (Sep 4, 2024):

In my case, one core sits at 100% for the Safari tab, seemingly consumed by a JavaScript process, but the profiler doesn't tell me what is using all that CPU time.

@quantarion commented on GitHub (Sep 4, 2024):

Same here, Chrome on ChromeOS Flex version 129. It gets very slow as the chat length grows; I often get a prompt from Chrome asking whether I want to kill the page or wait for it to finish.

I got this from the profiler in Chrome:

![Screenshot 2024-09-04 13 57 55](https://github.com/user-attachments/assets/e78f5633-87d2-4b0e-aab2-dd52a66ab79d)

However, if I reload the page and continue the chat, the lag is mostly gone.

@george-elphick-talieisin commented on GitHub (Sep 4, 2024):

I've carried on stress-testing in Edge on macOS, which has slightly better profiling, and found slowdown, though not quite as extreme as Safari. Note the following:

![image](https://github.com/user-attachments/assets/52089a5e-4e12-46a1-8032-9dfdfa9b8eca)

Note the "animation time", which is the UI's placeholder box after hitting the Send button. I'm using Groq here, so the thinking time is too long for it to be Groq's fault. Rendering the response uses up quite a lot of CPU, shown in more detail here:

![image](https://github.com/user-attachments/assets/7dfacfbc-b2b0-4456-ad3b-fc017b161e0f)

Hope this helps

@Keithsc commented on GitHub (Sep 19, 2024):

I completely agree with the previous comments regarding the lag in Open-WebUI, especially with long chat threads. The slowness and high CPU usage make the experience quite frustrating. I've found that restarting Firefox seems to help temporarily, but it's inconvenient. It's good to know this isn't just a Firefox issue. Hopefully, this will be resolved soon, as I sometimes even switch to using Firefox on my phone to continue chatting when the lag becomes too severe.

@tjbck commented on GitHub (Sep 19, 2024):

I'll start my thorough investigation shortly but I'd greatly appreciate some help from the community here!

@GrayXu commented on GitHub (Sep 20, 2024):

An interesting observation is that in the latest version 0.3.22, there isn't a noticeable lag or blocking in page interactions with long threads, but the model token output still significantly slows down to ~5 token/s.

@Keithsc commented on GitHub (Sep 20, 2024):

I upgraded to 0.3.22 also, but I still think long threads are a problem. I've looked at Firefox devtools console and there seems to be an awful lot of logging happening in there, but I am not a developer, so maybe that's normal.

What I have discovered is that if I go to the Open-WebUI settings and set "Stream Chat Response" to "off", I still get the reply, just in a single dump, and my CPU is now idle with no lockups. The only downside is that you don't see the response output in real time. This might be a workaround for some?

@quantarion commented on GitHub (Sep 20, 2024):

Hey @tjbck, just tell me what you need; I can compile, test, and send reports as much as you need.

@dncc89 commented on GitHub (Sep 21, 2024):

I've found it's parsing the whole chat history into markdown every time a new token arrives.
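
If that observation is right, caching the rendered HTML per message would mean only the message that is actually streaming gets re-parsed. A minimal sketch of the idea; `renderMarkdown` is a stand-in dummy for whatever parser the UI really uses, and the message shape is assumed:

```javascript
// Count parses so the caching effect is observable in this sketch.
let parses = 0;
const renderMarkdown = (src) => {
  parses++;
  return `<p>${src}</p>`;
};

const htmlCache = new Map(); // messageId -> { content, html }

function renderMessage(id, content) {
  const hit = htmlCache.get(id);
  if (hit && hit.content === content) return hit.html; // unchanged: skip re-parse
  const html = renderMarkdown(content);
  htmlCache.set(id, { content, html });
  return html;
}

function renderChat(messages) {
  // Only messages whose content changed since the last pass are re-parsed.
  return messages.map((m) => renderMessage(m.id, m.content)).join("\n");
}
```

With this shape, a 1000-message history costs one markdown parse per token instead of 1000.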

@GrayXu commented on GitHub (Sep 22, 2024):

Pagination/lazy loading may be a general optimization here.
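
The pagination idea can be sketched as keeping the full history in memory while handing the renderer only a window of the most recent messages, growing it when the user scrolls to the top. The function names here are illustrative, not Open WebUI's:

```javascript
// Return only the most recent `windowSize` messages for rendering.
function visibleWindow(messages, windowSize) {
  return messages.slice(Math.max(0, messages.length - windowSize));
}

// When the user scrolls to the top, grow the window by `step`,
// capped at the full history length.
function onScrollTop(windowSize, step, total) {
  return Math.min(total, windowSize + step);
}
```

This keeps the DOM and markdown-render cost proportional to the window size rather than to the whole chat.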

@thiswillbeyourgithub commented on GitHub (Sep 22, 2024):

> I've found it's parsing the whole chat history into markdown every time a new token arrives.

Hi, I'm coming from #4881 and wanted to add that I think it might actually be every time a token is rendered, not every time one is received: from my very limited testing, turning off "Fluidly stream large external response chunks" while mermaid charts are being rendered seems to make generation noticeably faster.
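
One way to make "per render" cheaper than "per token", regardless of which is the culprit, is to decouple the two: buffer incoming chunks and flush to the DOM at most once per interval. A sketch under that assumption; `applyToDom` and the clock parameter are stand-ins for testability:

```javascript
// Throttled streaming renderer: tokens accumulate in a buffer and the
// expensive DOM/markdown update runs at most once per `intervalMs`.
function makeThrottledRenderer(applyToDom, intervalMs = 50, now = Date.now) {
  let buffer = "";
  let lastFlush = 0;
  return {
    push(token) {
      buffer += token;
      if (now() - lastFlush >= intervalMs) this.flush();
    },
    flush() {
      if (buffer) {
        applyToDom(buffer);
        buffer = "";
      }
      lastFlush = now();
    },
  };
}
```

In a real UI the final `flush()` would run when the stream closes, so no trailing tokens are lost.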

@tjbck commented on GitHub (Sep 23, 2024):

Testing wanted in latest dev! Several optimisation techniques have been applied to our message rendering process (e.g. infinite scrolling), so significant performance gain is expected here.

@ingmferrer commented on GitHub (Sep 24, 2024):

After testing [git-822c47c](https://github.com/open-webui/open-webui/pkgs/container/open-webui/278205972?tag=git-822c47c), I noticed a significant improvement. The chat experience feels much smoother, especially in longer conversations. Considering I was using Arc Browser on iOS, the lag has almost completely disappeared. It was previously unusable, so this is a major enhancement.

@thiswillbeyourgithub commented on GitHub (Sep 24, 2024):

So this fixed my mermaid issue, as completions are now just as fast as regular text, but I suspect a regression with vision models. I've tested a local Ollama model as well as 4o from OpenAI and Claude Sonnet, and in all cases the generation is suspiciously slow, just like mermaid before the fix. This was on Brave Browser on 0.3.26. Am I the only one?

@kaiiiiiiiii commented on GitHub (Sep 24, 2024):

> After testing [git-822c47c](https://github.com/open-webui/open-webui/pkgs/container/open-webui/278205972?tag=git-822c47c), I noticed a significant improvement. The chat experience feels much smoother, especially in longer conversations. Considering I was using Arc Browser on iOS, the lag has almost completely disappeared. It was previously unusable, so this is a major enhancement.

Same for me. Experience in Arc on iOS improved x100 🥳

@kevinleeex commented on GitHub (Oct 14, 2024):

Hi @tjbck, thanks for sorting this out. It runs perfectly with Chromium-based browsers, but the issue still exists on Safari, for both Mac and iPhone.

@vlbosch commented on GitHub (Oct 15, 2024):

> Hi @tjbck, thanks for sorting this out. It runs perfectly with Chromium-based browsers, but the issue still exists on Safari, for both Mac and iPhone.

I can confirm the issue still arises on Safari. After long chats, or prompts with a lot of context, the page hangs until the answer has streamed. Switching between chats with a lot of context is very slow as well.

Please let me know what I can do to help triage the issue on Safari.

@kanubacode commented on GitHub (Nov 1, 2024):

I am experiencing this issue with very long chats on Firefox. What used to take a second or two now takes longer than a minute, and sometimes the connection drops; upon re-establishing it, a session-ID traceback occurs, the page must be reloaded, and the response has to be regenerated to continue the chat. The extremely long response times are the real frustration, though. If I export my chat to plain text, it is about 2.5 MB, which is the size at which this became more than a slowdown, though it was a large inconvenience much earlier.

@kanubacode commented on GitHub (Nov 3, 2024):

@tjbck This seems not to be an issue with Chromium: I was able to continue that chat with fast responses for a couple of days, and then ran into a different issue with long chats on Chromium:

```js
Uncaught (in promise) RangeError: Maximum call stack size exceeded
    at Ze (Chat.svelte:604:30)
    at Ze (Chat.svelte:611:15)
    at Ze (Chat.svelte:611:15)
    at Ze (Chat.svelte:611:15)
    at Ze (Chat.svelte:611:15)
    at Ze (Chat.svelte:611:15)
    at Ze (Chat.svelte:611:15)
    at Ze (Chat.svelte:611:15)
    at Ze (Chat.svelte:611:15)
    at Ze (Chat.svelte:611:15)
```

`createMessagesList` is overflowing. It's unfortunate that my 2-week-old chat is now at a standstill. If there's anything I can do to help address this, let me know.

@vertago1 commented on GitHub (Nov 3, 2024):

> @tjbck This seems to not be an issue with Chromium -- I was able to continue that chat with fast responses for a couple of days, and then... a different issue with long chats on Chromium:
>
> ```js
> Uncaught (in promise) RangeError: Maximum call stack size exceeded
>     at Ze (Chat.svelte:604:30)
>     at Ze (Chat.svelte:611:15)
>     at Ze (Chat.svelte:611:15)
>     at Ze (Chat.svelte:611:15)
>     at Ze (Chat.svelte:611:15)
>     at Ze (Chat.svelte:611:15)
>     at Ze (Chat.svelte:611:15)
>     at Ze (Chat.svelte:611:15)
>     at Ze (Chat.svelte:611:15)
>     at Ze (Chat.svelte:611:15)
> ```
>
> `createMessagesList` is overflowing. It's unfortunate that my 2 week old chat is now at a standstill. If there's anything I can do to help address this, let me know.

I wonder if the recursion could be replaced with a non-recursive function and a heap-allocated list.
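
That suggestion can be sketched as a simple loop: walk from the leaf message up to the root via parent pointers, pushing into an ordinary array, so chat depth can never overflow the call stack. The history shape assumed here (a map of messages keyed by id, each with a `parentId`) is an illustration, not necessarily the real `createMessagesList` input:

```javascript
// Iterative replacement sketch for a recursive message-list builder.
// Stack depth stays constant no matter how long the chat is.
function createMessagesListIterative(history, leafId) {
  const list = [];
  let id = leafId;
  while (id !== null && id !== undefined) {
    const message = history.messages[id];
    if (!message) break;
    list.push(message);
    id = message.parentId; // one hop toward the root per iteration
  }
  return list.reverse(); // root first, leaf last
}
```

A 100,000-message chain that would blow the stack recursively completes in a single pass here.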

@kanubacode commented on GitHub (Nov 3, 2024):

After testing some other browsers, I can confirm that my long chat is just as unusable with Midori and other Gecko-based browsers. Similarly, the stack overflow above occurs in other Blink-based browsers such as Brave and Vivaldi.

@kevinhq commented on GitHub (Feb 16, 2025):

Is this solved? I was about to give up on Open WebUI. Ollama responds fast, but Open WebUI does not. I am on 16 GB RAM, Ryzen 5 3600 6-core, with Debian 12.

@vertago1 commented on GitHub (Feb 16, 2025):

I haven't noticed performance issues with large chats anymore, but there are times when I need to adjust the context length down from the max so that it fits in VRAM and Ollama doesn't fall back to CPU.

@merrime-n commented on GitHub (Jun 1, 2025):

What is the current state of this issue? I am also experiencing serious performance issues with Open WebUI; for instance, old and relatively large chats load too slowly, and sometimes the admin panel tabs take forever to load. It's a pity, because Open WebUI has been a lifesaver for us and performance is its only problem.

@DocStatic97 commented on GitHub (Jun 5, 2025):

I too would like to know the state of this issue.
Large chats (or ones with one or two images) take ages to load, or simply never do.

Reference: github-starred/open-webui#1431