mirror of
https://github.com/open-webui/open-webui.git
synced 2026-03-12 10:04:14 -05:00
enh/issue: performance optimisation #1431
Originally created by @vertago1 on GitHub (Jul 4, 2024).
Originally assigned to: @tjbck on GitHub.
Is your feature request related to a problem? Please describe.
The performance of long chat threads is really bad. I don't know if it is a browser client side issue or an issue on the server side.
Describe the solution you'd like
A performance fix would be great, but even the ability to do a shallow clone where only the last context window or so of the chat is cloned would help.
Describe alternatives you've considered
I am resorting to starting a fresh chat with memory items added and trying to use the prompt to pick up where the last chat left off.
Additional context
I have tried a few different browsers, but the best fix so far has been to start a new chat and try my best to get it back on roughly the same track.
@tjbck commented on GitHub (Jul 4, 2024):
If you could share an export of your chat so that we can diagnose which part of the UI is being the bottleneck, that would be sublime!
@vertago1 commented on GitHub (Jul 4, 2024):
Since it would be a pain to organically create one big enough that I'm fine with sharing, I generated a chat large enough to trigger the same behavior just for this, and have included it along with the generate.py file (as a .txt, since uploading .py isn't allowed) in case you want to play around with different generated content and sizes.
generated.json
generate.py.txt
To reproduce the behavior, I imported the chat and sent a message through the UI. The browser locked up for a while, but after waiting it eventually continued, displayed the new message UI, and eventually filled in the assistant message.
The performance is much, much better with a shorter chat. Let me know if that doesn't work for you, or you need more information.
@GrayXu commented on GitHub (Aug 22, 2024):
I encountered similar issues. I restricted token generation for each dialogue to come only from the cache, and cut the chat context length to 0, to rule out speed issues in the LLM serving itself.
Even so, in long threads the interface lag was very noticeable: while the page was streaming output, it became unresponsive and uncontrollable.
I'm not very knowledgeable about front-end work, so I'm unsure whether this performance recording is helpful.
@tjbck commented on GitHub (Aug 22, 2024):
PR Welcome
@george-elphick-talieisin commented on GitHub (Sep 4, 2024):
Just to add to this: it might be browser-specific. I see horrible lockups and slowdowns in Safari on macOS, but not in Edge on the same machine.
Not sure if this really helps?
@ingmferrer commented on GitHub (Sep 4, 2024):
I'm experiencing performance issues in Edge, Chrome, and Arc Browser on macOS, specifically when the UI has long conversations. These issues occur particularly when the LLM is responding, which might be related to how the front end handles streaming chunks of data. Additionally, when using Arc Browser on iOS, the browser frequently crashes under similar circumstances, and I have to manually force close the app for it to work again.
@george-elphick-talieisin commented on GitHub (Sep 4, 2024):
In my case I get one core on 100% on the safari tab, seemingly consumed by a JavaScript process, but the profiler doesn't tell me what is using all that cpu time.
@quantarion commented on GitHub (Sep 4, 2024):
Same here, Chrome on ChromeOS Flex version 129. It gets very slow as the chat length grows. I often get a prompt from Chrome asking me if I want to kill the page or wait for it to finish.
I got this with the profiler in chrome:
However, if I reload the page and continue the chat, the lag is mostly gone.
@george-elphick-talieisin commented on GitHub (Sep 4, 2024):
I've carried on stress-testing Edge on macOS, which has slightly better profiling, and found slowdowns, though not quite as extreme as Safari's. Note the following:
Note the "animation time", which is the UI's placeholder box after hitting the Send message button. I'm using Groq here, so the thinking time is too long for it to be Groq's fault. Rendering the response uses up quite a lot of CPU, which is shown in more detail here:
Hope this helps
@Keithsc commented on GitHub (Sep 19, 2024):
I completely agree with the previous comments regarding the lag in Open-WebUI, especially with long chat threads. The slowness and high CPU usage make the experience quite frustrating. I've found that restarting Firefox seems to help temporarily, but it's inconvenient. It's good to know this isn't just a Firefox issue. Hopefully, this will be resolved soon, as I sometimes even switch to using Firefox on my phone to continue chatting when the lag becomes too severe.
@tjbck commented on GitHub (Sep 19, 2024):
I'll start my thorough investigation shortly but I'd greatly appreciate some help from the community here!
@GrayXu commented on GitHub (Sep 20, 2024):
An interesting observation is that in the latest version 0.3.22, there isn't a noticeable lag or blocking in page interactions with long threads, but the model token output still significantly slows down to ~5 token/s.
@Keithsc commented on GitHub (Sep 20, 2024):
I upgraded to 0.3.22 also, but I still think long threads are a problem. I've looked at the Firefox devtools console and there seems to be an awful lot of logging happening in there, but I am not a developer, so maybe that's normal.
What I have discovered is that if I go to the Open WebUI settings and set "Stream Chat Response" to "off", I still get the reply, just in a single dump, and my CPU is now idle with no lockups. The only downside is that you don't see the response output in real time. This might be a workaround for some?
@quantarion commented on GitHub (Sep 20, 2024):
hey tjbck, just tell me what you need, I can compile/test/send reports as much as you need.
@dncc89 commented on GitHub (Sep 21, 2024):
I've found it's parsing the whole chat history into markdown every time a new token arrives.
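If that diagnosis is right, one common fix is to memoize the rendered output per message so that only the message currently being streamed is re-parsed on each token. The sketch below is hypothetical, not Open WebUI's actual code: `renderMarkdown` is a trivial stub standing in for whatever parser the UI uses, and the message shape is invented for illustration.

```typescript
// Hypothetical sketch: cache each message's rendered markdown keyed by id,
// so unchanged messages are never re-parsed while a new token streams in.

type Message = { id: string; content: string };

// Stub standing in for the real markdown parser, to keep this self-contained.
const renderMarkdown = (src: string): string => `<p>${src}</p>`;

const cache = new Map<string, { content: string; html: string }>();
let parseCount = 0; // counts how often the parser actually runs

function renderMessage(msg: Message): string {
  const hit = cache.get(msg.id);
  if (hit && hit.content === msg.content) return hit.html; // unchanged: reuse
  parseCount++;
  const html = renderMarkdown(msg.content);
  cache.set(msg.id, { content: msg.content, html });
  return html;
}

// Simulate streaming: three finished messages plus one growing reply.
const history: Message[] = [
  { id: "m1", content: "hello" },
  { id: "m2", content: "world" },
  { id: "m3", content: "again" },
];
let reply = "";
for (const token of ["a", "b", "c"]) {
  reply += token;
  const live: Message = { id: "live", content: reply };
  [...history, live].forEach(renderMessage);
}
// Without the cache this would be 4 messages x 3 tokens = 12 parses;
// with it, the 3 finished messages parse once each plus 3 for the live one.
console.log(parseCount); // 6
```

The point is that per-token cost becomes proportional to the size of the streaming message, not the whole thread.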
@GrayXu commented on GitHub (Sep 22, 2024):
pagination/lazyloading may be a general optimization here
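To make the pagination idea concrete, here is a minimal hypothetical sketch (not Open WebUI's implementation): keep the full history in memory, but only hand the newest page(s) of messages to the renderer, widening the window when the user scrolls up.

```typescript
// Hypothetical sketch: render only a sliding window at the tail of the
// history; scrolling up loads one more page. Names are illustrative.

type Message = { id: number; content: string };

function visibleWindow(
  history: Message[],
  pagesLoaded: number,
  pageSize = 20
): Message[] {
  const count = Math.min(history.length, pagesLoaded * pageSize);
  return history.slice(history.length - count); // newest `count` messages
}

const history: Message[] = Array.from({ length: 100 }, (_, i) => ({
  id: i,
  content: `message ${i}`,
}));

let pagesLoaded = 1;
console.log(visibleWindow(history, pagesLoaded).length); // 20 (only the tail renders)
pagesLoaded++; // user scrolled to the top of the current window
console.log(visibleWindow(history, pagesLoaded).length); // 40
console.log(visibleWindow(history, pagesLoaded)[0].id); // 60
```

Combined with per-message render caching, this keeps DOM size and parse work bounded regardless of total thread length.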
@thiswillbeyourgithub commented on GitHub (Sep 22, 2024):
Hi, I'm coming from #4881 and wanted to add that I think it might actually be every time a token is rendered, not every time a new token is received: from my very limited testing, turning off "Fluidly stream large external response chunks" while mermaid charts are rendered seems to make the generation noticeably faster.
@tjbck commented on GitHub (Sep 23, 2024):
Testing wanted in latest dev! Several optimisation techniques have been applied to our message rendering process (e.g. infinite scrolling), so a significant performance gain is expected here.
@ingmferrer commented on GitHub (Sep 24, 2024):
After testing git-822c47c, I noticed a significant improvement. The chat experience feels much smoother, especially in longer conversations. Considering I was using Arc Browser on iOS, the lag has almost completely disappeared. It was previously unusable, so this is a major enhancement.
@thiswillbeyourgithub commented on GitHub (Sep 24, 2024):
So this fixed my mermaid issue, as completions are now just as fast as regular text, but I suspect a regression when dealing with vision models. I've tested it with a local Ollama model as well as 4o from OpenAI and Claude Sonnet, and in all cases the generation is suspiciously slow, just like for mermaid before the fix. This was on Brave Browser and on 0.3.26. Am I the only one?
@kaiiiiiiiii commented on GitHub (Sep 24, 2024):
Same for me. Experience in Arc on iOS improved x100 🥳
@kevinleeex commented on GitHub (Oct 14, 2024):
Hi @tjbck, thanks for sorting this out; it runs perfectly with Chromium-based browsers, but the issue still exists on Safari, on both Mac and iPhone.
@vlbosch commented on GitHub (Oct 15, 2024):
I can confirm the issue still arises on Safari. After long chats or prompts with a lot of context, the page hangs until the answer has been streamed. Switching between chats with a lot of context is very slow as well.
Please let me know what I can do to help triage the issue on Safari.
@kanubacode commented on GitHub (Nov 1, 2024):
I am experiencing this issue with very long chats on Firefox. What used to take a second or two starts taking longer than a minute, and sometimes the connection drops; upon re-establishing it, I hit a session-ID traceback, at which point the page must be reloaded and the response regenerated to continue the chat. The extremely long response times are the real frustration, though. My chat exported to plain text is about 2.5 MB, which is the size at which this became more than a slowdown; it was a large inconvenience much earlier.
@kanubacode commented on GitHub (Nov 3, 2024):
@tjbck This seems not to be an issue with Chromium -- I was able to continue that chat with fast responses for a couple of days, and then hit a different issue with long chats on Chromium: `createMessagesList` is overflowing the stack. It's unfortunate that my 2-week-old chat is now at a standstill. If there's anything I can do to help address this, let me know.
@vertago1 commented on GitHub (Nov 3, 2024):
I wonder if the recursion could be replaced with a non-recursive function and a heap allocated list.
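That suggestion can be sketched as follows. This is hypothetical code, not Open WebUI's actual `createMessagesList`: the parent-pointer message shape is assumed for illustration. The loop walks the parent chain with an ordinary heap-allocated array, so the maximum chat depth is bounded by memory rather than by the JavaScript call stack.

```typescript
// Hypothetical iterative rewrite of a recursive leaf-to-root walk: follow
// parentId pointers in a loop and reverse at the end, avoiding stack overflow
// on very deep chats. The message shape here is assumed, not Open WebUI's.

type Message = { id: string; parentId: string | null; content: string };

function createMessagesListIterative(
  byId: Map<string, Message>,
  leafId: string
): Message[] {
  const chain: Message[] = [];
  let cur: Message | undefined = byId.get(leafId);
  while (cur) {
    chain.push(cur);
    cur = cur.parentId === null ? undefined : byId.get(cur.parentId);
  }
  return chain.reverse(); // root first
}

// A chain deep enough to overflow most recursive implementations.
const byId = new Map<string, Message>();
const depth = 200000;
for (let i = 0; i < depth; i++) {
  byId.set(String(i), {
    id: String(i),
    parentId: i === 0 ? null : String(i - 1),
    content: `msg ${i}`,
  });
}

const list = createMessagesListIterative(byId, String(depth - 1));
console.log(list.length); // 200000
console.log(list[0].id); // "0"
```

An alternative with the same effect is an explicit stack of pending ids, which generalizes to branching trees rather than a single parent chain.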
@kanubacode commented on GitHub (Nov 3, 2024):
After testing some other browsers, I can confirm that my long chat is just as impossibly slow with Midori and other Gecko-based browsers. Similarly, the stack overflow above occurs in other Blink-based browsers such as Brave and Vivaldi.
@kevinhq commented on GitHub (Feb 16, 2025):
Is this solved? I was about to give up on Open WebUI. Ollama responds fast, but Open WebUI does not. I am on 16 GB RAM, Ryzen 5 3600 6-core, with Debian 12.
@vertago1 commented on GitHub (Feb 16, 2025):
I haven't noticed performance issues on large chats anymore, but there are times when I need to adjust the context length down from the max so I have enough VRAM and Ollama doesn't fall back to CPU.
@merrime-n commented on GitHub (Jun 1, 2025):
What is the current state of this issue? I am also experiencing serious performance issues with Open WebUI; for instance, old and relatively large chats load too slowly. Sometimes the admin panel tabs take forever to load. It's a pity, because Open WebUI has been a lifesaver for us and performance is its only problem.
@DocStatic97 commented on GitHub (Jun 5, 2025):
I too would like to know the state of this issue.
Large chats (or ones with one or two images) take ages to load, or simply never do.