Mirror of https://github.com/open-webui/open-webui.git (synced 2026-05-07 11:28:35 -05:00)
[GH-ISSUE #1081] feat: smarter ollama load balancing #27855
Originally created by @tjbck on GitHub (Mar 7, 2024).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/1081
Right now it just uses `random.choices`; refer to my comment here: 8ed5759d0e/backend/apps/ollama/main.py (L39)

@asedmammad commented on GitHub (Mar 17, 2024):
@tjbck I can work on this one. I am considering implementing a configuration parameter that allows users to select from various strategies.
@tjbck commented on GitHub (Mar 17, 2024):
@asedmammad Feel free to create a draft PR! I'll also get actively involved in this one!
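For reference, the discussion above contrasts the current uniform-random pick with configurable alternatives. A minimal sketch of both (the URL list and function names are illustrative, not the actual Open WebUI config):

```python
import random
from itertools import cycle

# Hypothetical list of configured Ollama connections, for illustration only.
OLLAMA_BASE_URLS = [
    "http://ollama-a:11434",
    "http://ollama-b:11434",
]

def pick_random(urls):
    # Current behaviour per the issue: a uniform random pick.
    # random.choices returns a list, so take the first element.
    return random.choices(urls)[0]

_rr = cycle(OLLAMA_BASE_URLS)

def pick_round_robin():
    # One alternative strategy: cycle through connections in order,
    # spreading consecutive requests evenly across all backends.
    return next(_rr)
```

A strategy setting could then simply map a name like `"random"` or `"round_robin"` to one of these functions.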
@lewismacnow commented on GitHub (Apr 12, 2024):
I'd be very thankful for the ability to restrict models and/or users to a specific connection (i.e. prevent/stop load balancing in open-webui in some situations).
I drew some examples of these situations in #1527
My reason for this is:
The loading/unloading process adds seconds to each response, so if multiple users are actively using multiple models that are shared between connections, we will potentially add time to responses (while the model is removed from / added to memory) instead of benefiting from load balancing over multiple connections.
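The restriction being asked for here could, as one possible sketch, be a per-model pin that bypasses load balancing entirely. `MODEL_PINS` and `urls_for_model` are hypothetical names for illustration, not an Open WebUI API:

```python
# Hypothetical sketch: pin a model to one connection so it is never
# load-balanced (and never shuffled in and out of memory elsewhere).
MODEL_PINS = {
    "llama3.1:latest": "http://desktop:11434",  # assumed mapping
}

def urls_for_model(model, all_urls):
    """Return only the pinned URL for a pinned model, else every URL."""
    pinned = MODEL_PINS.get(model)
    return [pinned] if pinned is not None else all_urls
```

Unpinned models would still be balanced across all connections as before.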
@longregen commented on GitHub (Apr 30, 2024):
I'd like to bump this since now Ollama has support for multiple loaded models at the same time: https://github.com/ollama/ollama/pull/3418
In my testing with the "rc5" of 0.33, it worked just fine without any changes to open-webui.
@bkev commented on GitHub (Aug 11, 2024):
I think it would be really useful to have a preference for a certain node when it's online, if that's possible. For example, Open WebUI running on a low-powered server with Ollama, but a higher-powered node, when powered up and available, is preferred for better performance.
@haydonryan commented on GitHub (Sep 23, 2024):
100% came here to post this. I have an EPYC server that stays on 24/7 and gives OK performance. I'd much rather my desktop (with a 3090) serve the model when it's on. At the moment my workaround is to copy the model under a new name, "llama3.1:latest-desktop", so it appears as something different in the list and I can manually select it, but I would love something more automated.
Keep up the great work with openwebui!
@bkev commented on GitHub (Sep 28, 2024):
If there are 2 of the same models on 2 different servers, how do you rename them so it's clear which server they are on?
@haydonryan commented on GitHub (Sep 30, 2024):
Just call it something like blah-desktop (or whatever you want to identify it by). Note it just copies the ID; it does not copy all the bits, so it's space efficient.
@uocnb commented on GitHub (Jan 9, 2025):
Hi there, I would like to bump this feature. It would be very useful if we had an Ollama server / connection name. It would make priority ordering easy (better with conditions, but that would be more complex), as would a prefix / suffix in the model selection.
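The preferred-node requests above amount to a priority-ordered failover: try named connections in order of preference and fall back when one is offline. A minimal sketch under those assumptions (all names here are hypothetical, not part of Open WebUI):

```python
from dataclasses import dataclass

@dataclass
class Connection:
    name: str      # user-facing label, e.g. "desktop-3090" (assumed)
    url: str
    priority: int  # lower value = preferred

def pick_preferred(connections, is_online):
    """Return the highest-priority connection that is currently online."""
    for conn in sorted(connections, key=lambda c: c.priority):
        if is_online(conn):
            return conn
    return None  # nothing reachable
```

With a health check plugged in as `is_online`, the 3090 desktop would be chosen whenever it is up, with the always-on server as the fallback.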