[GH-ISSUE #6452] Ollama load balancing doesn't support different model names with the same ID. #29900
Originally created by @haydonryan on GitHub (Oct 26, 2024).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/6452
Bug Report
Probably also related to https://github.com/open-webui/open-webui/issues/1081
Installation Method
Docker Compose.
Environment
Open WebUI Version: v0.3.33
Ollama (if applicable): 0.3.6
Operating System: Arch Linux
Browser (if applicable): Brave
Confirmation:
Expected Behavior:
I expect different model names to appear as different models in the Open WebUI model dropdown.
Actual Behavior:
Open WebUI sees these as the same model (because they are, just aliased); I think this is because the ID numbers are the same. Open WebUI ignores the model names and combines the two into one entry, but it only associates one model name with that entry. This results in a 404: when Open WebUI load balances to the other server, it can't find the model, because the call to Ollama uses the model name, not the ID.
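To illustrate the suspected failure mode (this is just a sketch of the behavior I'm describing, not Open WebUI's actual code): if models are merged by ID but requests are routed by name, any request that lands on the "other" server 404s. The server URLs are placeholders for my two machines.

```python
import random

# Hypothetical inventory mirroring the setup below: each server exposes the
# same underlying model (same digest/ID) under a different alias.
SERVERS = {
    "http://epyc:11434": ["codestral:22b-v0.1-q8_0-cpu-16"],
    "http://desktop:11434": ["codestral:22b-v0.1-q8_0-desktop"],
}
DIGEST = "8dde0029a91f"  # identical on both servers

# Merge step (the suspected bug): models are keyed by ID, so the two aliases
# collapse into a single entry that remembers only one of the names.
merged = {}
for url, names in SERVERS.items():
    for name in names:
        entry = merged.setdefault(DIGEST, {"name": name, "servers": []})
        entry["servers"].append(url)

model = merged[DIGEST]
target = random.choice(model["servers"])  # naive random load balancing

# Requests go out by *name*, but only one server knows that alias, so the
# request 404s whenever it lands on the other server.
if model["name"] in SERVERS[target]:
    print(f"OK: routed {model['name']} to {target}")
else:
    print(f"404: {model['name']} not found on {target}")
```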
Description
Bug Summary:
Custom model names do not work correctly with load balancing in Open WebUI.
Reproduction Details
Steps to Reproduce:
I have two machines running Ollama servers: a 16-core EPYC box that is CPU-only, and a desktop with a 3090. I used `ollama cp` to rename the models so that, when the desktop is on, I can choose the fast option.

EPYC:

```
NAME                              ID            SIZE
codestral:22b-v0.1-q8_0-cpu-16    8dde0029a91f  23 GB
```

Desktop:

```
NAME                              ID            SIZE
codestral:22b-v0.1-q8_0-desktop   8dde0029a91f  23 GB
```
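For what it's worth, the shared ID can be confirmed from each server's API. Below is a small sketch using Ollama's `/api/tags` endpoint, which lists each model's alias and digest (the ID column above); the `epyc`/`desktop` hostnames are placeholders for my two machines.

```python
import json
import urllib.request

# Placeholder hostnames for the two Ollama servers from the listings above.
SERVERS = ["http://epyc:11434", "http://desktop:11434"]

for base in SERVERS:
    # GET /api/tags lists the models on that server; each entry carries the
    # alias ("name") and the content digest shown as the ID in `ollama list`.
    with urllib.request.urlopen(f"{base}/api/tags") as resp:
        data = json.load(resp)
    for model in data.get("models", []):
        print(base, model["name"], model["digest"][:12])
```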
Logs and Screenshots
Screenshots/Screen Recordings (if applicable):

Additional Information
The only reason I'm using custom model names is that there's no Ollama server load balancing that lets you prefer one server over the other (or route to the more performant server).
My needs are simple, so I'd be happy with "if the desktop is offline, then use the CPU server".
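For example, something as simple as this health-check-and-fall-back logic would cover my case (a rough sketch; the hostnames and timeout are placeholders):

```python
import urllib.error
import urllib.request

PREFERRED = "http://desktop:11434"  # fast GPU box, sometimes off
FALLBACK = "http://epyc:11434"      # always-on CPU box

def pick_server(timeout: float = 0.5) -> str:
    """Return the preferred server if its Ollama API answers, else the fallback."""
    try:
        # A GET on an Ollama server's root URL answers "Ollama is running".
        urllib.request.urlopen(PREFERRED, timeout=timeout)
        return PREFERRED
    except (urllib.error.URLError, OSError):
        return FALLBACK

print("routing requests to", pick_server())
```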
@tjbck commented on GitHub (Oct 26, 2024):
Ollama load balancing will be deprecated in favour of #5680 in the near future! Stay tuned!