Enhancement: Ollama server prioritisation or selection #1050

Closed
opened 2025-11-11 14:36:17 -06:00 by GiteaMirror · 1 comment

Originally created by @nexy7574 on GitHub (May 27, 2024).

Is your feature request related to a problem? Please describe.
A couple of friends and I each run our own instance of Ollama, all on different hardware. For instance, one friend runs Ollama with a GPU, but on a PC that is not always online. I run an Ollama instance both on my PC, which is CPU-only but significantly more powerful than my homeserver, and on the homeserver itself, which only has a weak 4-core CPU. In open-webui, adding multiple servers appears to only load-balance between them: there is no way to choose which server handles what, and no failover-style behaviour.

Describe the solution you'd like
A way to define a server priority list, e.g.:

  1. Instance 1 with GPU
  2. Instance 2 with powerful CPU - utilised if Instance 1 is unreachable
  3. Instance 3 with weak CPU - an always-available low-power backup option

Alternatively, being able to select which instance is used in a chat would be equally useful.
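For illustration, a minimal sketch of what such a priority/failover policy could look like on the client side, assuming the servers are simply probed in order (the hostnames and the helper `pick_ollama_base_url` are purely hypothetical, not part of open-webui):

```python
import requests

# Ordered by priority: GPU box first, strong CPU second, weak CPU last.
# These hostnames are placeholders for illustration only.
OLLAMA_URLS = [
    "http://gpu-box:11434",
    "http://fast-cpu:11434",
    "http://homeserver:11434",
]

def pick_ollama_base_url(urls=OLLAMA_URLS, timeout=2.0):
    """Return the first reachable Ollama base URL, trying them in priority order."""
    for url in urls:
        try:
            # /api/tags is a cheap endpoint that any healthy Ollama server answers.
            requests.get(f"{url}/api/tags", timeout=timeout).raise_for_status()
            return url
        except requests.RequestException:
            continue  # unreachable or unhealthy; fall through to the next server
    raise RuntimeError("No Ollama server in the priority list is reachable")

if __name__ == "__main__":
    print(f"Routing requests to {pick_ollama_base_url()}")
```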

Describe alternatives you've considered
Intercepting requests via a reverse proxy, or simply hosting a separate OWUI instance for each Ollama instance (this is suboptimal since we would then have to create accounts on each instance).
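For the reverse-proxy workaround, a rough sketch with nginx might look like the following (hostnames are illustrative; note that nginx's `backup` flag only gives two tiers, with all backup servers shared among themselves once the primary is down, so a strict three-level priority would need something more elaborate):

```nginx
# Illustrative nginx upstream acting as a failover front for several Ollama servers.
upstream ollama_pool {
    server gpu-box:11434 max_fails=1 fail_timeout=30s;  # preferred instance
    server fast-cpu:11434 backup;    # only used when the primary is unreachable
    server homeserver:11434 backup;  # low-power fallback, shares backup duty
}

server {
    listen 11434;
    location / {
        proxy_pass http://ollama_pool;
        proxy_read_timeout 300s;  # model responses can take a while
    }
}
```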


@tjbck commented on GitHub (May 27, 2024):

Duplicate of #1081 and #1785. Let's continue our discussion there.
