Enhancement: Ollama server prioritisation or selection #1050

Closed
opened 2025-11-11 14:36:17 -06:00 by GiteaMirror · 1 comment

Originally created by @nexy7574 on GitHub (May 27, 2024).

Is your feature request related to a problem? Please describe.
A couple of friends and I each run our own instance of Ollama, all on different hardware. For instance, one friend runs Ollama with a GPU, but on a PC that is not always online. I run an Ollama instance both on my PC, which is CPU-only but significantly more powerful than my homeserver, and on the homeserver itself, which only has a weak 4-core CPU. In open-webui, adding multiple servers appears to only load-balance between them: there is no way to choose which server handles what, and no failover-style behaviour.

Describe the solution you'd like
A way to define a server priority list, e.g.:

  1. Instance 1 with GPU
  2. Instance 2 with powerful CPU - utilised if Instance 1 is unreachable
  3. Instance 3 with weak CPU - an always-available low-power backup option

Alternatively, being able to select which instance is used in a chat would be equally useful.
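For illustration, a minimal sketch of what such a priority/failover policy could look like on the client side, assuming the servers are simply probed in order (the hostnames and the helper `pick_ollama_base_url` are purely hypothetical, not part of open-webui):

```python
import requests

# Ordered by priority: GPU box first, strong CPU second, weak CPU last.
# These hostnames are placeholders for illustration only.
OLLAMA_URLS = [
    "http://gpu-box:11434",
    "http://fast-cpu:11434",
    "http://homeserver:11434",
]

def pick_ollama_base_url(urls=OLLAMA_URLS, timeout=2.0):
    """Return the first reachable Ollama base URL, trying them in priority order."""
    for url in urls:
        try:
            # /api/tags is a cheap endpoint that any healthy Ollama server answers.
            requests.get(f"{url}/api/tags", timeout=timeout).raise_for_status()
            return url
        except requests.RequestException:
            continue  # unreachable or unhealthy; fall through to the next server
    raise RuntimeError("No Ollama server in the priority list is reachable")

if __name__ == "__main__":
    print(f"Routing requests to {pick_ollama_base_url()}")
```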

Describe alternatives you've considered
Intercepting requests via a reverse proxy, or simply hosting a separate OWUI instance for each Ollama instance (this is suboptimal since we would then have to create accounts on each instance).
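For the reverse-proxy workaround, a rough sketch with nginx might look like the following (hostnames are illustrative; note that nginx's `backup` flag only gives two tiers, with all backup servers shared among themselves once the primary is down, so a strict three-level priority would need something more elaborate):

```nginx
# Illustrative nginx upstream acting as a failover front for several Ollama servers.
upstream ollama_pool {
    server gpu-box:11434 max_fails=1 fail_timeout=30s;  # preferred instance
    server fast-cpu:11434 backup;    # only used when the primary is unreachable
    server homeserver:11434 backup;  # low-power fallback, shares backup duty
}

server {
    listen 11434;
    location / {
        proxy_pass http://ollama_pool;
        proxy_read_timeout 300s;  # model responses can take a while
    }
}
```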


@tjbck commented on GitHub (May 27, 2024):

Duplicate of #1081 and #1785. Let's continue our discussion there.
