[GH-ISSUE #13381] Custom cloud models with remote server configuration #70896

Open
opened 2026-05-04 23:24:58 -05:00 by GiteaMirror · 3 comments
Owner

Originally created by @uvulpos on GitHub (Dec 8, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/13381

Hey there,

I want the ability to proxy traffic between two ollama instances so I can share one big graphics card, or work with less power when I'm mobile, without having to reconfigure my application during development. Is there a way to allow that with the cloud models?


Conversation Summary:

  • Possibility to connect ollama to another ollama instance via a custom cloud model (proxy)
  • Connection configuration inside the cloud model
    • Authentication (Bearer, Basic, ...)
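
For context, part of this already works today without any new feature: a client can be pointed at either the local server or a shared remote GPU server purely through configuration, for example an OLLAMA_HOST-style environment variable, so the application itself never changes. A minimal sketch, assuming the variable holds a full URL and the remote server is reachable; the hostname is a placeholder:

```python
# Minimal sketch: choose the ollama endpoint via an environment variable,
# so the application itself never needs to be reconfigured.
import os
import requests

# Falls back to the default local server; set OLLAMA_HOST to the shared
# GPU box (e.g. https://ollama.gpu.example.internal) to send work there.
host = os.environ.get("OLLAMA_HOST", "http://127.0.0.1:11434")

resp = requests.post(
    f"{host}/api/chat",
    json={
        "model": "llama3.2",  # any model available on that server
        "messages": [{"role": "user", "content": "Hello!"}],
        "stream": False,
    },
    timeout=300,
)
resp.raise_for_status()
print(resp.json()["message"]["content"])
```

What is missing, and what this issue asks for, is bundling that endpoint, authentication, and fallback behaviour into the model itself.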
GiteaMirror added the feature request label 2026-05-04 23:24:58 -05:00
Author
Owner

@fcorneli commented on GitHub (Dec 8, 2025):

With possibility to add Bearer or Basic authentication headers.
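
For what it's worth, a client can already attach such headers today; ollama itself ignores them, but a reverse proxy in front of a shared server can validate them. A small sketch of building the two header variants (token and credentials are placeholders):

```python
# Sketch: Bearer vs. Basic Authorization headers that a reverse proxy in
# front of a shared ollama server could check (ollama itself ignores them).
import base64

token = "change-me"  # placeholder Bearer token
bearer_headers = {"Authorization": f"Bearer {token}"}

user, password = "dev", "change-me"  # placeholder Basic credentials
basic_headers = {
    "Authorization": "Basic "
    + base64.b64encode(f"{user}:{password}".encode()).decode()
}

# Either dict can be passed as headers= to the requests.post() call in the
# earlier snippet.
```

The feature request is that ollama manage such headers itself as part of a custom cloud model, instead of every client having to do it.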

Author
Owner

@rick-github commented on GitHub (Dec 8, 2025):

Some clarity on the goal would help with coming up with a solution.

  1. You want multiple ollama servers to use one big graphics card: run multiple servers, they will share the card.
  2. You want to run multiple servers that use a cloud model: run multiple servers, they will share the cloud.
  3. You want to run your own models in Ollama Cloud. Not supported at this time.
  4. You want to run an ollama server offsite, and have multiple clients connect: run nginx/caddy to provide access control (a minimal config sketch follows below).
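
For option 4, here is a minimal nginx sketch, assuming ollama listens on its default port 11434 on the same machine and a single shared Bearer token is an acceptable level of access control; hostname, certificate paths, and token are placeholders:

```nginx
server {
    listen 443 ssl;
    server_name ollama.example.internal;            # placeholder hostname
    ssl_certificate     /etc/nginx/tls/cert.pem;    # placeholder cert paths
    ssl_certificate_key /etc/nginx/tls/key.pem;

    location / {
        # Crude shared-token check; a real deployment would prefer
        # auth_request, mTLS, or an identity-aware proxy.
        if ($http_authorization != "Bearer change-me") {
            return 401;
        }
        proxy_pass http://127.0.0.1:11434;          # local ollama server
        proxy_http_version 1.1;
        proxy_read_timeout 600s;                    # allow long streaming responses
        proxy_buffering off;                        # stream tokens as they arrive
    }
}
```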
Author
Owner

@uvulpos commented on GitHub (Dec 8, 2025):

The idea is to share a big ollama GPU server among many devs, but in a confidential environment, to save resources (like Ollama Cloud but self-hosted). You could create your own cloud models that contain the configuration of the remote server, and maybe even alternative models in case the bigger model is not available (no cell reception, for example). You would configure your ollama instance once, and it would route requests automatically and even store authentication via the macOS keychain, Windows vault, etc. The mockup below and the sketch after it illustrate the idea.
Image: https://github.com/user-attachments/assets/4a53b9a3-43f6-4a11-93b2-316f001119f6
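To make the request more concrete, here is a purely hypothetical Modelfile-style sketch of what such a custom cloud model could carry. The REMOTE, AUTH, and FALLBACK directives do not exist in ollama today; they are invented here only to illustrate the configuration described above:

```
# Hypothetical sketch only: REMOTE, AUTH and FALLBACK are invented here to
# illustrate the feature request; they are not real Modelfile directives.
FROM llama3.1:70b                                  # model served by the big GPU server
REMOTE https://ollama.gpu.example.internal         # hypothetical: remote ollama endpoint
AUTH bearer keychain:company-ollama-token          # hypothetical: token read from the OS keychain / Windows vault
FALLBACK llama3.2:3b                               # hypothetical: local model used when the remote is unreachable
```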
Creating such a model could be a guided flow: you push it to a private company registry, and after that you only have to enter the bearer token once to authenticate (if required); from then on all requests are proxied to the big GPU servers, or handled locally if they are not available. I have a docker-like usage in mind (a hypothetical flow is sketched after the mockup below).
Image: https://github.com/user-attachments/assets/03483e02-aebc-4a31-9a7f-cfbdf5fa2a0c
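The envisioned day-to-day flow, again purely hypothetical (neither the private registry pull shown here nor an `ollama auth` step exists today):

```sh
# Hypothetical docker-like flow; none of these commands exist in ollama today.
ollama pull registry.example.internal/team/big-model-remote   # pull the custom cloud model
ollama auth registry.example.internal                         # enter the Bearer token once; stored in the OS keychain
ollama run big-model-remote "hello"                           # proxied to the GPU server, or served locally when offline
```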
Having an ollama cluster also sounds interesting, but that's another issue.

Reference: github-starred/ollama#70896