[GH-ISSUE #12100] How is it possible to instantiate an ollama on linux using turbo mode AND queriable through the API? #54555

Closed
opened 2026-04-29 06:20:06 -05:00 by GiteaMirror · 4 comments

Originally created by @dblas on GitHub (Aug 27, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/12100

The doc says turbo is available through the CLI as well as through the API.
That's right, I tested both modes.
BUT it seems impossible to use both AT THE SAME TIME, for automation for example.

Querying the API with turbo mode enabled works, but the cloud-served models are not live (which is a good thing), so they are unable to produce recent news.
For that, a client must be used as a proxy.
How is it possible to do that?
When code (Python for example, or even curl) points to the proxy, the proxy only answers in local mode (the models don't exist there) and never in turbo mode, even if it was launched with turbo mode activated (OLLAMA_HOST=ollama.com ollama run gpt-oss:120b).
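
For reference, a minimal sketch of the two call paths as described above (the ollama Python client and an OLLAMA_API_KEY environment variable are assumptions here; model availability is whatever each endpoint reports):

```python
import os
import ollama

# Local server: only answers for models that were pulled locally.
local = ollama.Client(host="http://localhost:11434")

# Turbo/cloud endpoint: a separate server that needs an API key.
# Running `OLLAMA_HOST=ollama.com ollama run ...` in one shell does not
# change what the local server at :11434 exposes to other clients.
turbo = ollama.Client(
    host="https://ollama.com",
    headers={"Authorization": f"Bearer {os.environ['OLLAMA_API_KEY']}"},
)

print(local.list())  # models available locally
print(turbo.list())  # models available in turbo mode
```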

How can we resolve this mystery?

db

GiteaMirror added the feature request label 2026-04-29 06:20:06 -05:00

@rick-github commented on GitHub (Aug 27, 2025):

I believe a proxy mode is being developed for the server. In the meantime, a simple script can enable access to both turbo and local servers, depending on the model.

```python
#!/usr/bin/env python3
"""Route chat requests to ollama.com (turbo) or a local server by model name."""

import os

import ollama


class OllamaMerged(ollama.Client):
    def __init__(self):
        # Cloud ("turbo") client; expects an API key in OLLAMA_API_KEY.
        self.turbo = ollama.Client(
            host="https://ollama.com",
            headers={"Authorization": f"Bearer {os.getenv('OLLAMA_API_KEY', 'api-key')}"},
        )
        # Names of the models served by ollama.com.
        self.turbo_models = [m.model for m in self.turbo.list().get("models", [])]
        # Local client; OLLAMA_HOST overrides the default local server.
        self.local = ollama.Client(host=os.getenv("OLLAMA_HOST", "http://localhost:11434"))

    def chat(self, model, **kwargs):
        # Send the request to the cloud if it serves this model, else local.
        if model in self.turbo_models:
            return self.turbo.chat(model, **kwargs)
        return self.local.chat(model, **kwargs)


client = OllamaMerged()

messages = [{"role": "user", "content": "why is the sky blue?"}]

print(client.chat(model="qwen2.5:0.5b", messages=messages))  # local model
print(client.chat(model="gpt-oss:20b", messages=messages))   # turbo model
```
<!-- gh-comment-id:3229362679 --> @rick-github commented on GitHub (Aug 27, 2025): I believe a proxy mode is being developed for the server. In the meantime, a simple script can enable access to both turbo and local servers, depending on the model. ```python #!/usr/bin/env python3 import ollama import argparse import os class OllamaMerged(ollama.Client): turbo_models = [] def __init__(self): self.turbo = ollama.Client(host="https://ollama.com", headers={"Authorization": f"Bearer {os.getenv('OLLAMA_API_KEY', 'api-key')}"}) self.turbo_models = [m.model for m in self.turbo.list().get("models", [])] self.local = ollama.Client(host=os.getenv("OLLAMA_HOST", "http://localhost:11434")) def chat(self, model, **kwargs): if model in self.turbo_models: return self.turbo.chat(model, **kwargs) return self.local.chat(model, **kwargs) ollama = OllamaMerged() messages=[{"role":"user", "content":"why is the sky blue?"}] print(ollama.chat(model="qwen2.5:0.5b", messages=messages)) print(ollama.chat(model="gpt-oss:20b", messages=messages)) ```

@dblas commented on GitHub (Aug 27, 2025):

OK, thank you Rick.
I understand the trick, but it doesn't work the way I'm expecting.
Indeed, the script goes to turbo, but the turbo model doesn't use the local Ollama server to interrogate the web in the meantime.
The result is: "Note: My training data only goes up to September 2021, so I do not have access to real‑world events that occurred in August 2025."
Any idea?
db

<!-- gh-comment-id:3229678113 --> @dblas commented on GitHub (Aug 27, 2025): ok, thank you Rick. I understand the trick but it doesn't work the way I'm expecting. Indeed, the script goes for turbo but doesn't use the local ollama server to interrogate the web in the meantime. The result is "Note: My training data only goes up to September 2021, so I do not have access to real‑world events that occurred in August 2025" An idea? db

@rick-github commented on GitHub (Aug 27, 2025):

https://github.com/ollama/ollama/issues/11749

<!-- gh-comment-id:3229688099 --> @rick-github commented on GitHub (Aug 27, 2025): https://github.com/ollama/ollama/issues/11749

@dblas commented on GitHub (Aug 27, 2025):

Ok, it's WIP then.
Thank you very much.
db

<!-- gh-comment-id:3229719837 --> @dblas commented on GitHub (Aug 27, 2025): Ok, it's WIP then. Thank you very much. db