[GH-ISSUE #13801] api/generate: running -cloud models through the local proxy fails #55553

Closed
opened 2026-04-29 09:24:04 -05:00 by GiteaMirror · 5 comments

Originally created by @drifkin on GitHub (Jan 20, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/13801

Originally assigned to: @drifkin on GitHub.

What is the issue?

Originally reported in #12370, h/t to @rick-github for the following repro:

```
$ prompt='<|start|>system<|message|>Talk like a pirate\n# Valid channels: analysis, commentary, final. Channel must be included for every message.<|end|><|start|>user<|message|>hello<|end|><|start|>assistant'
$ curl -s https://ollama.com/api/generate -H "Authorization: $OLLAMA_API_KEY" -d '{"model":"gpt-oss:20b","prompt":"'"$prompt"'","raw":true,"stream":false}' | jq
{
  "model": "gpt-oss:20b",
  "created_at": "2025-09-22T18:49:33.835263946Z",
  "response": "<|channel|>analysis<|message|>User says \"hello\". They want to talk like a pirate.",
  "done": true,
  "total_duration": 224292188,
  "prompt_eval_count": 34,
  "eval_count": 16
}
$ curl -s localhost:11434/api/generate -H "Authorization: $OLLAMA_API_KEY" -d '{"model":"gpt-oss:20b-cloud","prompt":"'"$prompt"'","raw":true,"stream":false}' | jq
{
  "error": "400 Bad Request: raw mode does not support template, system, or context"
}
```
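The error text states a rule the upstream server enforces: a raw-mode request may not also carry a template, system, or context. The repro's own request body sets none of these. That suggests they are being attached somewhere between the local server and ollama.com. This is an inference from the message, not something this thread confirms. Below is a minimal sketch of the rule as a client-side check. The function name is hypothetical. It treats a field as set when the field is present and non-empty, which is an assumption rather than documented server behavior.

```python
# Fields that, according to the error message in the repro, cannot be combined
# with raw mode on /api/generate.
RAW_INCOMPATIBLE = ("template", "system", "context")


def check_raw_request(body: dict) -> None:
    """Hypothetical client-side check mirroring the rule in the 400 error.

    Assumption: a field counts as "set" when it is present and non-empty;
    the server's exact test may differ.
    """
    if not body.get("raw"):
        return  # the restriction only applies to raw-mode requests
    offending = [field for field in RAW_INCOMPATIBLE if body.get(field)]
    if offending:
        raise ValueError(
            "raw mode does not support template, system, or context "
            f"(set: {', '.join(offending)})"
        )
```

The body from the repro (`model`, `prompt`, `raw`, `stream`) passes this check. The same body with a `system` string added would fail it. A request of that shape is what the 400 implies reached ollama.com.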

Relevant log output


OS

No response

GPU

No response

CPU

No response

Ollama version

0.14.2

GiteaMirror added the bug label 2026-04-29 09:24:04 -05:00

@webysther commented on GitHub (Feb 4, 2026):

Is there no workaround?


@drifkin commented on GitHub (Feb 4, 2026):

> Is there no workaround?

I'm reworking how this works in an upcoming change bundled with a few other things; apologies that this is taking a while. It's hard to pin down the exact timing, but it will very likely land in the next week or two. In the meantime, you can hit ollama.com/api/generate directly.
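The interim workaround suggested here is to call ollama.com/api/generate directly instead of going through the local server. Below is a minimal standard-library sketch of that. It makes two assumptions. First, following the repro, the local `-cloud` tag maps to the ollama.com tag by dropping the `-cloud` suffix. Second, `OLLAMA_API_KEY` is set in the environment. The helper names are hypothetical. The `Authorization` header is sent in the same form the repro uses.

```python
import json
import os
import urllib.request

# Assumption taken from the repro: the local tag is the ollama.com tag plus
# "-cloud" (gpt-oss:20b-cloud locally, gpt-oss:20b on ollama.com).
CLOUD_SUFFIX = "-cloud"


def direct_cloud_model(local_tag: str) -> str:
    """Map a local '-cloud' tag to the model name ollama.com expects."""
    if local_tag.endswith(CLOUD_SUFFIX):
        return local_tag[: -len(CLOUD_SUFFIX)]
    return local_tag


def generate_raw_direct(local_tag: str, prompt: str, timeout: float = 120.0) -> dict:
    """Hypothetical workaround helper: send a raw, non-streaming request to
    ollama.com/api/generate directly instead of through the local server."""
    body = json.dumps({
        "model": direct_cloud_model(local_tag),
        "prompt": prompt,  # json.dumps escapes quotes and newlines in the prompt
        "raw": True,
        "stream": False,
    }).encode("utf-8")
    req = urllib.request.Request(
        "https://ollama.com/api/generate",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Same header form as the repro; the key is read from the environment.
            "Authorization": os.environ["OLLAMA_API_KEY"],
        },
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())
```

For example, `generate_raw_direct("gpt-oss:20b-cloud", prompt)["response"]` would return the model's raw output. This only helps when you control the client that issues the request.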


@webysther commented on GitHub (Feb 4, 2026):

> > Is there no workaround?
>
> I'm reworking how this works in an upcoming change bundled with a few other things; apologies that this is taking a while. It's hard to pin down the exact timing, but it will very likely land in the next week or two. In the meantime, you can hit ollama.com/api/generate directly.

Actually, what I was trying to do was connect through LiteLLM to Copilot in VS Code, so I don't control how the API is consumed.

Thank you, great project!


@goodslav commented on GitHub (Mar 2, 2026):

Any news on a fix for this issue?


@drifkin commented on GitHub (Mar 2, 2026):

> Any news on a fix for this issue?

Very likely to come out this week, at least in an RC, if not in a final build!

From a local change I'm working on:

```
curl -s localhost:11434/api/generate -d '{"model":"gpt-oss:20b-cloud","prompt":"<|start|>system<|message|>Talk like a pirate\n# Valid channels: analysis, commentary, final. Channel must be included for every message.<|end|><|start|>user<|message|>hello<|end|><|start|>assistant","raw":true,"stream":false}'
{"model":"gpt-oss:20b","created_at":"2026-03-02T20:49:58.015015682Z","response":"Ahoy, matey! How be ye on this fine day? 🌊🏴‍☠️","thinking":"User says \"hello\". They want the pirate style. We need to respond with a pirate style, complying.","done":true,"done_reason":"stop","total_duration":4404566266,"prompt_eval_count":34,"eval_count":54}
```
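The successful replies in this thread differ in shape. The direct ollama.com call in the original repro returned the harmony channel markers verbatim inside `response`. This local build instead returns the analysis text in a separate `thinking` field. A client that may see either shape could normalize them with a small parser. The sketch below handles only the simple `<|channel|>NAME<|message|>TEXT` form seen here (no `<|constrain|>` or recipient annotations) and ignores the `commentary` channel. The function name is hypothetical.

```python
import re
from typing import Tuple

# One harmony-style segment: <|channel|>NAME<|message|>TEXT, where TEXT runs
# until the next control token (end/return/start/channel) or the end of the string.
_SEGMENT = re.compile(
    r"<\|channel\|>(\w+)<\|message\|>(.*?)(?=<\|(?:end|return|start|channel)\|>|\Z)",
    re.S,
)


def split_thinking(body: dict) -> Tuple[str, str]:
    """Return (thinking, final_text) from a non-streaming /api/generate reply."""
    text = body.get("response", "")
    if "thinking" in body:
        # The server already split the output into separate fields.
        return body["thinking"], text
    segments = _SEGMENT.findall(text)
    if not segments:
        # No channel markers at all: treat the whole response as final text.
        return "", text
    thinking = "".join(t for channel, t in segments if channel == "analysis")
    final = "".join(t for channel, t in segments if channel == "final")
    return thinking, final
```

Applied to the repro's direct reply, this yields the analysis sentence as thinking and an empty final text, because that reply stopped before a `final` channel appeared. Applied to the reply above, it returns the `thinking` and `response` fields unchanged.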
Reference: github-starred/ollama#55553