[GH-ISSUE #12636] OpenAI API Raw Completions does not seem to work properly #54901

Open
opened 2026-04-29 07:55:07 -05:00 by GiteaMirror · 1 comment
Owner

Originally created by @Nixellion on GitHub (Oct 15, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/12636

What is the issue?

I think the OpenAI API raw completions endpoint does not work properly. It behaves like the chat completions API: the model responds as if it had received a user message on a chat endpoint.

For example, when sending this request to the raw completions API:

{'prompt': 'I think ', 'stream': False, 'temperature': 0.1, 'top_p': 0.9, 'top_k': 20, 'repetition_penalty': 1.15, 'max_tokens': 100, 'model': 'anymodel'}

The expected output would be for the model to continue the sentence. Instead, the reply I get is:

It sounds like you're about to share something important. I'm here to listen and support you in any way I can. Please go ahead and tell me what's on your mind—I'm all ears!

This indicates that internally Ollama processes /v1/completions requests through a chat template, which should not happen. The /v1/completions API should be equivalent to using the raw: True parameter with the native Ollama API.

By the way, adding raw: True to the payload does not affect this (and frankly it is not part of the OpenAI API spec, and might break other tools/servers or schemas if added).

However, the responses come back as I would expect when I test the same prompts with the official Ollama Python client, using client.generate with the raw=True parameter, or the equivalent curl command.
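For reference, a native-API request that produces the expected continuation looks roughly like this (sent as a POST to /api/generate; the model name is a placeholder):

```json
{
  "model": "anymodel",
  "prompt": "I think ",
  "raw": true,
  "stream": false
}
```

With "raw": true, no templating is applied and no context is maintained; the prompt is passed to the model verbatim.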

From my very limited understanding of Ollama's code and architecture, perhaps an options["raw"] = True in openai.go's func FromCompleteRequest is what's missing?
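To make the reported behavior concrete, here is a minimal sketch (not Ollama's actual code; the template markers and function names are illustrative) of the difference between templating the prompt as a chat turn versus passing it through raw:

```python
# Hypothetical illustration of the two prompt paths described in this issue.
# apply_chat_template() stands in for whatever chat template the model ships
# with; the exact markers below are made up for illustration.

def apply_chat_template(prompt: str) -> str:
    """Wrap a bare prompt as a single user turn in a chat template."""
    return f"<|user|>\n{prompt}<|end|>\n<|assistant|>\n"

def build_prompt(prompt: str, raw: bool) -> str:
    # Observed: /v1/completions currently behaves like raw=False.
    # Expected: it should behave like raw=True (verbatim pass-through).
    return prompt if raw else apply_chat_template(prompt)

print(build_prompt("I think ", raw=True))   # sentence continuation setup
print(build_prompt("I think ", raw=False))  # model sees a chat turn instead
```

Under the raw=False path, the model reasonably interprets "I think " as an unfinished user message and replies conversationally, which matches the output quoted above.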

Relevant log output


OS

Linux

GPU

Irrelevant

CPU

Irrelevant

Ollama version

0.12.3

GiteaMirror added the bug label 2026-04-29 07:55:07 -05:00
Author
Owner

@Nixellion commented on GitHub (Mar 1, 2026):

Further testing revealed that Ollama's own API's raw parameter does not quite work as described either.

What I end up having to do is create a separate "raw" version of the model, where I comment out the RENDERER and PARSER attributes. This works as a workaround, but it is annoying because I have to repeat it for every new model that I want to use in raw mode (which I prefer, because I like to control templating on my end).
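The workaround above might look roughly like this as a Modelfile (a sketch; the base model name is an example, and the exact RENDERER/PARSER values depend on the original model's Modelfile):

```
# "Raw" variant of a model: same weights, but with the template-related
# directives disabled so the prompt is not rendered through a chat template.
FROM anymodel
# RENDERER ...   <- commented out
# PARSER ...     <- commented out
```

Building it with something like `ollama create anymodel-raw -f Modelfile` then gives a model that can be prompted without templating.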


Reference: github-starred/ollama#54901