[GH-ISSUE #6544] Specifying options via openai client extra_body are not handled by ollama #50628

Closed
opened 2026-04-28 16:38:03 -05:00 by GiteaMirror · 5 comments
Owner

Originally created by @gaardhus on GitHub (Aug 28, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/6544

What is the issue?

So I've been trying to set `num_ctx` for mistral-nemo through the OpenAI API client, but it doesn't seem to have any effect.

```python
# Ollama client
client = OllamaAsyncClient(host=base_url, **kwargs)
chat_completion = await client.chat(
    messages=messages,
    model=model,
    stream=stream,
    options={
        "num_ctx": 64_000,
        "temperature": temperature,
        "stop": stop_tokens,
    },
    **kwargs,
)
message = chat_completion["message"]["content"].strip()

# OpenAI client
client = AsyncOpenAI(api_key=api_key, base_url=base_url, **kwargs)
chat_completion = await client.chat.completions.create(
    messages=messages,
    model=model,
    temperature=temperature,
    stream=stream,
    stop=stop_tokens,
    extra_body={"options": {"num_ctx": 64_000}},
    **kwargs,
)
message = chat_completion.choices[0].message.content.strip()
```

Doing it with the ollama client works, but the `extra_body` argument from the OpenAI client seems to be handled as an extra field rather than merged with the rest of the request. A debug dump of the request state shows (truncated):

```
'model': 'mistral-nemo:12b-instruct-2407-q8_0', 'stop': None, 'stream': False, 'temperature': 0.7}, 'extra_json': {'options': {'num_ctx': 64000}}}
```

I guess the solution would be to unpack the `extra_json` field on the server end?
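To make the suggestion concrete, here is a minimal sketch of the kind of unpacking meant: fold the extra payload into the top-level request before normal handling, so that fields like `options` become visible. `fold_extra` is a hypothetical helper, not Ollama code; the field names follow the debug dump above.

```python
# Hypothetical sketch of server-side unpacking: merge the extra_json
# payload into the top-level request dict so that fields like "options"
# reach the normal request handling. Not actual Ollama code.
def fold_extra(request: dict) -> dict:
    merged = dict(request)
    extra = merged.pop("extra_json", None) or {}
    merged.update(extra)  # e.g. "options" becomes a top-level field
    return merged

request = {
    "model": "mistral-nemo:12b-instruct-2407-q8_0",
    "temperature": 0.7,
    "extra_json": {"options": {"num_ctx": 64_000}},
}
merged = fold_extra(request)
# merged["options"]["num_ctx"] == 64000, and "extra_json" is gone
```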

OS

Linux

GPU

Nvidia

CPU

Intel

Ollama version

0.3.8

GiteaMirror added the bug label 2026-04-28 16:38:03 -05:00
Author
Owner

@rick-github commented on GitHub (Aug 28, 2024):

ollama sticks to the OpenAI API spec, which doesn't allow changing the context window size via `num_ctx`. There is a pending PR (https://github.com/ollama/ollama/pull/6504) which allows changing the size via `max_tokens`.
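If that PR lands, the spec-compliant route would presumably look like the sketch below; the exact semantics depend on the merged change, so the actual call is commented out. The request shape follows the OpenAI Python SDK.

```python
# Sketch: max_tokens is a standard OpenAI chat-completions parameter, so it
# passes through the client without extra_body. Whether Ollama maps it onto
# the context size depends on PR #6504 being merged.
request_kwargs = {
    "model": "mistral-nemo:12b-instruct-2407-q8_0",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 64_000,
}
# chat_completion = await client.chat.completions.create(**request_kwargs)
```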

Author
Owner

@gaardhus commented on GitHub (Aug 28, 2024):

Ah sweet, that would solve my issue. However, I'd expect there are other parameters/options which can't be specified via the OpenAI client, so wouldn't it be easier to use the `extra_body` (/`extra_json`) field for these things? Or would you rather create per-use integrations?

Author
Owner

@rick-github commented on GitHub (Aug 28, 2024):

`extra_body` is not mentioned in the [OpenAI spec](https://platform.openai.com/docs/api-reference/introduction); it appears to be an addition to the Python bindings, which copy its contents into an `extra_json` field and then call `build_request`, at which point I gave up tracing the code.

Author
Owner

@jmorganca commented on GitHub (Sep 4, 2024):

Hi @gaardhus, thanks for the issue. As mentioned by @rick-github, Ollama's OpenAI-compatible endpoints try as much as possible to adhere to the OpenAI spec. For extending the context window, the currently supported method is creating a model with a custom `num_ctx` (we're working on easier ways than this).

Create a `Modelfile`:

```
FROM <model>
PARAMETER num_ctx 64000
```

Then:

```
ollama create <new-model>
```

Then you can use `<new-model>`.
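The same workaround can also be scripted. The sketch below builds the Modelfile text and (commented out, since it needs a running Ollama server) registers it with the `ollama` Python package. The model name `mistral-nemo-64k` and the `create()` call shape are assumptions; the package's `create()` signature has changed between versions, so check your installed release.

```python
# Sketch: the Modelfile workaround done programmatically. Assumes a running
# Ollama server; ollama.create()'s signature has varied across package
# versions, so the call below is illustrative only.
modelfile = (
    "FROM mistral-nemo:12b-instruct-2407-q8_0\n"
    "PARAMETER num_ctx 64000\n"
)
# import ollama
# ollama.create(model="mistral-nemo-64k", modelfile=modelfile)
# ...then point the OpenAI client at model="mistral-nemo-64k".
```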

Author
Owner

@devilteo911 commented on GitHub (Sep 5, 2024):

I'm not sure if this issue should be closed, as this feels more like a workaround than a solution. @jmorganca, I found this issue through your PR, which seems like a more comprehensive approach. Hopefully, it will get merged soon!

Reference: github-starred/ollama#50628