[GH-ISSUE #8354] Dynamic context size in OpenAI API compatibility. #5356

Closed
opened 2026-04-12 16:33:42 -05:00 by GiteaMirror · 1 comment
Owner

Originally created by @x0wllaar on GitHub (Jan 9, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/8354

I noticed that the issue (#5356) regarding dynamically setting the context size (num_ctx) in the OpenAI API was closed with a note saying it wasn't possible due to limitations of the API. However, I'd like to reopen this discussion, as there seems to be a way to do this using the extra_body parameter available in the OpenAI API clients. This parameter allows passing arbitrary data/parameters to the endpoint, which would be useful here.

It should be possible to pass additional parameters through the API call with the extra_body option, as shown below:

```python
from openai import OpenAI

# Point the client at Ollama's OpenAI-compatible endpoint.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

messages = [{"role": "user", "content": "Hello"}]

client.chat.completions.create(
    model="phi4",
    messages=messages,
    extra_body={"num_ctx": 16384},
)
```
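For illustration, the OpenAI Python client merges extra_body keys into the top-level JSON payload it sends, so the server would see num_ctx alongside the standard fields. This is a simplified sketch of that merge behavior, not the actual openai-python internals:

```python
# Sketch: how extra_body keys end up in the JSON request body.
# Keys from extra_body are merged at the top level, next to the
# standard "model" and "messages" fields, so a server that knows
# about "num_ctx" can pick it up from the request payload.

def build_payload(model, messages, extra_body=None):
    payload = {"model": model, "messages": messages}
    payload.update(extra_body or {})  # extra keys land at the top level
    return payload

payload = build_payload(
    model="phi4",
    messages=[{"role": "user", "content": "Hello"}],
    extra_body={"num_ctx": 16384},
)
print(payload["num_ctx"])  # 16384
```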

See https://github.com/openai/openai-python/blob/main/src/openai/resources/chat/completions.py#L102 for normal and https://github.com/openai/openai-python/blob/main/src/openai/resources/beta/chat/completions.py#L101 for structured completion functions.

I would also like to note that other implementations of OpenAI API already use this approach, for example, here's vLLM: https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html#extra-parameters

I'm willing to work on a pull request if you think this approach is feasible. There is already a similar PR, #5357, and I'm ready to help get it to a mergeable state.

Thank you so much for all your work on Ollama!

GiteaMirror added the feature request label 2026-04-12 16:33:42 -05:00

@rick-github commented on GitHub (Jan 9, 2025):

https://github.com/ollama/ollama/pull/6504


Reference: github-starred/ollama#5356