[GH-ISSUE #2722] How can I specify the context window size using the OpenAI-compatible API? #1633

Closed
opened 2026-04-12 11:34:17 -05:00 by GiteaMirror · 5 comments

Originally created by @egoist on GitHub (Feb 24, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/2722

I wonder if there's a way to set the context size when using the OpenAI-compatible API (https://github.com/ollama/ollama/blob/main/docs/openai.md).

GiteaMirror added the question label 2026-04-12 11:34:17 -05:00
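For reference, the OpenAI-compatible endpoint in question is typically used by pointing an OpenAI client at the local Ollama server. A minimal sketch in Python, assuming Ollama is running on its default port and a model named llama3 has been pulled (the name is illustrative):

```python
# Minimal sketch of calling Ollama's OpenAI-compatible endpoint with the
# official openai Python client. Assumes a local Ollama server on the
# default port (11434) and a pulled model named "llama3" (illustrative).
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible endpoint
    api_key="ollama",                      # required by the client, ignored by Ollama
)

response = client.chat.completions.create(
    model="llama3",
    messages=[{"role": "user", "content": "Hello!"}],
    # Note: there is no standard OpenAI parameter for the context window
    # size (Ollama's num_ctx), which is what this issue is asking about.
)
print(response.choices[0].message.content)
```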

@chigkim commented on GitHub (Jun 26, 2024):

Did you find an answer?
I thought I might get lucky with the max_tokens parameter, but it doesn't seem to do anything according to the log.
It doesn't set --num-predict for llama.cpp either.

@pdevine commented on GitHub (Jul 18, 2024):

Hey guys, sorry for the slow response. You should be able to use the `max_tokens` field to control the context size.

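For illustration, this is roughly how `max_tokens` is passed through the OpenAI-compatible endpoint (a sketch assuming a local Ollama server on the default port and an illustrative model name); the later comments dispute that it affects the context window:

```python
# Sketch of the suggestion above: passing max_tokens through the
# OpenAI-compatible endpoint. As the later comments point out, in the
# OpenAI API max_tokens limits the number of generated tokens, not the
# context window size.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
response = client.chat.completions.create(
    model="llama3",
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=512,  # caps the completion length
)
print(response.choices[0].message.content)
```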

@chigkim commented on GitHub (Jul 19, 2024):

If max_tokens controls the context size, how can we control num_predict with the OpenAI API?
Isn't max_tokens in the OpenAI API supposed to control how many tokens to generate (num_predict on Ollama, --predict in llama.cpp), not the context size (num_ctx on Ollama, --ctx-size in llama.cpp)?
[OpenAI API Docs](https://platform.openai.com/docs/api-reference/chat): "The maximum number of tokens that can be generated in the chat completion. The total length of input tokens and generated tokens is limited by the model's context length."

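For comparison, Ollama's native API (as opposed to the OpenAI-compatible one) does expose the two settings separately through its options object. A minimal sketch, assuming a local server on the default port and an illustrative model name:

```python
# Sketch: Ollama's native /api/chat endpoint (not the OpenAI-compatible one)
# accepts both knobs separately via "options".
import requests

resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "llama3",
        "messages": [{"role": "user", "content": "Hello!"}],
        "options": {
            "num_ctx": 8192,     # context window size (--ctx-size in llama.cpp)
            "num_predict": 256,  # max tokens to generate (num_predict / --predict)
        },
        "stream": False,
    },
)
print(resp.json()["message"]["content"])
```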

@hellopahe commented on GitHub (Jul 23, 2024):

My question is the same as @chigkim's. It seems like the Ollama OpenAI-compatible server truncates the input to the default `n_ctx=2048`, no matter whether the parameter is passed as `max_tokens` or `num_ctx` in the request.

@de-code commented on GitHub (Feb 10, 2025):

For anyone coming here looking for the answer, copying from the aforementioned [ollama openai docs](https://github.com/ollama/ollama/blob/main/docs/openai.md):

> The OpenAI API does not have a way of setting the context size for a model.

(A [PR that would have added the option of setting the context length via the API](https://github.com/ollama/ollama/pull/8672) was closed.)

The current way to change the context length is by creating a model (e.g. via the Modelfile mentioned in the docs).
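To make that workaround concrete, here is a sketch of creating a model variant with a larger num_ctx via a Modelfile and then using it through the OpenAI-compatible endpoint (assuming the ollama CLI is installed, a base model named llama3 is pulled, and the openai Python package is available; model names are illustrative):

```python
# Sketch of the Modelfile workaround described above: bake num_ctx into a
# derived model, then use that model through the OpenAI-compatible endpoint.
# Assumes the `ollama` CLI is on PATH and a base model "llama3" is pulled.
import subprocess
from openai import OpenAI

with open("Modelfile", "w") as f:
    f.write("FROM llama3\nPARAMETER num_ctx 8192\n")

# Create a derived model ("llama3-8k" is an illustrative name).
subprocess.run(["ollama", "create", "llama3-8k", "-f", "Modelfile"], check=True)

# The OpenAI-compatible endpoint can now be pointed at the derived model,
# which loads with the larger context window baked in.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
response = client.chat.completions.create(
    model="llama3-8k",
    messages=[{"role": "user", "content": "Hello with a larger context window."}],
)
print(response.choices[0].message.content)
```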
Reference: github-starred/ollama#1633