[GH-ISSUE #2723] Updating max_tokens for LLM by OpenAI library doesn't work #27395

Closed
opened 2026-04-22 04:43:11 -05:00 by GiteaMirror · 2 comments

Originally created by @shashade2012 on GitHub (Feb 24, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/2723

I need to adjust the default token limit for my Large Language Model (LLM). Currently, I'm using Ollama with the Mistral model and have created two clients: one using the Ollama Python library and the other using the OpenAI library. Specifically, I want to increase the default maximum token limit to handle longer prompts. When I updated the `num_ctx` options parameter in the Ollama Python library, it worked successfully:
`response = client.chat(model=MODEL, options={"num_ctx": 2048}, messages=messages, stream=False)`
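
For context, here is a minimal self-contained sketch of that working call, assuming the `ollama` Python package and a locally pulled `mistral` model (the model name and prompt are placeholders, not from the original report):

```python
# Minimal sketch using the Ollama Python library.
# Assumes `pip install ollama` and a running local Ollama server
# with the mistral model already pulled.
import ollama

MODEL = "mistral"
messages = [{"role": "user", "content": "Summarize the plot of Hamlet."}]

client = ollama.Client()  # defaults to http://localhost:11434
response = client.chat(
    model=MODEL,
    options={"num_ctx": 2048},  # enlarges the context window for this request
    messages=messages,
    stream=False,
)
print(response["message"]["content"])
```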
But it didn't work when I tried to update `max_tokens` via the OpenAI library:
`response = client.chat.completions.create(model=MODEL, messages=messages, max_tokens=2048, stream=False)`
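
For comparison, a self-contained sketch of the failing call (the `base_url` and `api_key` values are assumptions based on Ollama's OpenAI-compatibility setup, not part of the original report). Note that in the OpenAI API, `max_tokens` caps the number of *generated* tokens, which Ollama maps to `num_predict` rather than `num_ctx`, so it would not enlarge the context window in any case:

```python
# Sketch using the openai Python library (v1+) pointed at Ollama's
# OpenAI-compatible endpoint; base_url/api_key are assumed values
# (Ollama requires an api_key to be set but does not validate it).
from openai import OpenAI

MODEL = "mistral"
messages = [{"role": "user", "content": "Summarize the plot of Hamlet."}]

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
response = client.chat.completions.create(
    model=MODEL,
    messages=messages,
    max_tokens=2048,  # caps output tokens; does not raise the context window
    stream=False,
)
print(response.choices[0].message.content)
```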

Yet according to the following reference, `max_tokens` seems to be supported:
https://github.com/ollama/ollama/blob/main/docs/openai.md#supported-request-fields

Please help me check whether updating `max_tokens` via the OpenAI library is really supported, or whether there is another way to update the token limit when using the OpenAI client.
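
For readers hitting the same wall: one workaround (an editor's sketch under stated assumptions, not something confirmed in this thread) is to bake the larger context window into a derived model via a Modelfile, then reference that model name from the OpenAI client; `mistral-2048` below is a hypothetical name:

```
# Illustrative Modelfile; num_ctx is baked into the derived model,
# so no per-request option is needed from the OpenAI client.
FROM mistral
PARAMETER num_ctx 2048
```

Build it with `ollama create mistral-2048 -f Modelfile`, then pass `model="mistral-2048"` to the `chat.completions.create` call shown above.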

Thanks.
BRs
Bruce


@Malnes commented on GitHub (Mar 12, 2024):

Same.


@jmorganca commented on GitHub (Mar 13, 2024):

Hi there, sorry this isn't easy to do today. I'll merge this with #2963
