[GH-ISSUE #5013] How to prevent the model from automatically unloading after 5 minutes when using the OpenAI package? #3172

Closed
opened 2026-04-12 13:39:40 -05:00 by GiteaMirror · 3 comments
Owner

Originally created by @GoEnthusiast on GitHub (Jun 13, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/5013

```python
from openai import OpenAI

client = OpenAI(
    base_url='http://localhost:11434/v1/',
    # required but ignored
    api_key='ollama',
)

chat_completion = client.chat.completions.create(
    messages=[
        {
            'role': 'user',
            'content': 'Say this is a test',
        }
    ],
    model='llama3',
)
```

In this code, how should I set the `keep_alive: -1` request parameter so the model is not unloaded after 5 minutes without a request?
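For reference, Ollama's native REST API (a separate code path from the OpenAI-compatible endpoint used above) accepts a `keep_alive` field directly on `/api/generate` and `/api/chat` requests. A minimal standard-library sketch; nothing is sent until the commented-out line runs against a live server:

```python
import json
import urllib.request

# keep_alive: -1 asks the server to keep the model loaded indefinitely;
# a duration string like "10m", or 0 (unload immediately), also works.
payload = json.dumps({
    'model': 'llama3',
    'prompt': 'Say this is a test',
    'keep_alive': -1,
    'stream': False,
}).encode()

req = urllib.request.Request(
    'http://localhost:11434/api/generate',
    data=payload,
    headers={'Content-Type': 'application/json'},
)
# urllib.request.urlopen(req)  # uncomment against a running Ollama server
```

This only helps if you can switch off the OpenAI client for the call that should pin the model in memory.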

GiteaMirror added the model label 2026-04-12 13:39:40 -05:00
Author
Owner

@JerrettDavis commented on GitHub (Jun 13, 2024):

The default `keep_alive` duration for Ollama is 5 minutes. To extend it, you have to pass a new `keep_alive` value with the request. In your example you're using the OpenAI library, which unfortunately does not support setting a custom `keep_alive`. As a result, when Ollama's OpenAI middleware processes the request, the default `5m` timeout is applied to all requests.

If you need to drive this functionality separately, it should be possible to introduce a new environment variable (like `OLLAMA_OPENAI_KEEPALIVE`). The existing OpenAI middleware would need to be adjusted to set the keep-alive from that variable here: https://github.com/ollama/ollama/blob/c69bc19e46bf40b24518444cd6754453ac41cdd0/openai/openai.go#L285-L317
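One client-side avenue worth noting: the OpenAI Python SDK exposes an `extra_body` parameter that merges arbitrary extra fields into the request JSON, so `keep_alive` can at least be put on the wire (whether Ollama's middleware honors it depends on the server version; per the comment above, it was ignored at the time). The merge behaves roughly like this sketch, where `build_payload` is a stand-in for illustration, not the SDK's real internals:

```python
def build_payload(model, messages, extra_body=None):
    # Standard fields first, then any extra_body keys merged in verbatim --
    # this mirrors how the SDK attaches extra_body to the request JSON.
    payload = {'model': model, 'messages': messages}
    payload.update(extra_body or {})
    return payload

payload = build_payload(
    'llama3',
    [{'role': 'user', 'content': 'Say this is a test'}],
    extra_body={'keep_alive': -1},
)
# With the real SDK the equivalent call would be:
# client.chat.completions.create(..., extra_body={'keep_alive': -1})
```

If the server ignores the field, the request still succeeds; the extra key simply has no effect.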

Author
Owner

@GoEnthusiast commented on GitHub (Jun 20, 2024):

> The default `keep_alive` duration for Ollama is 5 minutes. To extend it, you have to pass a new `keep_alive` value with the request. In your example you're using the OpenAI library, which unfortunately does not support setting a custom `keep_alive`. As a result, when Ollama's OpenAI middleware processes the request, the default `5m` timeout is applied to all requests.
>
> If you need to drive this functionality separately, it should be possible to introduce a new environment variable (like `OLLAMA_OPENAI_KEEPALIVE`). The existing OpenAI middleware would need to be adjusted to set the keep-alive from that variable here:
>
> https://github.com/ollama/ollama/blob/c69bc19e46bf40b24518444cd6754453ac41cdd0/openai/openai.go#L285-L317

Okay, thank you very much for your guidance. I'll give it a try now. Thank you again.

Author
Owner

@pdevine commented on GitHub (Jul 9, 2024):

You can also start `ollama serve` with the `OLLAMA_KEEP_ALIVE` environment variable. This is covered in the [FAQ](https://github.com/ollama/ollama/blob/main/docs/faq.md#how-do-i-keep-a-model-loaded-in-memory-or-make-it-unload-immediately). I'll go ahead and close the issue.
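The server-side setting can be sketched as follows; `OLLAMA_KEEP_ALIVE=-1` keeps every loaded model in memory until the server stops (the `Popen` line is commented out so the sketch stays runnable without `ollama` installed):

```python
import os
import subprocess

# Equivalent to running from a shell: OLLAMA_KEEP_ALIVE=-1 ollama serve
env = dict(os.environ, OLLAMA_KEEP_ALIVE='-1')
# subprocess.Popen(['ollama', 'serve'], env=env)
```

Note this changes the default for all models on the server, unlike a per-request `keep_alive`.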

Reference: github-starred/ollama#3172