[GH-ISSUE #11458] keep_alive Parameter Ignored When Using OpenAI SDK with Ollama API #54079

Closed
opened 2026-04-29 05:11:30 -05:00 by GiteaMirror · 2 comments
Owner

Originally created by @alexander-fischer on GitHub (Jul 17, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/11458

What is the issue?

When using the OpenAI SDK to call the Ollama /v1/chat/completions endpoint and passing keep_alive as an extra field in the request body, Ollama appears to ignore the provided value. Instead, it falls back to the default 5-minute timeout, even if a different value (e.g., "24h") is explicitly set.

My code:

```python
client = openai.OpenAI(base_url="http://localhost:11434/v1", api_key=LLM_API_KEY)

def stream_llm(instruction: str, model_id: str):
    return client.chat.completions.create(
        model=model_id,
        messages=[{"role": "user", "content": instruction}],
        temperature=0.1,
        extra_body={"keep_alive": "24h"},
        stream=True,
    )
```

Curl command to reproduce:

```bash
curl -X POST http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Whats the weather in London?"
      }
    ],
    "model": "gemma3:1b-it-qat",
    "temperature": 0.1,
    "keep_alive": "24h"
  }'
```
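
Whether the value took effect can be checked without waiting five minutes: GET /api/ps lists the currently loaded models, including an expires_at timestamp. A hedged sketch (the helper names expiry_for and loaded_models are illustrative; the /api/ps route and its "models"/"expires_at" fields are from the Ollama API docs):

```python
import json
import urllib.request

def expiry_for(ps_response: dict, model_id: str):
    """Return the expires_at string for model_id from an /api/ps response, or None."""
    for m in ps_response.get("models", []):
        if m.get("name") == model_id or m.get("model") == model_id:
            return m.get("expires_at")
    return None

def loaded_models(base_url: str = "http://localhost:11434") -> dict:
    """Fetch the list of currently loaded models (requires a running server)."""
    with urllib.request.urlopen(f"{base_url}/api/ps") as resp:
        return json.loads(resp.read())
```

For a quick manual check, `curl http://localhost:11434/api/ps` right after the chat request: if expires_at is roughly five minutes out rather than 24 hours, keep_alive was not applied.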

Expected behavior

The keep_alive parameter should be respected, and the session/model should be kept alive for the specified duration (e.g., 24 hours), not the default 5 minutes.

Actual behavior

Ollama defaults to the 5-minute keep_alive, ignoring the value provided in the request body.

Environment:
Ollama version: 0.9.6
OpenAI SDK version: 1.96.1
Operating System: macOS

Relevant log output


OS

macOS

GPU

No response

CPU

No response

Ollama version

0.9.6

GiteaMirror added the bug label 2026-04-29 05:11:30 -05:00
Author
Owner

@rick-github commented on GitHub (Jul 17, 2025):

#2963

Author
Owner

@onakatomi commented on GitHub (Nov 12, 2025):

Did you figure this out?

Reference: github-starred/ollama#54079