[PR #11249] feat: expose Ollama native parameters (think, keep_alive) through OpenAI API #13493

Open
opened 2026-04-13 00:28:47 -05:00 by GiteaMirror · 0 comments
Owner

Original Pull Request: https://github.com/ollama/ollama/pull/11249

State: open
Merged: No


Expose Ollama native parameters through OpenAI API

Problem

Users of the OpenAI-compatible API have been unable to access Ollama's native parameters like think and keep_alive, which are available in the native Ollama API. This has been a long-standing limitation that forces users to choose between OpenAI compatibility and Ollama-specific features.

This is particularly problematic when using libraries like Pydantic AI (https://ai.pydantic.dev/models/openai/#ollama) that only wrap Ollama through the OpenAI API, leaving users without access to important Ollama-specific controls for model behavior and performance tuning.

Related issues:

  • #2963 - Request to provide options in OpenAI compatibility endpoints
  • #11012 - Enable/disable thinking for openai-python clients
  • #3645 - keep_alive parameter not working in OpenAI API

Solution

This PR adds an options field to both ChatCompletionRequest and CompletionRequest that allows passing Ollama native parameters through the OpenAI-compatible endpoints.

What's implemented:

  • think parameter: Enable thinking/reasoning mode for supported models
  • keep_alive parameter: Control how long the model stays loaded in memory, with proper Duration parsing

The implementation extracts these parameters from the options map and sets them on the underlying Ollama API requests, then removes them from the options to avoid conflicts.
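The actual implementation is in Go, but the extract-and-remove logic described above can be sketched in Python (function and variable names here are illustrative, not from the PR):

```python
def extract_native_params(options):
    """Pop Ollama-native keys out of the options map so they are applied
    to the underlying request and don't conflict with the remaining
    sampling options."""
    native = {}
    for key in ("think", "keep_alive"):
        if key in options:
            native[key] = options.pop(key)  # remove to avoid conflicts
    return native, options

native, remaining = extract_native_params(
    {"think": True, "keep_alive": "30m", "top_p": 0.9}
)
# native holds the Ollama-specific parameters; remaining keeps only
# the ordinary sampling options.
```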

Usage examples:

Python (openai-python):

completion = client.chat.completions.create(
    model="llama3.2",
    messages=[{"role": "user", "content": "Explain quantum computing"}],
    extra_body={
        "options": {
            "think": True,
            "keep_alive": "30m"
        }
    }
)

cURL:

curl http://localhost:11434/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "llama3.2",
        "messages": [{"role": "user", "content": "Hello"}],
        "options": {
            "think": true,
            "keep_alive": "5m"
        }
    }'

Implementation details

  • Proper Duration parsing that handles both string formats ("5m", "1h") and numeric values (seconds)
  • Maintains backward compatibility - existing requests work unchanged
  • Added comprehensive test coverage for both chat and completion endpoints
  • Updated documentation with examples in Python, JavaScript, and cURL
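The dual-format duration handling described above (Go-style strings and bare numeric seconds) can be approximated with a simplified Python sketch. Note that Go's time.ParseDuration also accepts compound strings such as "1h30m", which this sketch omits; the name parse_keep_alive is illustrative:

```python
import re

# Seconds per supported unit suffix.
_UNITS = {"s": 1, "m": 60, "h": 3600}

def parse_keep_alive(value):
    """Return the keep-alive duration in seconds.

    Numeric values are treated as seconds; strings must be a number
    followed by a single unit suffix (s, m, or h).
    """
    if isinstance(value, (int, float)):
        return float(value)
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([smh])", value)
    if not match:
        raise ValueError(f"unrecognized duration: {value!r}")
    amount, unit = match.groups()
    return float(amount) * _UNITS[unit]
```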

Future extensibility

This implementation establishes a foundation for exposing additional Ollama parameters through the options field. Other native parameters that could be implemented using this same pattern include:

Generation parameters:

  • num_ctx - Context window size
  • num_predict - Maximum tokens to predict
  • seed - Random seed for reproducible outputs
  • stop - Custom stop sequences
  • top_k, top_p, min_p - Advanced sampling controls
  • repeat_penalty, repeat_last_n - Repetition control

System parameters:

  • num_gpu - GPU layer allocation
  • num_thread - Thread count
  • num_batch - Batch size

This change makes the OpenAI API more feature-complete while maintaining full compatibility with existing OpenAI clients, and provides a clear path for future parameter support.
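If the parameters listed above were exposed through the same options field, a client request body might look like the following. The num_ctx and seed entries are hypothetical — this PR implements only think and keep_alive:

```python
# Hypothetical request body under the proposed future extension.
request_body = {
    "model": "llama3.2",
    "messages": [{"role": "user", "content": "Hello"}],
    "options": {
        "think": True,        # implemented by this PR
        "keep_alive": "10m",  # implemented by this PR
        "num_ctx": 8192,      # hypothetical: context window size
        "seed": 42,           # hypothetical: reproducible sampling
    },
}
```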

GiteaMirror added the pull-request label 2026-04-13 00:28:47 -05:00
Reference: github-starred/ollama#13493