[GH-ISSUE #11325] Support top_k and min_p parameters in OpenAI-compatible API #7473

Closed
opened 2026-04-12 19:32:45 -05:00 by GiteaMirror · 2 comments
Owner

Originally created by @filips123 on GitHub (Jul 7, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/11325

The official OpenAI API currently doesn't support setting top_k and min_p parameters, and this is also true for Ollama's OpenAI-compatible API. However, these parameters are useful in some cases, so it would be good to still support them in the API.

Ollama's own API supports setting these parameters, and other OpenAI-compatible servers, such as [vLLM](https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html) and [llama.cpp's server](https://github.com/ggml-org/llama.cpp/blob/12f55c302b35cfe900b84c5fe67c262026af9c44/tools/server/server.cpp#L273-L297), also support them (along with some other parameters not supported in the official OpenAI API). So, to allow writing more generic AI clients that work with different servers, it would be useful to add them to Ollama's OpenAI-compatible API as well, even though they are not official API parameters.

There also seem to be other parameters available in Ollama's own API and in vLLM and llama.cpp's server that are not available in Ollama's OpenAI-compatible version. It probably makes sense to add support for them as well.
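
To illustrate the request, here is a minimal sketch of the request body a generic client might send to an OpenAI-compatible `/v1/chat/completions` endpoint. The helper `build_chat_request` and the value `min_p=0.05` are assumptions made up for this example, not part of any library. Whether the extra fields are honored depends on the server; per this issue, Ollama's OpenAI-compatible endpoint does not support them.

```python
import json


def build_chat_request(model, prompt, top_k=None, min_p=None, **standard):
    """Build a chat completions body, adding top_k/min_p only when set.

    Hypothetical helper: top_k and min_p are not official OpenAI
    parameters, so they are included only if the caller asks for them.
    """
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    # Official sampling parameters such as temperature or top_p.
    body.update(standard)
    # Non-official extras requested in this issue.
    extras = {"top_k": top_k, "min_p": min_p}
    body.update({k: v for k, v in extras.items() if v is not None})
    return body


payload = build_chat_request(
    "qwen2.5:0.5b", "hello", top_k=20, min_p=0.05, temperature=0.7
)
print(json.dumps(payload, indent=2))
```

Leaving the extras out when they are unset keeps the body valid for servers that accept only the official parameters.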

GiteaMirror added the feature request label 2026-04-12 19:32:45 -05:00

@rick-github commented on GitHub (Jul 7, 2025):

The policy of the ollama developers has been to stick to the published API. However, you can work around it by creating a copy of the model with the required parameters and having clients use that model.

$ ollama run qwen2.5:0.5b
>>> /set parameter top_k 20
Set parameter 'top_k' to '20'
>>> /save qwen2.5:0.5b-topk20
Created new model 'qwen2.5:0.5b-topk20'
>>> /bye
$ ollama show qwen2.5:0.5b-topk20
  Model
    architecture        qwen2      
    parameters          494.03M    
    context length      32768      
    embedding length    896        
    quantization        Q4_K_M     

  Capabilities
    completion    
    tools         

  Parameters
    top_k    20    
$ curl -s localhost:11434/v1/chat/completions -d '{"model":"qwen2.5:0.5b-topk20","messages":[{"role":"user","content":"hello"}]}' | jq
{
  "id": "chatcmpl-545",
  "object": "chat.completion",
  "created": 1751923456,
  "model": "qwen2.5:0.5b-topk20",
  "system_fingerprint": "fp_ollama",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello, how can i assist you?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 30,
    "completion_tokens": 9,
    "total_tokens": 39
  }
}
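
The session above creates the derived model interactively with `/set parameter` and `/save`. As a hedged alternative that is not taken from this thread, the same settings could presumably be written as a Modelfile using its `FROM` and `PARAMETER` instructions and built with `ollama create`. The helper `render_modelfile` below is a hypothetical name used only for this sketch.

```python
def render_modelfile(base_model, parameters):
    """Render Modelfile text that pins sampling parameters on a base model.

    Hypothetical helper: emits one FROM line followed by one
    PARAMETER line per entry in `parameters`.
    """
    lines = [f"FROM {base_model}"]
    for name, value in parameters.items():
        lines.append(f"PARAMETER {name} {value}")
    return "\n".join(lines) + "\n"


modelfile = render_modelfile("qwen2.5:0.5b", {"top_k": 20})
print(modelfile, end="")

# Assumed follow-up step, run in a shell after saving the text as "Modelfile":
#   ollama create qwen2.5:0.5b-topk20 -f Modelfile
```

Clients would then request `qwen2.5:0.5b-topk20` through the OpenAI-compatible endpoint, exactly as in the `curl` call above.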

@filips123 commented on GitHub (Jul 10, 2025):

Thanks for the suggestion. I'll probably do this for now. But considering that other OpenAI API providers support these extra parameters, I still think it would make sense to support them in Ollama.

Reference: github-starred/ollama#7473