[GH-ISSUE #12779] api: native API should accept max_tokens as alias for num_predict (OpenAI compatibility) #34236

Closed
opened 2026-04-22 17:39:47 -05:00 by GiteaMirror · 0 comments
Owner

Originally created by @elazar on GitHub (Oct 25, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/12779

What is the issue?

Problem

Ollama's native API endpoints (/api/chat, /api/generate) reject the max_tokens parameter, which is the standard OpenAI parameter name for controlling output length. When clients send requests with max_tokens in the options dict to these endpoints, Ollama logs:

level=WARN msg="invalid option provided" option=max_tokens

This creates an API inconsistency: Ollama's OpenAI-compatible endpoint (/v1/chat/completions) accepts max_tokens and converts it to num_predict, but the native endpoints reject it entirely.

Root Cause: The Options struct in api/types.go doesn't recognize max_tokens, only num_predict. The FromMap function (line 753) logs a warning for any unrecognized option key.
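To illustrate the warning path, here is a simplified, self-contained sketch. The real FromMap walks the Options struct's JSON tags via reflection, so knownOptions and applyOptions below are hypothetical stand-ins for that logic, not Ollama code:

```go
package main

import (
	"fmt"
	"log/slog"
)

// knownOptions is a hypothetical stand-in for the field set that the
// real FromMap derives from the Options struct's JSON tags.
var knownOptions = map[string]bool{
	"num_predict": true,
	"num_keep":    true,
	"seed":        true,
	"top_k":       true,
}

// applyOptions mimics the reported behavior: known keys are applied,
// unknown keys (such as "max_tokens" today) only produce a warning
// and are otherwise ignored.
func applyOptions(m map[string]any) []string {
	var ignored []string
	for key := range m {
		if !knownOptions[key] {
			slog.Warn("invalid option provided", "option", key)
			ignored = append(ignored, key)
		}
	}
	return ignored
}

func main() {
	// max_tokens is silently dropped, matching the WARN log line above
	fmt.Println(applyOptions(map[string]any{"num_predict": 100, "max_tokens": 200}))
}
```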

Current Behavior

Native endpoints (/api/chat, /api/generate):

  • Accept num_predict (Ollama's native parameter)
  • Reject max_tokens with a warning (OpenAI's standard parameter)

OpenAI-compatible endpoint (/v1/chat/completions):

  • Accepts max_tokens → converts to num_predict internally
  • Works correctly with OpenAI-style requests

Result: Developers must remember which parameter name to use depending on which endpoint they're calling, and clients that use the native API see warning logs for legitimate requests.

Impact

While the warning is benign (requests complete successfully with the parameter ignored), it creates problems:

  1. Log noise: Warning messages clutter logs during normal operation
  2. API inconsistency: Different endpoints accept different parameter names
  3. Developer friction: Need to remember endpoint-specific parameter names
  4. Client complexity: Proxies/clients must implement endpoint-specific conversion logic
  5. Confusion: Developers expect OpenAI parameter compatibility but encounter warnings

Affected clients:

  • Open WebUI and other UI frontends
  • Direct API users expecting OpenAI compatibility
  • Custom integrations and scripts
  • Any client that doesn't implement parameter conversion

Proposed Solution

Accept max_tokens as an alias for num_predict in the native API's Options struct, matching the behavior already present in the OpenAI-compatible endpoint.

Implementation follows existing Ollama patterns for deprecated/alias fields:

1. Add MaxTokens field to Options struct (api/types.go:390-407):

type Options struct {
    Runner
    NumKeep          int      `json:"num_keep,omitempty"`
    Seed             int      `json:"seed,omitempty"`
    NumPredict       int      `json:"num_predict,omitempty"`

    // MaxTokens is an alias for NumPredict, provided for OpenAI compatibility.
    // When both are provided, NumPredict takes precedence.
    MaxTokens        int      `json:"max_tokens,omitempty"`

    TopK             int      `json:"top_k,omitempty"`
    // ... rest of fields
}

2. Handle conversion inline within existing FromMap method:

func (opts *Options) FromMap(m map[string]any) error {
    // ... existing field processing logic ...

    // Handle max_tokens alias for OpenAI compatibility
    // This matches the pattern used in openai/openai.go:552-553
    if opts.MaxTokens > 0 && opts.NumPredict == 0 {
        opts.NumPredict = opts.MaxTokens
    }

    return nil
}

Precedence behavior:

  • If only max_tokens provided → use its value for num_predict
  • If only num_predict provided → use its value (unchanged behavior)
  • If both provided → num_predict takes precedence (native parameter wins)
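The three precedence cases collapse into the single conditional from the FromMap sketch above. As a standalone illustration (resolveNumPredict is a hypothetical helper for demonstration, not proposed API):

```go
package main

import "fmt"

// resolveNumPredict applies the proposed precedence rule: the native
// num_predict always wins, and max_tokens is only consulted when
// num_predict is unset (zero).
func resolveNumPredict(numPredict, maxTokens int) int {
	if maxTokens > 0 && numPredict == 0 {
		return maxTokens
	}
	return numPredict
}

func main() {
	fmt.Println(resolveNumPredict(0, 200))   // only max_tokens → 200
	fmt.Println(resolveNumPredict(100, 0))   // only num_predict → 100
	fmt.Println(resolveNumPredict(100, 200)) // both → num_predict wins → 100
}
```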

Rationale for Implementation Style

This approach follows existing Ollama code patterns:

  1. Deprecated/Alias Field Pattern: Similar to how Model/Name coexist in CreateRequest (lines 516-517)

    type CreateRequest struct {
        Model string `json:"model"`
        // Deprecated: set the model name with Model instead
        Name string `json:"name"`
    }
    

    Both fields are kept in the struct with a comment explaining their relationship.

  2. Inline Conversion Pattern: Matches existing OpenAI layer conversion (openai/openai.go:534-535)

    if r.MaxTokens != nil {
        options["num_predict"] = *r.MaxTokens
    }
    

    Conversion happens during request processing, not via separate methods.

  3. Precedence via Simple Conditionals: Uses straightforward if logic rather than helper methods

    if opts.MaxTokens > 0 && opts.NumPredict == 0 {
        opts.NumPredict = opts.MaxTokens
    }
    

Why this approach:

  • Consistent with established codebase patterns
  • No new methods introduced (avoids Normalize() pattern not used in Ollama)
  • Minimal code change (~20 lines total)
  • Easy to review and understand

Benefits

  • No breaking changes: Purely additive, existing code continues to work
  • API consistency: Native and OpenAI-compatible endpoints accept same parameters
  • Eliminates warnings: Valid requests no longer trigger log warnings
  • Developer experience: Familiar OpenAI parameter names work everywhere
  • Ecosystem benefit: All clients benefit (not just specific integrations)
  • Minimal change: ~20 lines of code change, straightforward implementation

Testing

Required test cases for api/types_test.go:

func TestOptionsMaxTokensAlias(t *testing.T) {
    tests := []struct {
        name        string
        input       map[string]any
        wantPredict int
    }{
        {
            name:        "only num_predict",
            input:       map[string]any{"num_predict": 100},
            wantPredict: 100,
        },
        {
            name:        "only max_tokens",
            input:       map[string]any{"max_tokens": 200},
            wantPredict: 200,
        },
        {
            name:        "both provided, num_predict takes precedence",
            input:       map[string]any{"num_predict": 100, "max_tokens": 200},
            wantPredict: 100,
        },
    }

    for _, tt := range tests {
        t.Run(tt.name, func(t *testing.T) {
            var opts Options
            if err := opts.FromMap(tt.input); err != nil {
                t.Fatalf("FromMap failed: %v", err)
            }
            if opts.NumPredict != tt.wantPredict {
                t.Errorf("got NumPredict=%d, want %d", opts.NumPredict, tt.wantPredict)
            }
        })
    }
}

Documentation

Update docs/faq.md or API documentation to mention the alias:

## Can I use `max_tokens` instead of `num_predict`?

Yes. The native API endpoints (`/api/chat`, `/api/generate`) accept `max_tokens` as an alias for `num_predict` for OpenAI compatibility.

If both parameters are provided, `num_predict` takes precedence.

Use Cases

1. Direct API Users:

# Now works without warnings
curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Why is the sky blue?",
  "options": {
    "max_tokens": 100
  }
}'

2. Client Applications:
Open WebUI and similar proxies can pass OpenAI-style parameters through without renaming them:

# Client code no longer needs endpoint-specific parameter conversion
payload = {
    "model": "llama2",
    "messages": messages,
    # max_tokens works as-is on /v1/chat/completions; on the native
    # /api/chat it is accepted inside the "options" dict
    "max_tokens": 100,
}

3. Migration from OpenAI:
Developers can use familiar parameter names when switching from OpenAI to Ollama.

Related Issues

Related: #7125 proposes adding max_completion_tokens support to the OpenAI-compatible endpoint (/v1/chat/completions). While related (both improve OpenAI compatibility), these are distinct issues:

  • #7125: Feature request to add support for NEW OpenAI parameter (max_completion_tokens)
  • This issue: Bug/inconsistency where EXISTING OpenAI standard parameter (max_tokens) is rejected

Both could be addressed independently or coordinated if desired, but this issue should likely have higher urgency since it addresses current log warnings and API inconsistency.

Relevant log output

time=2025-10-25T10:30:45.123-05:00 level=INFO msg="POST /api/generate" status=200
time=2025-10-25T10:30:45.125-05:00 level=WARN msg="invalid option provided" option=max_tokens

OS

Linux

GPU

AMD

CPU

AMD

Ollama version

0.12.6

GiteaMirror added the bug label 2026-04-22 17:39:47 -05:00

Reference: github-starred/ollama#34236