issue: Root-level max_tokens dropped instead of converted to num_predict (Regression from Feb 2025) #6767

Closed
opened 2025-11-11 17:05:25 -06:00 by GiteaMirror · 1 comment

Originally created by @elazar on GitHub (Oct 25, 2025).

Check Existing Issues

  • I have searched for any existing and/or related issues.
  • I have searched for any existing and/or related discussions.
  • I am using the latest version of Open WebUI.

Installation Method

Docker

Open WebUI Version

v0.6.33 (tested) / main branch (as of Oct 25, 2025)

Ollama Version (if applicable)

v0.5.11

Operating System

Linux (Ubuntu 22.04) - should also be reproducible on Debian 12, macOS, and Windows

Browser (if applicable)

N/A (backend bug, affects all clients)

Confirmation

  • I have read and followed all instructions in README.md.
  • I am using the latest version of both Open WebUI and Ollama.
  • I have included the browser console logs.
  • I have included the Docker container logs.
  • I have provided every relevant configuration, setting, and environment variable used in my setup.
  • I have clearly listed every relevant configuration, custom setting, environment variable, and command-line option that influences my setup (such as Docker Compose overrides, .env values, browser settings, authentication configurations, etc).
  • I have documented step-by-step reproduction instructions that are precise, sequential, and leave nothing to interpretation. My steps:
      • Start with the initial platform/version/OS and dependencies used,
      • Specify exact install/launch/configure commands,
      • List URLs visited, user input (incl. example values/emails/passwords if needed),
      • Describe all options and toggles enabled or changed,
      • Include any files or environmental changes,
      • Identify the expected and actual result at each stage,
      • Ensure any reasonably skilled user can follow and hit the same issue.

Expected Behavior

When sending OpenAI-style API requests with root-level max_tokens parameter to Open WebUI (which proxies to Ollama), the max_tokens parameter should be converted to Ollama's num_predict parameter and properly limit the output token length.

This behavior worked correctly prior to commit fea169a9c (February 19, 2025).

Actual Behavior

Root-level max_tokens parameter is completely dropped during payload conversion, resulting in:

  1. Output length not being limited as requested by the user
  2. Ollama warnings in the logs (when OLLAMA_DEBUG=true): level=WARN msg="invalid option provided" option=max_tokens
  3. The parameter being silently ignored, with no error feedback to the user

Steps to Reproduce

Prerequisites

  • Fresh Ubuntu 22.04 system
  • Docker Engine v24.0.5+ installed
  • Ollama v0.5.11+ running on host or separate container
  • Open WebUI v0.6.33 (main branch)

Setup

  1. Install Ollama: curl -fsSL https://ollama.com/install.sh | sh
  2. Pull model: ollama pull llama2
  3. Run Open WebUI:
docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -e OLLAMA_BASE_URL=http://host.docker.internal:11434 -v open-webui:/app/backend/data --name open-webui ghcr.io/open-webui/open-webui:main

Reproduce Bug

  1. Get API key from Settings → Account
  2. Send request with root-level max_tokens:
curl -X POST http://localhost:3000/api/chat/completions -H "Authorization: Bearer $API_KEY" -H "Content-Type: application/json" -d '{"model": "llama2", "messages": [{"role": "user", "content": "Write a long essay"}], "max_tokens": 50, "stream": false}'
  3. Check Ollama logs: journalctl -u ollama | grep max_tokens
  4. Observe: response NOT limited to 50 tokens; warning appears in logs
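
The curl request above can also be sketched in Python. This is illustrative only: build_payload and send are hypothetical helper names (not part of Open WebUI), and the endpoint URL assumes the Docker setup from the reproduction steps.

```python
import json
import urllib.request


def build_payload(model: str, prompt: str, max_tokens: int) -> dict:
    """Build an OpenAI-style chat completion payload with a root-level
    max_tokens -- the form this report says Open WebUI currently drops."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "stream": False,
    }


def send(api_key: str, payload: dict,
         url: str = "http://localhost:3000/api/chat/completions") -> dict:
    # Requires a running Open WebUI instance; shown for completeness.
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

With the bug present, the response from send() is not limited to 50 tokens even though the payload requests it.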

Logs & Screenshots

Browser Console Logs

No relevant errors in browser console - this is a backend payload conversion bug that occurs during server-side request processing before any response reaches the browser.

The client receives what appears to be a successful response, but the max_tokens parameter was dropped during Open WebUI's internal conversion to Ollama format, causing the warning in Ollama's logs.

Docker Container Logs

Open WebUI: No errors (bug is in payload conversion logic)

Ollama Logs:

level=WARN msg="invalid option provided" option=max_tokens

Code Evidence

Regression introduced in commit fea169a9c (Feb 19, 2025)
File: backend/open_webui/utils/payload.py
Root-level max_tokens handling was removed

Additional Information

Root Cause

Commit fea169a9c (Feb 19, 2025) added nested options.max_tokens support but removed root-level handling

Impact

  • Severity: Medium (regression)
  • Scope: All OpenAI API clients
  • Workaround: Nest max_tokens in options (non-standard)
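
The workaround above, shown side by side with the standard form; the exact shape of the nested payload is inferred from this report, not confirmed against Open WebUI's schema.

```python
# Standard OpenAI-style payload: max_tokens at the root.
# Currently dropped during Open WebUI's conversion to Ollama format.
standard = {
    "model": "llama2",
    "messages": [{"role": "user", "content": "Write a long essay"}],
    "max_tokens": 50,
}

# Workaround per this report: nest the limit under "options" so the
# handling added in fea169a9c picks it up. Non-standard for OpenAI clients.
workaround = {
    "model": "llama2",
    "messages": [{"role": "user", "content": "Write a long essay"}],
    "options": {"max_tokens": 50},
}
```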

Fix

Restore root-level max_tokens handling in backend/open_webui/utils/payload.py while preserving the nested options.max_tokens support added in fea169a9c.
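
A minimal sketch of the proposed fix. convert_payload is a hypothetical stand-in for the conversion logic in backend/open_webui/utils/payload.py, not the actual function; only the max_tokens → num_predict mapping reflects this report.

```python
def convert_payload(openai_payload: dict) -> dict:
    """Map an OpenAI-style payload to Ollama format, honoring max_tokens
    whether it appears at the root (pre-regression behavior) or nested
    under "options" (behavior added in fea169a9c)."""
    ollama_payload = {
        "model": openai_payload["model"],
        "messages": openai_payload.get("messages", []),
    }
    options = dict(openai_payload.get("options", {}))

    # Nested form takes precedence if both are present.
    max_tokens = options.pop("max_tokens", openai_payload.get("max_tokens"))
    if max_tokens is not None:
        options["num_predict"] = max_tokens  # Ollama's name for the output limit

    if options:
        ollama_payload["options"] = options
    return ollama_payload
```

Either input form then yields an Ollama payload carrying num_predict, so the output limit is enforced and the "invalid option provided" warning disappears.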

GiteaMirror added the bug label 2025-11-11 17:05:25 -06:00

@tjbck commented on GitHub (Oct 26, 2025):

Should be addressed in dev with d11d49a08a9147b430aa04ee3f588f096aa7e24a


Reference: github-starred/open-webui#6767