issue: Root-level max_tokens dropped instead of converted to num_predict (Regression from Feb 2025) #6767

Closed
opened 2025-11-11 17:05:25 -06:00 by GiteaMirror · 1 comment

Originally created by @elazar on GitHub (Oct 25, 2025).

Check Existing Issues

  • I have searched for any existing and/or related issues.
  • I have searched for any existing and/or related discussions.
  • I am using the latest version of Open WebUI.

Installation Method

Docker

Open WebUI Version

v0.6.33 (tested) / main branch (as of Oct 25, 2025)

Ollama Version (if applicable)

v0.5.11

Operating System

Linux (Ubuntu 22.04) - should also be reproducible on Debian 12, macOS, and Windows

Browser (if applicable)

N/A (backend bug, affects all clients)

Confirmation

  • I have read and followed all instructions in README.md.
  • I am using the latest version of both Open WebUI and Ollama.
  • I have included the browser console logs.
  • I have included the Docker container logs.
  • I have provided every relevant configuration, setting, and environment variable used in my setup.
  • I have clearly listed every relevant configuration, custom setting, environment variable, and command-line option that influences my setup (such as Docker Compose overrides, .env values, browser settings, authentication configurations, etc).
  • I have documented step-by-step reproduction instructions that are precise, sequential, and leave nothing to interpretation. My steps:
      • Start with the initial platform/version/OS and dependencies used,
      • Specify exact install/launch/configure commands,
      • List URLs visited, user input (incl. example values/emails/passwords if needed),
      • Describe all options and toggles enabled or changed,
      • Include any files or environmental changes,
      • Identify the expected and actual result at each stage,
      • Ensure any reasonably skilled user can follow and hit the same issue.

Expected Behavior

When sending OpenAI-style API requests with root-level max_tokens parameter to Open WebUI (which proxies to Ollama), the max_tokens parameter should be converted to Ollama's num_predict parameter and properly limit the output token length.

This behavior worked correctly prior to commit fea169a9c (February 19, 2025).

Actual Behavior

Root-level max_tokens parameter is completely dropped during payload conversion, resulting in:

  1. Output length not being limited as requested by the user
  2. Ollama warnings in the logs (when OLLAMA_DEBUG=true): level=WARN msg="invalid option provided" option=max_tokens
  3. The parameter being silently ignored, with no error feedback to the user

Steps to Reproduce

Prerequisites

  • Fresh Ubuntu 22.04 system
  • Docker Engine v24.0.5+ installed
  • Ollama v0.5.11+ running on host or separate container
  • Open WebUI v0.6.33 (main branch)

Setup

  1. Install Ollama: curl -fsSL https://ollama.com/install.sh | sh
  2. Pull model: ollama pull llama2
  3. Run Open WebUI:
docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -e OLLAMA_BASE_URL=http://host.docker.internal:11434 -v open-webui:/app/backend/data --name open-webui ghcr.io/open-webui/open-webui:main

Reproduce Bug

  1. Get API key from Settings → Account
  2. Send request with root-level max_tokens:
curl -X POST http://localhost:3000/api/chat/completions -H "Authorization: Bearer $API_KEY" -H "Content-Type: application/json" -d '{"model": "llama2", "messages": [{"role": "user", "content": "Write a long essay"}], "max_tokens": 50, "stream": false}'
  3. Check Ollama logs: journalctl -u ollama | grep max_tokens
  4. Observe: response NOT limited to 50 tokens; warning appears in logs
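
The curl request above can also be sketched in Python. This is illustrative only: build_payload and send are hypothetical helper names (not part of Open WebUI), and the endpoint URL assumes the Docker setup from the reproduction steps.

```python
import json
import urllib.request


def build_payload(model: str, prompt: str, max_tokens: int) -> dict:
    """Build an OpenAI-style chat completion payload with a root-level
    max_tokens -- the form this report says Open WebUI currently drops."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "stream": False,
    }


def send(api_key: str, payload: dict,
         url: str = "http://localhost:3000/api/chat/completions") -> dict:
    # Requires a running Open WebUI instance; shown for completeness.
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

With the bug present, the response from send() is not limited to 50 tokens even though the payload requests it.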

Logs & Screenshots

Browser Console Logs

No relevant errors in browser console - this is a backend payload conversion bug that occurs during server-side request processing before any response reaches the browser.

The client receives what appears to be a successful response, but the max_tokens parameter was dropped during Open WebUI's internal conversion to Ollama format, causing the warning in Ollama's logs.

Docker Container Logs

Open WebUI: No errors (bug is in payload conversion logic)

Ollama Logs:

level=WARN msg="invalid option provided" option=max_tokens

Code Evidence

Regression introduced in commit fea169a9c (Feb 19, 2025)
File: backend/open_webui/utils/payload.py
Root-level max_tokens handling was removed

Additional Information

Root Cause

Commit fea169a9c (Feb 19, 2025) added nested options.max_tokens support but removed root-level handling

Impact

  • Severity: Medium (regression)
  • Scope: All OpenAI API clients
  • Workaround: Nest max_tokens in options (non-standard)
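
The workaround above, shown side by side with the standard form; the exact shape of the nested payload is inferred from this report, not confirmed against Open WebUI's schema.

```python
# Standard OpenAI-style payload: max_tokens at the root.
# Currently dropped during Open WebUI's conversion to Ollama format.
standard = {
    "model": "llama2",
    "messages": [{"role": "user", "content": "Write a long essay"}],
    "max_tokens": 50,
}

# Workaround per this report: nest the limit under "options" so the
# handling added in fea169a9c picks it up. Non-standard for OpenAI clients.
workaround = {
    "model": "llama2",
    "messages": [{"role": "user", "content": "Write a long essay"}],
    "options": {"max_tokens": 50},
}
```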

Fix

Restore root-level max_tokens handling in backend/open_webui/utils/payload.py while preserving the nested options.max_tokens support added in fea169a9c.
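
A minimal sketch of the proposed fix. convert_payload is a hypothetical stand-in for the conversion logic in backend/open_webui/utils/payload.py, not the actual function; only the max_tokens → num_predict mapping reflects this report.

```python
def convert_payload(openai_payload: dict) -> dict:
    """Map an OpenAI-style payload to Ollama format, honoring max_tokens
    whether it appears at the root (pre-regression behavior) or nested
    under "options" (behavior added in fea169a9c)."""
    ollama_payload = {
        "model": openai_payload["model"],
        "messages": openai_payload.get("messages", []),
    }
    options = dict(openai_payload.get("options", {}))

    # Nested form takes precedence if both are present.
    max_tokens = options.pop("max_tokens", openai_payload.get("max_tokens"))
    if max_tokens is not None:
        options["num_predict"] = max_tokens  # Ollama's name for the output limit

    if options:
        ollama_payload["options"] = options
    return ollama_payload
```

Either input form then yields an Ollama payload carrying num_predict, so the output limit is enforced and the "invalid option provided" warning disappears.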

GiteaMirror added the bug label 2025-11-11 17:05:25 -06:00

@tjbck commented on GitHub (Oct 26, 2025):

Should be addressed in dev with d11d49a08a9147b430aa04ee3f588f096aa7e24a


Reference: github-starred/open-webui#6767