Mirror of https://github.com/open-webui/open-webui.git (synced 2026-05-06 02:48:13 -05:00)
[GH-ISSUE #22791] issue: Open WebUI API: incomplete / dropped output with latest Grok models via /api/v1/chat/completions, while older Grok and OpenAI models work #19820
Originally created by @Patrick-0815 on GitHub (Mar 18, 2026).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/22791
Check Existing Issues
Installation Method
Docker
Open WebUI Version
0.8.10
Ollama Version (if applicable)
No response
Operating System
Ubuntu 24
Browser (if applicable)
No response
Confirmation
Expected Behavior
The full model output should be returned consistently via /api/v1/chat/completions, regardless of whether the backend model is OpenAI or Grok.
Actual Behavior
When using the Open WebUI API endpoint /api/v1/chat/completions, responses from the latest xAI/Grok models are intermittently incomplete or appear to be dropped/truncated.
Affected models:
- grok-4-1-fast-reasoning
- grok-4-1-fast-non-reasoning
- grok-code-fast-1
Not affected:
- OpenAI models
- grok-4-fast-reasoning
- grok-4-fast-non-reasoning
The issue is reproducible through the Open WebUI API interface.
It does not appear to be a general client-side transport issue, because other models over the same path work correctly.
Actual behavior
With the affected Grok models, the output is sometimes incomplete, dropped, or terminates unexpectedly.
Observed behavior includes:
- incomplete assistant output
- the stream ends successfully, but visible output is missing or partial
- the issue is model-specific
- the issue does not occur with OpenAI models
- the issue does not occur with grok-4-fast-reasoning or grok-4-fast-non-reasoning
Steps to Reproduce
1. Configure the xAI/Grok provider in Open WebUI.
2. Call /api/v1/chat/completions with one of the affected models:
   - grok-4-1-fast-reasoning
   - grok-4-1-fast-non-reasoning
   - grok-code-fast-1
3. Send prompts that produce longer or more complex outputs.
4. Compare the behavior with:
   - OpenAI models
   - grok-4-fast-reasoning
   - grok-4-fast-non-reasoning
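For reference, the comparison in the steps above can be scripted. This is a minimal sketch, assuming a local instance and an API key; BASE_URL, API_KEY, and the example prompt are placeholders, and only the endpoint path and model names come from the report:

```python
import json
import urllib.request

# Placeholders (assumptions) -- fill in for your deployment.
BASE_URL = "http://localhost:3000"
API_KEY = "sk-..."  # an Open WebUI API key

# Model lists from the report.
AFFECTED = [
    "grok-4-1-fast-reasoning",
    "grok-4-1-fast-non-reasoning",
    "grok-code-fast-1",
]
CONTROL = ["grok-4-fast-reasoning", "grok-4-fast-non-reasoning"]

def build_request(model, prompt, stream=True):
    """Build the JSON body for POST /api/v1/chat/completions."""
    return {
        "model": model,
        "stream": stream,
        "messages": [{"role": "user", "content": prompt}],
    }

def send(payload):
    """POST the payload and return the raw response body as text."""
    req = urllib.request.Request(
        f"{BASE_URL}/api/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")

# Example (requires a running Open WebUI instance):
# prompt = "Explain quicksort step by step."  # longer outputs trigger the bug
# for model in AFFECTED + CONTROL:
#     print(model, "->", len(send(build_request(model, prompt))), "bytes")
```

Comparing raw response sizes per model over the same prompt makes the truncation visible without relying on any particular client rendering.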
Logs & Screenshots
There are no specific error logs.
Additional Information
From API/proxy-level inspection, the affected Grok models appear to emit reasoning-related delta fields (for example reasoning_content / reasoning-style deltas) that differ from the behavior of OpenAI models and older Grok models.
In at least some problematic cases, the stream ends with:
- finish_reason: "stop"
- little or no visible assistant content
- in some cases, effectively zero visible output tokens, despite the request finishing successfully
This suggests there may be an Open WebUI compatibility issue in the API layer when handling newer Grok streaming response formats, especially for reasoning/code-capable models.
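The symptom above (finish_reason "stop" with little or no visible content) is consistent with a consumer that reads only delta["content"] while the model streams its text under a reasoning field. A minimal sketch of a more defensive accumulator follows; the exact field name reasoning_content is an assumption based on the proxy-level inspection described above, and the SSE chunk shapes are illustrative:

```python
import json

def accumulate_stream(sse_lines):
    """Accumulate visible and reasoning text from OpenAI-style SSE chunks.

    A client that only collects delta["content"] would show nothing for a
    stream whose text arrives in delta["reasoning_content"] (hypothetical
    reproduction of the behavior described in this report).
    """
    content, reasoning = [], []
    for line in sse_lines:
        # Skip non-data lines and the stream terminator.
        if not line.startswith("data: ") or line == "data: [DONE]":
            continue
        chunk = json.loads(line[len("data: "):])
        delta = chunk["choices"][0].get("delta", {})
        if delta.get("content"):
            content.append(delta["content"])
        if delta.get("reasoning_content"):
            reasoning.append(delta["reasoning_content"])
    return "".join(content), "".join(reasoning)
```

If such a stream really does carry its output under a reasoning-style delta, logging both accumulated strings separately would confirm whether the tokens are missing from the wire or merely dropped during rendering.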