Mirror of https://github.com/open-webui/open-webui.git (synced 2026-05-06 02:48:13 -05:00)
[GH-ISSUE #22791] issue: Open WebUI API: incomplete / dropped output with latest Grok models via /api/v1/chat/completions, while older Grok and OpenAI models work #19820
Originally created by @Patrick-0815 on GitHub (Mar 18, 2026).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/22791
Check Existing Issues
Installation Method
Docker
Open WebUI Version
0.8.10
Ollama Version (if applicable)
No response
Operating System
Ubuntu 24
Browser (if applicable)
No response
Confirmation
Expected Behavior
The full model output should be returned consistently via /api/v1/chat/completions, regardless of whether the backend model is OpenAI or Grok.
Actual Behavior
When using the Open WebUI API endpoint /api/v1/chat/completions, responses from the latest xAI/Grok models are intermittently incomplete or appear to be dropped/truncated.
Affected models:
- grok-4-1-fast-reasoning
- grok-4-1-fast-non-reasoning
- grok-code-fast-1
Not affected:
- OpenAI models
- grok-4-fast-reasoning
- grok-4-fast-non-reasoning
The issue is reproducible through the Open WebUI API interface.
It does not appear to be a general client-side transport issue, because other models over the same path work correctly.
Actual behavior
With the affected Grok models, the output is sometimes incomplete, dropped, or terminates unexpectedly.
Observed behavior includes:
- incomplete assistant output
- the stream ends successfully, but visible output is missing or partial
- the issue is model-specific
- the issue does not occur with OpenAI models
- the issue does not occur with grok-4-fast-reasoning or grok-4-fast-non-reasoning
Steps to Reproduce
1. Configure the xAI/Grok provider in Open WebUI.
2. Call /api/v1/chat/completions with one of the affected models:
   - grok-4-1-fast-reasoning
   - grok-4-1-fast-non-reasoning
   - grok-code-fast-1
3. Send prompts that produce longer or more complex outputs.
4. Compare the behavior with:
   - OpenAI models
   - grok-4-fast-reasoning
   - grok-4-fast-non-reasoning
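For reference, the comparison in the steps above can be scripted. This is a minimal sketch, assuming a local instance and an API key; BASE_URL, API_KEY, and the example prompt are placeholders, and only the endpoint path and model names come from the report:

```python
import json
import urllib.request

# Placeholders (assumptions) -- fill in for your deployment.
BASE_URL = "http://localhost:3000"
API_KEY = "sk-..."  # an Open WebUI API key

# Model lists from the report.
AFFECTED = [
    "grok-4-1-fast-reasoning",
    "grok-4-1-fast-non-reasoning",
    "grok-code-fast-1",
]
CONTROL = ["grok-4-fast-reasoning", "grok-4-fast-non-reasoning"]

def build_request(model, prompt, stream=True):
    """Build the JSON body for POST /api/v1/chat/completions."""
    return {
        "model": model,
        "stream": stream,
        "messages": [{"role": "user", "content": prompt}],
    }

def send(payload):
    """POST the payload and return the raw response body as text."""
    req = urllib.request.Request(
        f"{BASE_URL}/api/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")

# Example (requires a running Open WebUI instance):
# prompt = "Explain quicksort step by step."  # longer outputs trigger the bug
# for model in AFFECTED + CONTROL:
#     print(model, "->", len(send(build_request(model, prompt))), "bytes")
```

Comparing raw response sizes per model over the same prompt makes the truncation visible without relying on any particular client rendering.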
Logs & Screenshots
There are no specific error logs.
Additional Information
From API/proxy-level inspection, the affected Grok models appear to emit reasoning-related delta fields (for example reasoning_content / reasoning-style deltas) that differ from the behavior of OpenAI models and older Grok models.
In at least some problematic cases, the stream ends with:
- finish_reason: "stop"
- little or no visible assistant content
- in some cases, effectively zero visible output tokens, despite the request finishing successfully
This suggests there may be an Open WebUI compatibility issue in the API layer when handling newer Grok streaming response formats, especially for reasoning/code-capable models.
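The symptom above (finish_reason "stop" with little or no visible content) is consistent with a consumer that reads only delta["content"] while the model streams its text under a reasoning field. A minimal sketch of a more defensive accumulator follows; the exact field name reasoning_content is an assumption based on the proxy-level inspection described above, and the SSE chunk shapes are illustrative:

```python
import json

def accumulate_stream(sse_lines):
    """Accumulate visible and reasoning text from OpenAI-style SSE chunks.

    A client that only collects delta["content"] would show nothing for a
    stream whose text arrives in delta["reasoning_content"] (hypothetical
    reproduction of the behavior described in this report).
    """
    content, reasoning = [], []
    for line in sse_lines:
        # Skip non-data lines and the stream terminator.
        if not line.startswith("data: ") or line == "data: [DONE]":
            continue
        chunk = json.loads(line[len("data: "):])
        delta = chunk["choices"][0].get("delta", {})
        if delta.get("content"):
            content.append(delta["content"])
        if delta.get("reasoning_content"):
            reasoning.append(delta["reasoning_content"])
    return "".join(content), "".join(reasoning)
```

If such a stream really does carry its output under a reasoning-style delta, logging both accumulated strings separately would confirm whether the tokens are missing from the wire or merely dropped during rendering.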