[GH-ISSUE #23917] feat: Open WebUI -- Bug Reports & Feature Requests with attached Patches #35635

Closed
opened 2026-04-25 09:47:56 -05:00 by GiteaMirror · 6 comments

Originally created by @pvyswiss on GitHub (Apr 21, 2026).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/23917

Check Existing Issues

  • I have searched for all existing open AND closed issues and discussions for similar requests. I have found none that is comparable to my request.

Verify Feature Scope

  • I have read through and understood the scope definition for feature requests in the Issues section. I believe my feature request meets the definition and belongs in the Issues section instead of the Discussions.

Problem Description

I thought a PR was not the right way to go, since this addresses several findings. I also added the SDK caching from Anthropic, which saves you a lot of tokens.

Open WebUI -- Bug Reports & Feature Requests

These findings were discovered while building an enterprise AI platform using Open WebUI as the API gateway for OpenAI-compatible desktop clients (OpenCode Desktop, Continue.dev). All bugs were found in the API path (direct /api/chat/completions calls without WebSocket/event_emitter). The Web UI chat (browser path with event_emitter) was working correctly in all cases.

Prepared by: PVY.swiss
Open WebUI Version: main (as of 2026-04-07)
Patched files in: patches/backend/open_webui/
Excludes: OnlyOffice/PVYoffice integrations (proprietary)


Bug 1: Stale Content-Encoding Header Forwarded from Upstream APIs

File: backend/open_webui/routers/openai.py
Severity: Critical
Affects: All models routed through OpenAI-compatible upstream APIs (Anthropic, OpenAI, etc.)

Symptom

Desktop clients (OpenCode Desktop, OnlyOffice plugin) receive Decompression error: ZlibError when streaming SSE from Claude/OpenAI models. The Web UI works fine.

Root Cause

aiohttp (Open WebUI's HTTP client) sends Accept-Encoding: gzip to upstream APIs. When the upstream responds with Content-Encoding: gzip, aiohttp auto-decompresses the response body but keeps the Content-Encoding: gzip header in the response headers (known behavior, see aiohttp#4462: https://github.com/aio-libs/aiohttp/issues/4462).
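
A minimal sketch of this behavior, assuming any gzip-serving upstream (the URL is a placeholder): the body aiohttp hands back is already decompressed, yet the headers may still advertise gzip.

import asyncio
import aiohttp

async def show_stale_header(url: str) -> None:
    async with aiohttp.ClientSession() as session:  # auto_decompress=True is the default
        async with session.get(url, headers={'Accept-Encoding': 'gzip'}) as r:
            print(r.headers.get('Content-Encoding'))  # may still report 'gzip'
            body = await r.read()                     # bytes are already decompressed
            print(body[:80])

asyncio.run(show_stale_header('https://example.com'))  # placeholder URL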

Open WebUI's openai.py forwards all upstream headers verbatim to the client:

# 4 locations in openai.py (lines ~1186, ~1273, ~1389, ~1495):
return StreamingResponse(
    stream_wrapper(r, session, stream_chunks_handler),
    status_code=r.status,
    headers=dict(r.headers),   # <-- Includes stale Content-Encoding: gzip
)

The client receives:

  • Header: Content-Encoding: gzip (tells client "this is compressed")
  • Body: Plain text SSE chunks (already decompressed by aiohttp)

The client tries to gunzip plain text → ZlibError.
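
A hedged illustration of what a compliant client then does -- it trusts the header and gunzips bytes that were never compressed:

import gzip
import zlib

sse_body = b'data: {"choices": []}\n\n'  # plain SSE bytes, already decompressed by the proxy
try:
    gzip.decompress(sse_body)            # client honors Content-Encoding: gzip
except (OSError, zlib.error) as err:     # gzip.BadGzipFile is an OSError subclass
    print('decompression error:', err)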

Fix

Strip hop-by-hop / encoding headers that become stale after aiohttp auto-decompression:

_STRIP_PROXY_HEADERS = frozenset({
    'Content-Encoding',
    'Content-Length',
    'Transfer-Encoding',
})

def _clean_proxy_headers(raw_headers) -> dict:
    return {k: v for k, v in raw_headers.items() if k not in _STRIP_PROXY_HEADERS}

At all 4 StreamingResponse locations:

headers=_clean_proxy_headers(r.headers),

Impact

  • Web UI: None (browser path doesn't use the OpenAI proxy for SSE)
  • API clients: All desktop/programmatic clients that honor Content-Encoding headers are fixed

Bug 2: Premature finish_reason: "stop" on First SSE Chunk (Ollama Models)

File: backend/open_webui/utils/misc.py
Severity: Critical
Affects: All Ollama models with thinking/reasoning (DeepSeek R1, Gemma 4) when accessed via API

Symptom

API clients (OpenCode Desktop) see the first SSE chunk, then the stream appears to end. The model seems to hang after "let me think." The Web UI works fine.

Root Cause

openai_chat_chunk_message_template() in misc.py uses truthy checks to determine when to set finish_reason: "stop":

# Line ~501:
if not content and not reasoning_content and not tool_calls:
    template['choices'][0]['finish_reason'] = 'stop'

When Ollama sends the first chunk for a reasoning model, both content and thinking are empty strings (""). Empty strings are falsy in Python, so the condition is True, and finish_reason: "stop" is set on the very first chunk.

API clients that comply with the OpenAI spec (like @ai-sdk/openai-compatible) see finish_reason: "stop" and close the stream immediately -- before any reasoning or content tokens arrive.

The Web UI is unaffected because its browser path never checks finish_reason for stream termination; it reads until data: [DONE].
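
A small simulation of the pre-fix condition (the helper name is illustrative, not the actual function):

def finish_reason_before_fix(content, reasoning_content, tool_calls):
    # Mirrors the truthy check at misc.py ~501 before the fix
    if not content and not reasoning_content and not tool_calls:
        return 'stop'
    return None

# First chunk of an Ollama reasoning model: both fields are "" and no tools yet
print(finish_reason_before_fix('', '', None))        # -> 'stop' (premature)
print(finish_reason_before_fix('', 'Let me', None))  # -> None once tokens arrive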

Fix

Only set finish_reason: "stop" when usage is present (which only happens on the final chunk when Ollama sends done: true):

if usage and not content and not reasoning_content and not tool_calls:
    template['choices'][0]['finish_reason'] = 'stop'

Impact

  • Web UI: None (browser path ignores finish_reason)
  • API clients: Stream continues correctly through reasoning and content phases

Bug 3: Missing role and Unique Per-Chunk IDs in Ollama SSE Conversion

File: backend/open_webui/utils/response.py
Severity: Major
Affects: All Ollama models when accessed via API

Symptom

API clients fail to correlate SSE chunks or initialize the response correctly. Some SDKs silently drop chunks or fail to render content.

Root Cause (Two Issues)

3a: Missing delta.role: "assistant" on first chunk

The OpenAI SSE spec requires the first chunk to contain delta.role: "assistant". The Ollama-to-OpenAI conversion in convert_streaming_response_ollama_to_openai() never emits role. OpenAI-compatible SDKs expect this field to initialize the message.

3b: Unique UUID per SSE chunk

openai_chat_message_template() generates a new uuid4() for every chunk:

return {
    'id': f'{model}-{str(uuid.uuid4())}',  # NEW UUID EVERY CALL
    ...
}

The OpenAI spec requires all chunks in a single completion to share the same id. SDKs use this ID to correlate chunks into one response. With unique IDs per chunk, SDK chunk correlation breaks.

For comparison:

  • Anthropic/OpenAI upstream: consistent msg_xxxxx ID across all chunks ✓
  • Ollama via Open WebUI: unique ID per chunk ✗
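
For reference, a hedged sketch of the chunk shape compliant SDKs expect (all values illustrative):

first_chunk = {
    'id': 'chatcmpl-123',  # same id on every chunk of the stream
    'object': 'chat.completion.chunk',
    'choices': [{'index': 0, 'delta': {'role': 'assistant', 'content': ''}}],
}
later_chunk = {
    'id': 'chatcmpl-123',  # identical id lets SDKs correlate chunks into one response
    'object': 'chat.completion.chunk',
    'choices': [{'index': 0, 'delta': {'content': 'Hello'}}],
}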

Fix

Generate one chatcmpl- prefixed ID per stream and add role: "assistant" to the first chunk:

async def convert_streaming_response_ollama_to_openai(ollama_streaming_response):
    first_chunk = True
    completion_id = f'chatcmpl-{uuid4()}'

    async for data in ollama_streaming_response.body_iterator:
        ...
        data['id'] = completion_id  # Same ID for all chunks

        if first_chunk:
            data['choices'][0]['delta']['role'] = 'assistant'
            first_chunk = False
        ...

Impact

  • Web UI: None (browser path doesn't use chunk id or delta.role)
  • API clients: Proper chunk correlation and response initialization

Bug 4: Ollama-to-OpenAI Conversion Doesn't Expose Reasoning as Standard Content

File: backend/open_webui/utils/response.py
Severity: Feature Request
Affects: Ollama reasoning models (DeepSeek R1, Gemma 4) when accessed via generic OpenAI-compatible clients

Problem

Ollama's native API sends reasoning in message.thinking. Open WebUI's conversion puts this in delta.reasoning_content -- a non-standard field that only DeepSeek-specific SDKs understand.

Generic OpenAI-compatible SDKs (like @ai-sdk/openai-compatible) don't recognize reasoning_content and silently drop the reasoning tokens. Users see no thinking process.

The Web UI handles this correctly because its middleware at line ~3763 explicitly reads reasoning_content, reasoning, and thinking fields and renders them as collapsible blocks.

Suggested Enhancement

Convert Ollama's thinking field into <think> tags inside the standard content field during the Ollama-to-OpenAI streaming conversion. This makes reasoning visible to ALL clients:

if reasoning_content:
    if not in_reasoning:
        in_reasoning = True
        message_content = "<think>\n" + reasoning_content
    else:
        message_content = reasoning_content
    reasoning_content = None  # Don't pass as separate field
elif in_reasoning and message_content:
    in_reasoning = False
    message_content = "\n</think>\n\n" + message_content

Open WebUI's browser middleware already detects <think> tags and renders them as collapsible reasoning blocks, so this is backward-compatible.
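
A self-contained sketch of the same state machine as a stream transform, assuming the conversion sees (thinking, content) pairs; names are illustrative, the real logic belongs in convert_streaming_response_ollama_to_openai():

from typing import AsyncIterator, Tuple

async def wrap_thinking(pairs: AsyncIterator[Tuple[str, str]]) -> AsyncIterator[str]:
    # pairs yields (thinking, content) as the Ollama stream emits them
    in_reasoning = False
    async for thinking, content in pairs:
        if thinking:
            if not in_reasoning:
                in_reasoning = True
                yield '<think>\n' + thinking   # open the tag on the first thinking token
            else:
                yield thinking
        elif content:
            if in_reasoning:
                in_reasoning = False
                yield '\n</think>\n\n' + content  # close the tag when content starts
            else:
                yield content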

Impact

  • Web UI: <think> tags rendered as collapsible reasoning (same as before)
  • API clients: Reasoning visible as standard content text with <think> tag markers

Bug 5: Client-Provided tools Crash Ollama Models That Don't Support Tool Calling

File: backend/open_webui/main.py
Severity: Critical
Affects: DeepSeek R1 and other Ollama models without native tool support, when called from API clients that send tools

Symptom

OpenCode Desktop sends two concurrent requests: one for chat, one for title generation (with tools and tool_choice). For DeepSeek R1, Ollama rejects the tools request with "does not support tools". Open WebUI catches this exception but returns null to the client (the except handler at line ~1929 logs at DEBUG level and falls through without returning a proper response). The SDK receives null, crashes, and kills both concurrent streams.

Root Cause

API clients like @ai-sdk/openai-compatible automatically send tools if the caller supports them. Open WebUI's middleware checks function_calling capability for its own internal tools (MCP servers), but does NOT strip client-provided tools from form_data. These client tools pass through to Ollama, which rejects them for models without tool support.

Additionally, the except Exception handler at process_chat (line ~1929) catches the error but only emits it via event_emitter (which is None for API clients). The function implicitly returns None, which FastAPI serializes as null.
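
A hedged illustration of the implicit-None behavior with a toy FastAPI endpoint (not the actual handler):

from fastapi import FastAPI

app = FastAPI()

@app.post('/demo')
async def demo():
    try:
        raise RuntimeError('does not support tools')
    except Exception:
        pass  # in Open WebUI the error is only emitted via event_emitter (None for API clients)
    # No return statement: FastAPI serializes the implicit None as HTTP 200 with body `null`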

Fix

Strip client-provided tools and tool_choice when the model doesn't have function_calling: "native":

if (
    form_data.get('tools')
    and model_info_params.get('function_calling') != 'native'
    and not form_data.get('params', {}).get('function_calling') == 'native'
):
    form_data.pop('tools', None)
    form_data.pop('tool_choice', None)

Also upgrade the error handler from log.debug to log.warning so API path failures are visible:

except Exception as e:
    log.warning(f"Error processing chat payload: {e}")  # was log.debug

Impact

  • Web UI: None (Web UI tools use tools_dict in middleware, not client-provided tools)
  • API clients: Tools are stripped gracefully; request proceeds as normal chat. Models with function_calling: "native" (like Gemma 4) still receive tools correctly.

Bug 6: No Token Usage Analytics for API-Key Requests (No chat_id)

File: backend/open_webui/utils/middleware.py
Severity: Major (data loss -- no cost tracking for API clients)
Affects: All models when accessed via API key without chat_id/session_id (OpenCode Desktop, Continue.dev, curl, custom scripts)

Symptom

The /admin/analytics dashboard shows zero token usage for models used exclusively via API clients. Models used from the Web UI browser show correct analytics. The chat_message table has zero rows for API-only models (e.g., pvy-senior-lead-dev used from OpenCode Desktop).

Root Cause

The analytics pipeline requires event_emitter to be non-None, which requires chat_id, session_id, and message_id in the request metadata:

# middleware.py line 2750-2763:
def get_event_emitter_and_caller(metadata):
    event_emitter = None
    event_caller = None
    if (
        'session_id' in metadata and metadata['session_id']
        and 'chat_id' in metadata and metadata['chat_id']
        and 'message_id' in metadata and metadata['message_id']
    ):
        event_emitter = get_event_emitter(metadata)
        event_caller = get_event_call(metadata)
    return event_emitter, event_caller

API clients don't send chat_id/session_id/message_id because they manage their own conversation state. When event_emitter is None, the streaming handler falls to the passthrough branch:

# middleware.py line 3209:
if event_emitter and event_caller:
    # Full handler: usage tracking, DB writes, tool execution, title generation
    ...
else:
    # Line 4688-4722: Simple passthrough -- NO usage tracking, NO DB writes
    async def stream_wrapper(original_generator, events):
        async for data in original_generator:
            yield data

The usage data IS present in the SSE stream (Anthropic returns it with stream_options: {"include_usage": true}), but nobody reads it on the passthrough path.

Data Flow Comparison

Path | chat_id | event_emitter | Usage Tracked | DB Write
-- | -- | -- | -- | --
Web UI browser | Set by frontend | Non-None | Yes (line 3619-3622) | Yes (line 4614-4635)
API client (OpenCode) | None | None | No (passthrough) | No

Suggested Fix

For API-key requests without chat_id, the middleware should still extract usage from the final SSE chunk and write it to a dedicated api_usage table (or the existing chat_message table with a synthetic chat_id like api:{user_id}:{timestamp}). This enables cost tracking for all API consumers.
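
A minimal sketch of the synthetic chat_id format proposed above (helper name hypothetical):

import time

def synthetic_chat_id(user_id: str) -> str:
    # e.g. 'api:u-42:1776000000' -- groups API usage per user without a real chat
    return f'api:{user_id}:{int(time.time())}'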

Minimal approach -- extract usage in the passthrough branch:

# In the else branch at line 4688:
else:
    async def stream_wrapper(original_generator, events):
        usage = None
        async for data in original_generator:
            # Extract usage from final chunk if present
            if isinstance(data, str) and data.startswith('data: '):
                try:
                    chunk = json.loads(data[6:])
                    raw_usage = chunk.get('usage')
                    if raw_usage:
                        usage = normalize_usage(raw_usage)
                except (json.JSONDecodeError, ValueError):
                    pass
            yield data
        # Write usage to DB even without chat_id
        if usage and metadata.get('user_id'):
            ChatMessages.create_api_usage_record(
                user_id=metadata['user_id'],
                model_id=form_data.get('model', ''),
                usage=usage,
            )

Impact

  • Web UI: None (already has full analytics)
  • API clients: Token usage tracked for cost monitoring and analytics dashboards
  • Verified: chat_message table has 0 rows for pvy-senior-lead-dev (OpenCode Desktop only), but 22+ rows with usage for pvy-researcher (Web UI browser)

Bug 7: Anthropic OpenAI-Compat Endpoint Needs stream_options for Usage in SSE

File: backend/open_webui/routers/openai.py
Severity: Minor (safety net -- most clients already send it)
Affects: Anthropic models via /v1/chat/completions when client doesn't send stream_options

Symptom

When streaming from Anthropic's OpenAI-compatible endpoint without stream_options: {"include_usage": true}, the final SSE chunk contains no usage field. The middleware's normalize_usage at line 3619 finds nothing to process.

Root Cause

Anthropic's /v1/chat/completions endpoint (OpenAI-compat) supports stream_options (confirmed in their docs: "Fully supported") but only returns usage when explicitly requested, unlike their native /v1/messages endpoint, which always includes usage in message_start and message_delta events.

Open WebUI doesn't inject stream_options -- it relies on the client to send it. The Web UI browser does send it (when model.info.meta.capabilities.usage is true), but API scripts and the OnlyOffice plugin don't.

Fix

Inject stream_options for Anthropic requests on the OpenAI-compat path:

# After URL routing, before payload serialization:
if (
    is_anthropic_url(url)
    and not _is_anthropic_native
    and isinstance(payload, dict)
    and payload.get('stream')
):
    payload.setdefault('stream_options', {'include_usage': True})

setdefault preserves client-provided values. isinstance(payload, dict) guards against the native proxy path where payload is already serialized to JSON string.
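
A quick check of the setdefault guarantee relied on here:

payload = {'stream': True, 'stream_options': {'include_usage': False}}
payload.setdefault('stream_options', {'include_usage': True})
print(payload['stream_options'])  # {'include_usage': False} -- the client value wins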

Impact

  • Web UI: None (already sends stream_options for models with usage capability)
  • OpenCode Desktop: None (SDK already sends stream_options)
  • API scripts / curl / OnlyOffice: Usage now included in SSE stream

File Structure

patches/
└── backend/
    └── open_webui/
        ├── main.py              # Bug 5: Strip client tools + error logging
        ├── routers/
        │   └── openai.py        # Bug 1: Strip stale Content-Encoding headers
        │                        # Bug 7: Inject stream_options for Anthropic
        └── utils/
            ├── misc.py           # Bug 2: Fix premature finish_reason stop
            ├── response.py       # Bug 3 + 4: Role, chunk ID, reasoning tags
            └── middleware.py     # Bug 6: API analytics gap (design issue, no patch yet)

Reproduction

All bugs are reproducible with:

  • Open WebUI main branch (2026-04-07, updated 2026-04-11)
  • Ollama with DeepSeek R1 32B or Gemma 4 31B
  • Anthropic Claude via API key
  • Any OpenAI-compatible API client (curl with --compressed, OpenCode Desktop, etc.)
  • Upstream API with gzip (Anthropic via Cloudflare)
# Bug 1: Stale Content-Encoding
curl --compressed -sN -H "Authorization: Bearer $KEY" \
  -d '{"model":"claude-model","messages":[{"role":"user","content":"hi"}],"stream":true}' \
  http://localhost:3000/api/chat/completions
# Result: ZlibError or garbled output

# Bug 2+3: Premature stop + missing role
curl -sN -H "Authorization: Bearer $KEY" \
  -d '{"model":"deepseek-r1:32b","messages":[{"role":"user","content":"hi"}],"stream":true}' \
  http://localhost:3000/api/chat/completions
# Result: First chunk has finish_reason:"stop", no role, unique IDs per chunk

# Bug 5: Tools crash
curl -s -H "Authorization: Bearer $KEY" \
  -d '{"model":"deepseek-r1:32b","messages":[{"role":"user","content":"hi"}],"tools":[{"type":"function","function":{"name":"test","parameters":{"type":"object","properties":{}}}}],"stream":true}' \
  http://localhost:3000/api/chat/completions
# Result: null

# Bug 6: API analytics gap
curl -sN -H "Authorization: Bearer $KEY" \
  -d '{"model":"claude-opus-4-6","messages":[{"role":"user","content":"hi"}],"stream":true}' \
  http://localhost:3000/api/chat/completions
# Result: Stream works, usage in final chunk, but NO row in chat_message table

# Bug 7: Missing stream_options
curl -sN -H "Authorization: Bearer $KEY" \
  -d '{"model":"claude-opus-4-6","messages":[{"role":"user","content":"hi"}],"stream":true}' \
  http://localhost:3000/api/chat/completions
# Without patch: final chunk has no "usage" field
# With patch: final chunk includes "usage":{"prompt_tokens":N,"completion_tokens":N,...}

Desired Solution you'd like

Native tooling support and Anthropic SDK caching via the preferred OpenAI API method in desktop tools such as OpenCode Desktop, for popular models deployed on Ollama like Claude, Sonnet, Qwen, Mistral, Gemma, GLM 5.1/Air 4.5, DeepSeek R1, DeepSeek Coder 2.5, and DeepSeek R3. Patches attached below.

Alternatives Considered

The fixes also enable native tool calling for Ollama-hosted models (Gemma, Anthropic, GLM, Qwen) from OpenCode Desktop over the preferred OpenAI API; that is why we touched the middleware too. It works.

Additional Context

submitted-as_FR.zip

Originally created by @pvyswiss on GitHub (Apr 21, 2026). Original GitHub issue: https://github.com/open-webui/open-webui/issues/23917 ### Check Existing Issues - [x] I have searched for all existing **open AND closed** issues and discussions for similar requests. I have found none that is comparable to my request. ### Verify Feature Scope - [x] I have read through and understood the scope definition for feature requests in the Issues section. I believe my feature request meets the definition and belongs in the Issues section instead of the Discussions. ### Problem Description I tought a PR is not the right way, since its address several findings. Also made the SDK Caching into from Anthropic, saves you a lot of Tokens. <h1 id="open-webui----bug-reports-andamp-feature-requests" class="atx">Open WebUI -- Bug Reports &amp; Feature Requests</h1> <blockquote> <p>These findings were discovered while building an enterprise AI platform using Open WebUI as the API gateway for OpenAI-compatible desktop clients (OpenCode Desktop, Continue.dev). All bugs were found in the <strong>API path</strong> (direct <code>/api/chat/completions</code> calls without WebSocket/event_emitter). The Web UI chat (browser path with event_emitter) was working correctly in all cases.</p> <p><strong>Prepared by</strong>: PVY.swiss<br><strong>Open WebUI Version</strong>: main (as of 2026-04-07)<br><strong>Patched files in</strong>: <code>patches/backend/open_webui/</code><br><strong>Excludes</strong>: OnlyOffice/PVYoffice integrations (proprietary)</p> </blockquote> <hr> <h2 id="bug-1-stale-content-encoding-header-forwarded-from-upstream-apis" class="atx">Bug 1: Stale <code>Content-Encoding</code> Header Forwarded from Upstream APIs</h2> <p><strong>File</strong>: <code>backend/open_webui/routers/openai.py</code><br><strong>Severity</strong>: Critical<br><strong>Affects</strong>: All models routed through OpenAI-compatible upstream APIs (Anthropic, OpenAI, etc.)</p> <h3 id="symptom" class="atx">Symptom</h3> <p>Desktop clients (OpenCode Desktop, OnlyOffice plugin) receive <code>Decompression error: ZlibError</code> when streaming SSE from Claude/OpenAI models. The Web UI works fine.</p> <h3 id="root-cause" class="atx">Root Cause</h3> <p><code>aiohttp</code> (Open WebUI&#39;s HTTP client) sends <code>Accept-Encoding: gzip</code> to upstream APIs. 
When the upstream responds with <code>Content-Encoding: gzip</code>, aiohttp <strong>auto-decompresses</strong> the response body but <strong>keeps the <code>Content-Encoding: gzip</code> header</strong> in the response headers (known behavior, see <a href="https://github.com/aio-libs/aiohttp/issues/4462">aiohttp#4462</a>).</p> <p>Open WebUI&#39;s <code>openai.py</code> forwards all upstream headers verbatim to the client:</p> <pre><code class="fenced-code-block language-python"># 4 locations in openai.py (lines ~1186, ~1273, ~1389, ~1495): return StreamingResponse( stream_wrapper(r, session, stream_chunks_handler), status_code=r.status, headers=dict(r.headers), # &lt;-- Includes stale Content-Encoding: gzip )</code></pre> <p>The client receives:</p> <ul> <li>Header: <code>Content-Encoding: gzip</code> (tells client &quot;this is compressed&quot;)</li> <li>Body: Plain text SSE chunks (already decompressed by aiohttp)</li> </ul> <p>The client tries to gunzip plain text → <code>ZlibError</code>.</p> <h3 id="fix" class="atx">Fix</h3> <p>Strip hop-by-hop / encoding headers that become stale after aiohttp auto-decompression:</p> <pre><code class="fenced-code-block language-python">_STRIP_PROXY_HEADERS = frozenset({ &#39;Content-Encoding&#39;, &#39;Content-Length&#39;, &#39;Transfer-Encoding&#39;, }) def _clean_proxy_headers(raw_headers) -&gt; dict: return {k: v for k, v in raw_headers.items() if k not in _STRIP_PROXY_HEADERS} # At all 4 StreamingResponse locations: headers=_clean_proxy_headers(r.headers),</code></pre> <h3 id="impact" class="atx">Impact</h3> <ul> <li><strong>Web UI</strong>: None (browser path doesn&#39;t use the OpenAI proxy for SSE)</li> <li><strong>API clients</strong>: All desktop/programmatic clients that honor <code>Content-Encoding</code> headers are fixed</li> </ul> <hr> <h2 id="bug-2-premature-finish_reason-andquotstopandquot-on-first-sse-chunk-ollama-models" class="atx">Bug 2: Premature <code>finish_reason: &quot;stop&quot;</code> on First SSE Chunk (Ollama Models)</h2> <p><strong>File</strong>: <code>backend/open_webui/utils/misc.py</code><br><strong>Severity</strong>: Critical<br><strong>Affects</strong>: All Ollama models with thinking/reasoning (DeepSeek R1, Gemma 4) when accessed via API</p> <h3 id="symptom-1" class="atx">Symptom</h3> <p>API clients (OpenCode Desktop) see the first SSE chunk, then the stream appears to end. The model seems to hang after &quot;let me think.&quot; The Web UI works fine.</p> <h3 id="root-cause-1" class="atx">Root Cause</h3> <p><code>openai_chat_chunk_message_template()</code> in <code>misc.py</code> uses truthy checks to determine when to set <code>finish_reason: &quot;stop&quot;</code>:</p> <pre><code class="fenced-code-block language-python"># Line ~501: if not content and not reasoning_content and not tool_calls: template[&#39;choices&#39;][0][&#39;finish_reason&#39;] = &#39;stop&#39;</code></pre> <p>When Ollama sends the first chunk for a reasoning model, both <code>content</code> and <code>thinking</code> are empty strings (<code>&quot;&quot;</code>). 
Empty strings are <strong>falsy in Python</strong>, so the condition is True, and <code>finish_reason: &quot;stop&quot;</code> is set on the <strong>very first chunk</strong>.</p> <p>API clients that comply with the OpenAI spec (like <code>@ai-sdk/openai-compatible</code>) see <code>finish_reason: &quot;stop&quot;</code> and close the stream immediately -- before any reasoning or content tokens arrive.</p> <p>The Web UI is unaffected because its browser path never checks <code>finish_reason</code> for stream termination; it reads until <code>data: [DONE]</code>.</p> <h3 id="fix-1" class="atx">Fix</h3> <p>Only set <code>finish_reason: &quot;stop&quot;</code> when <code>usage</code> is present (which only happens on the final chunk when Ollama sends <code>done: true</code>):</p> <pre><code class="fenced-code-block language-python">if usage and not content and not reasoning_content and not tool_calls: template[&#39;choices&#39;][0][&#39;finish_reason&#39;] = &#39;stop&#39;</code></pre> <h3 id="impact-1" class="atx">Impact</h3> <ul> <li><strong>Web UI</strong>: None (browser path ignores <code>finish_reason</code>)</li> <li><strong>API clients</strong>: Stream continues correctly through reasoning and content phases</li> </ul> <hr> <h2 id="bug-3-missing-role-and-non-unique-chunk-ids-in-ollama-sse-conversion" class="atx">Bug 3: Missing <code>role</code> and Non-Unique Chunk IDs in Ollama SSE Conversion</h2> <p><strong>File</strong>: <code>backend/open_webui/utils/response.py</code><br><strong>Severity</strong>: Major<br><strong>Affects</strong>: All Ollama models when accessed via API</p> <h3 id="symptom-2" class="atx">Symptom</h3> <p>API clients fail to correlate SSE chunks or initialize the response correctly. Some SDKs silently drop chunks or fail to render content.</p> <h3 id="root-cause-two-issues" class="atx">Root Cause (Two Issues)</h3> <p><strong>3a: Missing <code>delta.role: &quot;assistant&quot;</code> on first chunk</strong></p> <p>The OpenAI SSE spec requires the first chunk to contain <code>delta.role: &quot;assistant&quot;</code>. The Ollama-to-OpenAI conversion in <code>convert_streaming_response_ollama_to_openai()</code> never emits <code>role</code>. OpenAI-compatible SDKs expect this field to initialize the message.</p> <p><strong>3b: Unique UUID per SSE chunk</strong></p> <p><code>openai_chat_message_template()</code> generates a new <code>uuid4()</code> for every chunk:</p> <pre><code class="fenced-code-block language-python">return { &#39;id&#39;: f&#39;{model}-{str(uuid.uuid4())}&#39;, # NEW UUID EVERY CALL ... }</code></pre> <p>The OpenAI spec requires all chunks in a single completion to share the <strong>same <code>id</code></strong>. SDKs use this ID to correlate chunks into one response. With unique IDs per chunk, SDK chunk correlation breaks.</p> <p>For comparison:</p> <ul> <li>Anthropic/OpenAI upstream: consistent <code>msg_xxxxx</code> ID across all chunks ✓</li> <li>Ollama via Open WebUI: unique ID per chunk ✗</li> </ul> <h3 id="fix-2" class="atx">Fix</h3> <p>Generate one <code>chatcmpl-</code> prefixed ID per stream and add <code>role: &quot;assistant&quot;</code> to the first chunk:</p> <pre><code class="fenced-code-block language-python">async def convert_streaming_response_ollama_to_openai(ollama_streaming_response): first_chunk = True completion_id = f&#39;chatcmpl-{uuid4()}&#39; async for data in ollama_streaming_response.body_iterator: ... 
data[&#39;id&#39;] = completion_id # Same ID for all chunks if first_chunk: data[&#39;choices&#39;][0][&#39;delta&#39;][&#39;role&#39;] = &#39;assistant&#39; first_chunk = False ...</code></pre> <h3 id="impact-2" class="atx">Impact</h3> <ul> <li><strong>Web UI</strong>: None (browser path doesn&#39;t use chunk <code>id</code> or <code>delta.role</code>)</li> <li><strong>API clients</strong>: Proper chunk correlation and response initialization</li> </ul> <hr> <h2 id="bug-4-ollama-to-openai-conversion-doesnand39t-expose-reasoning-as-standard-content" class="atx">Bug 4: Ollama-to-OpenAI Conversion Doesn&#39;t Expose Reasoning as Standard Content</h2> <p><strong>File</strong>: <code>backend/open_webui/utils/response.py</code><br><strong>Severity</strong>: Feature Request<br><strong>Affects</strong>: Ollama reasoning models (DeepSeek R1, Gemma 4) when accessed via generic OpenAI-compatible clients</p> <h3 id="problem" class="atx">Problem</h3> <p>Ollama&#39;s native API sends reasoning in <code>message.thinking</code>. Open WebUI&#39;s conversion puts this in <code>delta.reasoning_content</code> -- a non-standard field that only DeepSeek-specific SDKs understand.</p> <p>Generic OpenAI-compatible SDKs (like <code>@ai-sdk/openai-compatible</code>) don&#39;t recognize <code>reasoning_content</code> and silently drop the reasoning tokens. Users see no thinking process.</p> <p>The Web UI handles this correctly because its middleware at line ~3763 explicitly reads <code>reasoning_content</code>, <code>reasoning</code>, and <code>thinking</code> fields and renders them as collapsible blocks.</p> <h3 id="suggested-enhancement" class="atx">Suggested Enhancement</h3> <p>Convert Ollama&#39;s <code>thinking</code> field into <code>&lt;think&gt;</code> tags inside the standard <code>content</code> field during the Ollama-to-OpenAI streaming conversion. This makes reasoning visible to ALL clients:</p> <pre><code class="fenced-code-block language-python">if reasoning_content: if not in_reasoning: in_reasoning = True message_content = &quot;&lt;think&gt;\n&quot; + reasoning_content else: message_content = reasoning_content reasoning_content = None # Don&#39;t pass as separate field elif in_reasoning and message_content: in_reasoning = False message_content = &quot;\n&lt;/think&gt;\n\n&quot; + message_content</code></pre> <p>Open WebUI&#39;s browser middleware already detects <code>&lt;think&gt;</code> tags and renders them as collapsible reasoning blocks, so this is backward-compatible.</p> <h3 id="impact-3" class="atx">Impact</h3> <ul> <li><strong>Web UI</strong>: <code>&lt;think&gt;</code> tags rendered as collapsible reasoning (same as before)</li> <li><strong>API clients</strong>: Reasoning visible as standard <code>content</code> text with <code>&lt;think&gt;</code> tag markers</li> </ul> <hr> <h2 id="bug-5-client-provided-tools-crash-ollama-models-that-donand39t-support-tool-calling" class="atx">Bug 5: Client-Provided <code>tools</code> Crash Ollama Models That Don&#39;t Support Tool Calling</h2> <p><strong>File</strong>: <code>backend/open_webui/main.py</code><br><strong>Severity</strong>: Critical<br><strong>Affects</strong>: DeepSeek R1 and other Ollama models without native tool support, when called from API clients that send <code>tools</code></p> <h3 id="symptom-3" class="atx">Symptom</h3> <p>OpenCode Desktop sends two concurrent requests: one for chat, one for title generation (with <code>tools</code> and <code>tool_choice</code>). 
For DeepSeek R1, Ollama rejects the tools request with <code>&quot;does not support tools&quot;</code>. Open WebUI catches this exception but returns <code>null</code> to the client (the <code>except</code> handler at line ~1929 logs at DEBUG level and falls through without returning a proper response). The SDK receives <code>null</code>, crashes, and kills both concurrent streams.</p> <h3 id="root-cause-2" class="atx">Root Cause</h3> <p>API clients like <code>@ai-sdk/openai-compatible</code> automatically send <code>tools</code> if the caller supports them. Open WebUI&#39;s middleware checks <code>function_calling</code> capability for its own internal tools (MCP servers), but does NOT strip <strong>client-provided</strong> <code>tools</code> from <code>form_data</code>. These client tools pass through to Ollama, which rejects them for models without tool support.</p> <p>Additionally, the <code>except Exception</code> handler at <code>process_chat</code> (line ~1929) catches the error but only emits it via <code>event_emitter</code> (which is <code>None</code> for API clients). The function implicitly returns <code>None</code>, which FastAPI serializes as <code>null</code>.</p> <h3 id="fix-3" class="atx">Fix</h3> <p>Strip client-provided <code>tools</code> and <code>tool_choice</code> when the model doesn&#39;t have <code>function_calling: &quot;native&quot;</code>:</p> <pre><code class="fenced-code-block language-python">if ( form_data.get(&#39;tools&#39;) and model_info_params.get(&#39;function_calling&#39;) != &#39;native&#39; and not form_data.get(&#39;params&#39;, {}).get(&#39;function_calling&#39;) == &#39;native&#39; ): form_data.pop(&#39;tools&#39;, None) form_data.pop(&#39;tool_choice&#39;, None)</code></pre> <p>Also upgrade the error handler from <code>log.debug</code> to <code>log.warning</code> so API path failures are visible:</p> <pre><code class="fenced-code-block language-python">except Exception as e: log.warning(f&quot;Error processing chat payload: {e}&quot;) # was log.debug</code></pre> <h3 id="impact-4" class="atx">Impact</h3> <ul> <li><strong>Web UI</strong>: None (Web UI tools use <code>tools_dict</code> in middleware, not client-provided <code>tools</code>)</li> <li><strong>API clients</strong>: Tools are stripped gracefully; request proceeds as normal chat. Models with <code>function_calling: &quot;native&quot;</code> (like Gemma 4) still receive tools correctly.</li> </ul> <hr> <h2 id="bug-6-no-token-usage-analytics-for-api-key-requests-no-chat_id" class="atx">Bug 6: No Token Usage Analytics for API-Key Requests (No <code>chat_id</code>)</h2> <p><strong>File</strong>: <code>backend/open_webui/utils/middleware.py</code><br><strong>Severity</strong>: Major (data loss -- no cost tracking for API clients)<br><strong>Affects</strong>: All models when accessed via API key without <code>chat_id</code>/<code>session_id</code> (OpenCode Desktop, Continue.dev, curl, custom scripts)</p> <h3 id="symptom-4" class="atx">Symptom</h3> <p>The <code>/admin/analytics</code> dashboard shows zero token usage for models used exclusively via API clients. Models used from the Web UI browser show correct analytics. 
### Root Cause

`openai_chat_chunk_message_template()` in `misc.py` uses truthy checks to determine when to set `finish_reason: "stop"`:

```python
# Line ~501:
if not content and not reasoning_content and not tool_calls:
    template['choices'][0]['finish_reason'] = 'stop'
```

When Ollama sends the first chunk for a reasoning model, both `content` and `thinking` are empty strings (`""`). Empty strings are **falsy in Python**, so the condition is True, and `finish_reason: "stop"` is set on the **very first chunk**.

API clients that comply with the OpenAI spec (like `@ai-sdk/openai-compatible`) see `finish_reason: "stop"` and close the stream immediately -- before any reasoning or content tokens arrive.

The Web UI is unaffected because its browser path never checks `finish_reason` for stream termination; it reads until `data: [DONE]`.
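To make the truthiness pitfall concrete, here is a minimal standalone sketch (the variable values are hypothetical, mirroring Ollama's first chunk for a reasoning model):

```python
# First chunk from a reasoning model: nothing generated yet.
content = ""            # empty string -- falsy
reasoning_content = ""  # empty string -- falsy
tool_calls = None       # falsy

# The original truthy check fires immediately:
if not content and not reasoning_content and not tool_calls:
    print('finish_reason would be set to "stop" on the FIRST chunk')
```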
### Fix

Only set `finish_reason: "stop"` when `usage` is present (which only happens on the final chunk, when Ollama sends `done: true`):

```python
if usage and not content and not reasoning_content and not tool_calls:
    template['choices'][0]['finish_reason'] = 'stop'
```

### Impact

- **Web UI**: None (browser path ignores `finish_reason`)
- **API clients**: Stream continues correctly through reasoning and content phases

---

## Bug 3: Missing `role` and Non-Unique Chunk IDs in Ollama SSE Conversion

**File**: `backend/open_webui/utils/response.py`
**Severity**: Major
**Affects**: All Ollama models when accessed via API

### Symptom

API clients fail to correlate SSE chunks or initialize the response correctly. Some SDKs silently drop chunks or fail to render content.

### Root Cause (Two Issues)

**3a: Missing `delta.role: "assistant"` on first chunk**

The OpenAI SSE spec requires the first chunk to contain `delta.role: "assistant"`. The Ollama-to-OpenAI conversion in `convert_streaming_response_ollama_to_openai()` never emits `role`. OpenAI-compatible SDKs expect this field to initialize the message.

**3b: Unique UUID per SSE chunk**

`openai_chat_message_template()` generates a new `uuid4()` for every chunk:

```python
return {
    'id': f'{model}-{str(uuid.uuid4())}',  # NEW UUID EVERY CALL
    ...
}
```

The OpenAI spec requires all chunks in a single completion to share the **same `id`**. SDKs use this ID to correlate chunks into one response. With unique IDs per chunk, SDK chunk correlation breaks.

For comparison:

- Anthropic/OpenAI upstream: consistent `msg_xxxxx` ID across all chunks ✓
- Ollama via Open WebUI: unique ID per chunk ✗

### Fix

Generate one `chatcmpl-` prefixed ID per stream and add `role: "assistant"` to the first chunk:

```python
async def convert_streaming_response_ollama_to_openai(ollama_streaming_response):
    first_chunk = True
    completion_id = f'chatcmpl-{uuid4()}'
    async for data in ollama_streaming_response.body_iterator:
        ...
        data['id'] = completion_id  # Same ID for all chunks
        if first_chunk:
            data['choices'][0]['delta']['role'] = 'assistant'
            first_chunk = False
        ...
```
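For illustration, a spec-compliant stream after the fix would look roughly like this (abbreviated; the ID and content are hypothetical):

```
data: {"id":"chatcmpl-7f3a...","choices":[{"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"chatcmpl-7f3a...","choices":[{"delta":{"content":"Hello"},"finish_reason":null}]}

data: {"id":"chatcmpl-7f3a...","choices":[{"delta":{},"finish_reason":"stop"}]}

data: [DONE]
```

All chunks share one `id`, and only the first carries `delta.role`.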
### Impact

- **Web UI**: None (browser path doesn't use chunk `id` or `delta.role`)
- **API clients**: Proper chunk correlation and response initialization

---

## Bug 4: Ollama-to-OpenAI Conversion Doesn't Expose Reasoning as Standard Content

**File**: `backend/open_webui/utils/response.py`
**Severity**: Feature Request
**Affects**: Ollama reasoning models (DeepSeek R1, Gemma 4) when accessed via generic OpenAI-compatible clients

### Problem

Ollama's native API sends reasoning in `message.thinking`. Open WebUI's conversion puts this in `delta.reasoning_content` -- a non-standard field that only DeepSeek-specific SDKs understand. Generic OpenAI-compatible SDKs (like `@ai-sdk/openai-compatible`) don't recognize `reasoning_content` and silently drop the reasoning tokens. Users see no thinking process.

The Web UI handles this correctly because its middleware at line ~3763 explicitly reads the `reasoning_content`, `reasoning`, and `thinking` fields and renders them as collapsible blocks.

### Suggested Enhancement

Convert Ollama's `thinking` field into `<think>` tags inside the standard `content` field during the Ollama-to-OpenAI streaming conversion. This makes reasoning visible to ALL clients:

```python
if reasoning_content:
    if not in_reasoning:
        in_reasoning = True
        message_content = "<think>\n" + reasoning_content
    else:
        message_content = reasoning_content
    reasoning_content = None  # Don't pass as separate field
elif in_reasoning and message_content:
    in_reasoning = False
    message_content = "\n</think>\n\n" + message_content
```

Open WebUI's browser middleware already detects `<think>` tags and renders them as collapsible reasoning blocks, so this is backward-compatible.
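As a complement, here is a self-contained sketch of the same state machine. The `(thinking, content)` tuple input is a hypothetical stand-in for Ollama's chunk fields, and the end-of-stream handling (closing an unterminated `<think>` tag) is an addition the inline snippet leaves to surrounding code:

```python
from typing import AsyncIterator, Optional, Tuple

async def wrap_thinking(
    chunks: AsyncIterator[Tuple[Optional[str], Optional[str]]],
) -> AsyncIterator[str]:
    """Yield plain content with reasoning wrapped in <think> tags."""
    in_reasoning = False
    async for thinking, content in chunks:
        if thinking:
            if not in_reasoning:
                in_reasoning = True
                yield "<think>\n" + thinking
            else:
                yield thinking
        elif content:
            if in_reasoning:
                in_reasoning = False
                yield "\n</think>\n\n" + content
            else:
                yield content
    # Edge case: stream ended while still reasoning -- close the tag.
    if in_reasoning:
        yield "\n</think>\n"
```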
### Impact

- **Web UI**: `<think>` tags rendered as collapsible reasoning (same as before)
- **API clients**: Reasoning visible as standard `content` text with `<think>` tag markers

---

## Bug 5: Client-Provided `tools` Crash Ollama Models That Don't Support Tool Calling

**File**: `backend/open_webui/main.py`
**Severity**: Critical
**Affects**: DeepSeek R1 and other Ollama models without native tool support, when called from API clients that send `tools`

### Symptom

OpenCode Desktop sends two concurrent requests: one for chat, one for title generation (with `tools` and `tool_choice`). For DeepSeek R1, Ollama rejects the tools request with `"does not support tools"`. Open WebUI catches this exception but returns `null` to the client (the `except` handler at line ~1929 logs at DEBUG level and falls through without returning a proper response). The SDK receives `null`, crashes, and kills both concurrent streams.

### Root Cause

API clients like `@ai-sdk/openai-compatible` automatically send `tools` if the caller supports them. Open WebUI's middleware checks the `function_calling` capability for its own internal tools (MCP servers), but does NOT strip **client-provided** `tools` from `form_data`. These client tools pass through to Ollama, which rejects them for models without tool support.

Additionally, the `except Exception` handler in `process_chat` (line ~1929) catches the error but only emits it via `event_emitter` (which is `None` for API clients). The function implicitly returns `None`, which FastAPI serializes as `null`.

### Fix

Strip client-provided `tools` and `tool_choice` when the model doesn't have `function_calling: "native"`:

```python
if (
    form_data.get('tools')
    and model_info_params.get('function_calling') != 'native'
    and form_data.get('params', {}).get('function_calling') != 'native'
):
    form_data.pop('tools', None)
    form_data.pop('tool_choice', None)
```

Also upgrade the error handler from `log.debug` to `log.warning` so API-path failures are visible:

```python
except Exception as e:
    log.warning(f"Error processing chat payload: {e}")  # was log.debug
```

### Impact

- **Web UI**: None (Web UI tools use `tools_dict` in middleware, not client-provided `tools`)
- **API clients**: Tools are stripped gracefully; the request proceeds as a normal chat. Models with `function_calling: "native"` (like Gemma 4) still receive tools correctly.

---

## Bug 6: No Token Usage Analytics for API-Key Requests (No `chat_id`)

**File**: `backend/open_webui/utils/middleware.py`
**Severity**: Major (data loss -- no cost tracking for API clients)
**Affects**: All models when accessed via API key without `chat_id`/`session_id` (OpenCode Desktop, Continue.dev, curl, custom scripts)

### Symptom

The `/admin/analytics` dashboard shows zero token usage for models used exclusively via API clients. Models used from the Web UI browser show correct analytics. The `chat_message` table has zero rows for API-only models (e.g., `pvy-senior-lead-dev` used from OpenCode Desktop).

### Root Cause

The analytics pipeline requires `event_emitter` to be non-None, which requires `chat_id`, `session_id`, and `message_id` in the request metadata:

```python
# middleware.py lines 2750-2763:
def get_event_emitter_and_caller(metadata):
    event_emitter = None
    event_caller = None
    if (
        'session_id' in metadata and metadata['session_id']
        and 'chat_id' in metadata and metadata['chat_id']
        and 'message_id' in metadata and metadata['message_id']
    ):
        event_emitter = get_event_emitter(metadata)
        event_caller = get_event_call(metadata)
    return event_emitter, event_caller
```

API clients don't send `chat_id`/`session_id`/`message_id` because they manage their own conversation state. When `event_emitter` is `None`, the streaming handler falls through to the passthrough branch:

```python
# middleware.py line 3209:
if event_emitter and event_caller:
    # Full handler: usage tracking, DB writes, tool execution, title generation
    ...
else:
    # Lines 4688-4722: Simple passthrough -- NO usage tracking, NO DB writes
    async def stream_wrapper(original_generator, events):
        async for data in original_generator:
            yield data
```

The usage data IS present in the SSE stream (Anthropic returns it with `stream_options: {"include_usage": true}`), but nobody reads it on the passthrough path.

### Data Flow Comparison

| Path | `chat_id` | `event_emitter` | Usage Tracked | DB Write |
| --- | --- | --- | --- | --- |
| Web UI browser | Set by frontend | Non-None | Yes (lines 3619-3622) | Yes (lines 4614-4635) |
| API client (OpenCode) | `None` | `None` | **No** (passthrough) | **No** |

### Suggested Fix

For API-key requests without `chat_id`, the middleware should still extract usage from the final SSE chunk and write it to a dedicated `api_usage` table (or the existing `chat_message` table with a synthetic chat_id like `api:{user_id}:{timestamp}`). This enables cost tracking for all API consumers.
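For the synthetic chat_id, a minimal sketch (the helper name and exact format are suggestions, not existing Open WebUI conventions):

```python
import time

def synthetic_api_chat_id(user_id: str) -> str:
    # Hypothetical helper: a stable "api:" prefix makes API-originated
    # rows easy to filter in or out of analytics queries.
    return f"api:{user_id}:{int(time.time())}"
```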
Minimal approach -- extract usage in the passthrough branch:

```python
# In the else branch at line 4688:
else:
    async def stream_wrapper(original_generator, events):
        usage = None
        async for data in original_generator:
            # Extract usage from the final chunk if present
            if isinstance(data, str) and data.startswith('data: '):
                try:
                    chunk = json.loads(data[6:])
                    raw_usage = chunk.get('usage')
                    if raw_usage:
                        usage = normalize_usage(raw_usage)
                except (json.JSONDecodeError, ValueError):
                    pass
            yield data
        # Write usage to the DB even without a chat_id
        if usage and metadata.get('user_id'):
            ChatMessages.create_api_usage_record(
                user_id=metadata['user_id'],
                model_id=form_data.get('model', ''),
                usage=usage,
            )
```

### Impact

- **Web UI**: None (already has full analytics)
- **API clients**: Token usage tracked for cost monitoring and analytics dashboards
- **Verified**: the `chat_message` table has 0 rows for `pvy-senior-lead-dev` (OpenCode Desktop only), but 22+ rows with usage for `pvy-researcher` (Web UI browser)

---

## Bug 7: Anthropic OpenAI-Compat Endpoint Needs `stream_options` for Usage in SSE

**File**: `backend/open_webui/routers/openai.py`
**Severity**: Minor (safety net -- most clients already send it)
**Affects**: Anthropic models via `/v1/chat/completions` when the client doesn't send `stream_options`

### Symptom

When streaming from Anthropic's OpenAI-compatible endpoint without `stream_options: {"include_usage": true}`, the final SSE chunk contains no `usage` field. The middleware's `normalize_usage` at line 3619 finds nothing to process.

### Root Cause

Anthropic's `/v1/chat/completions` endpoint (OpenAI-compat) supports `stream_options` (confirmed in their docs: "Fully supported"), but it only returns usage when explicitly requested -- unlike their native `/v1/messages` endpoint, which always includes usage in `message_start` and `message_delta` events.

Open WebUI doesn't inject `stream_options` -- it relies on the client to send it. The Web UI browser does send it (when `model.info.meta.capabilities.usage` is `true`), but API scripts and the OnlyOffice plugin don't.

### Fix

Inject `stream_options` for Anthropic requests on the OpenAI-compat path:

```python
# After URL routing, before payload serialization:
if (
    is_anthropic_url(url)
    and not _is_anthropic_native
    and isinstance(payload, dict)
    and payload.get('stream')
):
    payload.setdefault('stream_options', {'include_usage': True})
```

`setdefault` preserves client-provided values. The `isinstance(payload, dict)` check guards against the native proxy path, where the payload is already serialized to a JSON string.
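A quick standalone sanity check of the `setdefault` semantics relied on here:

```python
# Client explicitly opted out -- the injected default must not override it.
payload = {'stream': True, 'stream_options': {'include_usage': False}}
payload.setdefault('stream_options', {'include_usage': True})
assert payload['stream_options'] == {'include_usage': False}

# Client sent nothing -- the default is injected.
payload = {'stream': True}
payload.setdefault('stream_options', {'include_usage': True})
assert payload['stream_options'] == {'include_usage': True}
```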
### Impact

- **Web UI**: None (already sends `stream_options` for models with the `usage` capability)
- **OpenCode Desktop**: None (SDK already sends `stream_options`)
- **API scripts / curl / OnlyOffice**: Usage now included in the SSE stream

---

## File Structure

```
patches/
└── backend/
    └── open_webui/
        ├── main.py           # Bug 5: Strip client tools + error logging
        ├── routers/
        │   └── openai.py     # Bug 1: Strip stale Content-Encoding headers
        │                     # Bug 7: Inject stream_options for Anthropic
        └── utils/
            ├── misc.py       # Bug 2: Fix premature finish_reason stop
            ├── response.py   # Bug 3 + 4: Role, chunk ID, reasoning tags
            └── middleware.py # Bug 6: API analytics gap (design issue, no patch yet)
```

## Reproduction

All bugs are reproducible with:

- Open WebUI `main` branch (2026-04-07, updated 2026-04-11)
- Ollama with DeepSeek R1 32B or Gemma 4 31B
- Anthropic Claude via API key
- Any OpenAI-compatible API client (curl with `--compressed`, OpenCode Desktop, etc.)
- Upstream API with gzip (Anthropic via Cloudflare)

```bash
# Bug 1: Stale Content-Encoding
curl --compressed -sN -H "Authorization: Bearer $KEY" \
  -d '{"model":"claude-model","messages":[{"role":"user","content":"hi"}],"stream":true}' \
  http://localhost:3000/api/chat/completions
# Result: ZlibError or garbled output

# Bug 2+3: Premature stop + missing role
curl -sN -H "Authorization: Bearer $KEY" \
  -d '{"model":"deepseek-r1:32b","messages":[{"role":"user","content":"hi"}],"stream":true}' \
  http://localhost:3000/api/chat/completions
# Result: First chunk has finish_reason:"stop", no role, unique IDs per chunk

# Bug 5: Tools crash
curl -s -H "Authorization: Bearer $KEY" \
  -d '{"model":"deepseek-r1:32b","messages":[{"role":"user","content":"hi"}],"tools":[{"type":"function","function":{"name":"test","parameters":{"type":"object","properties":{}}}}],"stream":true}' \
  http://localhost:3000/api/chat/completions
# Result: null

# Bug 6: API analytics gap
curl -sN -H "Authorization: Bearer $KEY" \
  -d '{"model":"claude-opus-4-6","messages":[{"role":"user","content":"hi"}],"stream":true}' \
  http://localhost:3000/api/chat/completions
# Result: Stream works, usage in final chunk, but NO row in chat_message table

# Bug 7: Missing stream_options
curl -sN -H "Authorization: Bearer $KEY" \
  -d '{"model":"claude-opus-4-6","messages":[{"role":"user","content":"hi"}],"stream":true}' \
  http://localhost:3000/api/chat/completions
# Without patch: final chunk has no "usage" field
# With patch: final chunk includes "usage":{"prompt_tokens":N,"completion_tokens":N,...}
```

### Desired Solution you'd like

Native tool-calling support and Anthropic SDK caching via the preferred OpenAI-compatible API method in desktop tools such as OpenCode Desktop, for popular Ollama-deployed models like Claude, Sonnet, Qwen, Mistral, Gemma, GLM 5.1/Air 4.5, DeepSeek R1, DeepSeek Coder 2.5, and DeepSeek R3. Patches attached below.

### Alternatives Considered

These fixes also enable native tool calling of Ollama-hosted models (Gemma, Anthropic, GLM, Qwen) from OpenCode Desktop over the preferred OpenAI-compatible API -- that's why the middleware was touched too. It works.

### Additional Context

[submitted-as_FR.zip](https://github.com/user-attachments/files/26927942/submitted-as_FR.zip)
@Classic298 commented on GitHub (Apr 21, 2026):

This is impossible to track properly; please open standalone issues.

And include version numbers (which versions are affected) and other information, like setup details.

There is a bug report form for a reason, and you skipped it.

<!-- gh-comment-id:4287607877 -->
@pvyswiss commented on GitHub (Apr 21, 2026):

It is possible to track: each issue and fix is described, and the files are attached. Your PR guidelines say PRs are only for issues. I did it this way because certain things are steered over the local Ollama API ports, so my idea was to let you cherry-pick. But if you want, I can open an issue for each and propose them individually. BTW, I also fixed the PDF render engine -- black PDFs in dark theme are ugly. I'll look into opening an issue for each after my working hours.

<!-- gh-comment-id:4287711904 -->
@Classic298 commented on GitHub (Apr 21, 2026):

Where does it say PRs are only for issues?

<!-- gh-comment-id:4287718033 -->
@pvyswiss commented on GitHub (Apr 21, 2026):

OK, here we go -- backward compatible with the 0.9 branch.

Issues:

1. Stale Content-Encoding header (ZlibError) -- #23920
2. Premature finish_reason "stop" -- #23921
3. Missing delta.role + non-unique chunk IDs -- #23922
4. Expose reasoning as `<think>` tags -- #23923
5. Client tools crash non-native models -- #23924
6. No API analytics without chat_id -- #23926
7. Inject stream_options for Anthropic -- #23927

and for each I created a PR

| # | Issue | PR | Branch |
| --- | --- | --- | --- |
| 1 | #23920 | #23928 | fix/strip-stale-content-encoding |
| 2 | #23921 | #23929 | fix/premature-finish-reason-stop |
| 3 | #23922 | #23930 | fix/ollama-sse-role-and-chunk-id |
| 4 | #23923 | #23931 | feat/ollama-reasoning-as-think-tags |
| 5 | #23924 | #23932 | fix/strip-tools-non-native-models |
| 6 | #23926 | No PR (design issue, code suggestion in issue) | -- |
| 7 | #23927 | #23933 | fix/inject-stream-options-anthropic |



If you can look at this and take some benefit from it, that's awesome.
All PRs:

- Reference #23917 (original combined issue)
- Are minimal, self-contained patches against current `main` (v0.9.0)
- Have one branch per fix for easy cherry-picking
- Are from the pvyswiss/open-webui fork

<!-- gh-comment-id:4287948120 -->
@Classic298 commented on GitHub (Apr 21, 2026):

thank you

<!-- gh-comment-id:4288012915 -->
@pvyswiss commented on GitHub (Apr 21, 2026):

Thank you too, it's a cool project.

<!-- gh-comment-id:4289757183 -->
Reference: github-starred/open-webui#35635