[GH-ISSUE #23855] issue: Merge Responses using Google/Anthropic API model fails to return response, caused by browser CONTENT_DECODING_FAILED. Proposed fix included. #58758
Originally created by @jamesc0ry on GitHub (Apr 17, 2026).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/23855
Check Existing Issues
Installation Method
Docker
Open WebUI Version
v0.9.0
Ollama Version (if applicable)
0.20.7
Operating System
MacOS 15.4.1
Browser (if applicable)
Chrome 147.0.7727.56
Confirmation
Expected Behavior
In a conversation containing two parallel models, with at least one being an Anthropic or Google external model, clicking the Merge Responses button with the Google/Anthropic model selected should use that model to generate a merged response.
Actual Behavior
Clicking the Merge Responses button produces no response output in the UI. Browser console shows ERR_CONTENT_DECODING_FAILED 200 on the /api/v1/tasks/moa/completions request.
Steps to Reproduce
AI-generated description that I personally inspected for accuracy:

Prerequisites:
- Open http://localhost:3000 in Chrome and create an admin account on first launch.
- Go to Admin Panel → Settings → Connections. Add an OpenAI API connection with your API key. Add an Anthropic and/or Google Gemini API connection with your API key. Click Save.

Steps to reproduce:
1. Start a new chat. Click the `+` icon in the model selector to add a second model. Select at least one model from Google or Anthropic, and make sure at least two models are selected so parallel responses are generated.
2. Open Chrome DevTools (F12) → the Network tab.
3. Send any prompt (e.g., "hello") and wait for both models to finish responding.
4. Click the Merge Responses button.

Expected behavior: A merged response is generated and displayed.
Actual behavior: No response appears. The Network tab shows the request to `/api/v1/tasks/moa/completions` returning `200 OK`, but Chrome reports `ERR_CONTENT_DECODING_FAILED`.

Control case: with an OpenAI model (e.g., `gpt-4o`) in place of the Google/Anthropic one, the merge succeeds. Alternatively, just select an OpenAI model response before clicking the Merge Responses button.

Logs & Screenshots
Browser
Relevant log:
POST http://localhost:3000/api/v1/tasks/moa/completions net::ERR_CONTENT_DECODING_FAILED 200 (OK)
Relevant response headers:
HTTP/1.1 200 OK
date: Fri, 17 Apr 2026 21:00:24 GMT
server: uvicorn
date: Fri, 17 Apr 2026 21:00:26 GMT
content-type: text/event-stream
transfer-encoding: chunked
content-encoding: gzip
server: cloudflare
x-process-time: 1
access-control-allow-origin: http://localhost:3000
access-control-allow-credentials: true
Docker
2026-04-17 21:19:50.563 | INFO | uvicorn.protocols.http.httptools_impl:send:483 - 172.66.0.243:31213 - "POST /api/v1/tasks/moa/completions HTTP/1.1" 200
Additional Information
I investigated the code and I'd like to propose a fix. In summary, I believe this is caused by a bug in `openai.py`, which in part implements `/api/v1/tasks/moa/completions`. The Anthropic/Google APIs return responses with a `Content-Encoding` header; aiohttp automatically decodes the response body but leaves the headers intact, and the current logic forwards those headers without stripping `Content-Encoding`, so the browser attempts to decode an already-decoded body and fails. My changes mirror logic I observed in `terminals.py`, which already strips this header.
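To illustrate the aiohttp behavior in isolation, here is a standalone sketch (not project code; `https://httpbin.org/gzip` is just a convenient gzip-encoded endpoint):

```python
# Demonstrates that aiohttp's default auto_decompress=True hands back a
# decoded body while resp.headers still advertises the upstream encoding.
import asyncio
import aiohttp

async def main():
    async with aiohttp.ClientSession() as session:  # auto_decompress=True by default
        async with session.get("https://httpbin.org/gzip") as resp:
            body = await resp.read()  # already-decompressed bytes
            print(resp.headers.get("Content-Encoding"))  # prints: gzip
            print(body[:2] == b"\x1f\x8b")  # False: no gzip magic bytes anymore

asyncio.run(main())
```

Forwarding those headers verbatim alongside the decoded body is exactly the mismatch the browser chokes on.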
I can verify that the following changes fixed the behavior on my end. This is an AI-generated proposal that I reviewed:
Proposed fix
Surgical fix (validated locally)
Two hunks in `backend/open_webui/routers/openai.py`.

Hunk 1: add the constant near the top of the module (around line 69):
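A minimal sketch of the constant; the name `HEADERS_TO_STRIP` is illustrative, and the set matches the four headers `terminals.py` already filters:

```python
# Headers that describe the upstream hop rather than the body we forward:
# aiohttp has already decoded the body, and Starlette/uvicorn regenerate
# the transport headers for the outgoing response.
HEADERS_TO_STRIP = frozenset(
    {"content-encoding", "content-length", "transfer-encoding", "connection"}
)
```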
Hunk 2: filter the forwarded headers on the SSE return inside `generate_chat_completion` (around line 1222):
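A minimal sketch of the change, assuming the existing `headers=dict(r.headers)` call site that the cleanup section below refers to; surrounding arguments are elided:

```python
# Streaming (SSE) return path in generate_chat_completion.
# Before: headers=dict(r.headers) forwarded upstream headers verbatim.
return StreamingResponse(
    r.content,
    status_code=r.status,
    headers={
        k: v
        for k, v in r.headers.items()
        if k.lower() not in HEADERS_TO_STRIP  # case-insensitive filter
    },
    # ... existing background/cleanup arguments unchanged ...
)
```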
Why this works: the shared aiohttp session uses the default `auto_decompress=True`, so `r.content` yields already-decoded plaintext. Forwarding the upstream `Content-Encoding: gzip` along with a decoded body is what makes the browser fail. Stripping that header (plus the three hop-by-hop / transport headers that Starlette/uvicorn will regenerate for the outgoing hop anyway) aligns the forwarded headers with the actual body the browser will see.

Verified end-to-end against Gemini and Anthropic OpenAI-compatible connections: `ERR_CONTENT_DECODING_FAILED` is gone, streaming renders normally, and OpenAI-backed paths are unchanged (they never emitted the offending header to begin with).

Precedent in the codebase
This pattern is already present at `backend/open_webui/routers/terminals.py`: the terminal-server reverse proxy defines the same four-element frozenset and applies the same case-insensitive filter before building its `StreamingResponse`. The diff above is a deliberate copy of that pattern, so the surgical fix doesn't introduce a new idea; it just propagates an existing one to where it's also needed.

Broader cleanup proposal (concept only)
Even after the surgical fix lands, the same `headers=dict(r.headers)` shape still exists at three more spots in `openai.py` (the embeddings passthrough, the Responses API proxy, and the generic `/{path:path}` passthrough gated by `ENABLE_OPENAI_API_PASSTHROUGH`) and once in `ollama.py`'s `send_request`. All share the identical latent defect; they're just less frequently exercised than MOA / chat completions.

Rather than repeat the constant and the filter comprehension at each site (which would then be a third, fourth, and fifth copy of what `terminals.py` already has), I'd propose extracting a small shared helper, a `filter_upstream_headers(headers)` utility, and colocating it with `stream_wrapper` in `backend/open_webui/utils/session_pool.py`. The rationale for that location:
`session_pool.py` already owns both ends of the "upstream aiohttp → downstream StreamingResponse" boundary. It creates the session that performs the decompression, and it owns `stream_wrapper`, which yields the decoded body. Header filtering is the missing third leg of the same invariant (decoded body requires decoded headers), and having the helper sit next to its siblings means future proxy sites naturally reach for it.
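A minimal sketch of that helper, under the naming and location proposed above (illustrative, not final code):

```python
# backend/open_webui/utils/session_pool.py (sketch)
from typing import Mapping

HEADERS_TO_STRIP = frozenset(
    {"content-encoding", "content-length", "transfer-encoding", "connection"}
)

def filter_upstream_headers(headers: Mapping[str, str]) -> dict[str, str]:
    """Return upstream response headers that are safe to forward downstream.

    The pooled aiohttp session decompresses bodies (auto_decompress=True),
    so Content-Encoding no longer describes the body we send, and the
    transport headers are regenerated by Starlette/uvicorn anyway.
    """
    return {k: v for k, v in headers.items() if k.lower() not in HEADERS_TO_STRIP}
```

Call sites would then read `headers=filter_upstream_headers(r.headers)` instead of `headers=dict(r.headers)`.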
Concretely, the broader cleanup would:
- Add the `filter_upstream_headers` helper next to `stream_wrapper`.
- Replace the `dict(r.headers)` call sites with calls to the helper.
- Retire the duplicated frozenset and filter in `terminals.py`.

Happy to keep the surgical fix isolated as its own PR and follow up with the cleanup in a second PR, or roll them together, whichever you prefer.