[GH-ISSUE #23855] issue: Merge Responses using Google/Anthropic API model fails to return response, caused by browser CONTENT_DECODING_FAILED. Proposed fix included. #58758

Open
opened 2026-05-05 23:52:00 -05:00 by GiteaMirror · 0 comments
Owner

Originally created by @jamesc0ry on GitHub (Apr 17, 2026).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/23855

Check Existing Issues

  • I have searched for any existing and/or related issues.
  • I have searched for any existing and/or related discussions.
  • I have also searched in the CLOSED issues AND CLOSED discussions and found no related items (your issue might already be addressed on the development branch!).
  • I am using the latest version of Open WebUI.

Installation Method

Docker

Open WebUI Version

v0.9.0

Ollama Version (if applicable)

0.20.7

Operating System

MacOS 15.4.1

Browser (if applicable)

Chrome 147.0.7727.56

Confirmation

  • I have read and followed all instructions in README.md.
  • I am using the latest version of both Open WebUI and Ollama.
  • I have included the browser console logs.
  • I have included the Docker container logs.
  • I have provided every relevant configuration, setting, and environment variable used in my setup.
  • I have clearly listed every relevant configuration, custom setting, environment variable, and command-line option that influences my setup (such as Docker Compose overrides, .env values, browser settings, authentication configurations, etc).
  • I have documented step-by-step reproduction instructions that are precise, sequential, and leave nothing to interpretation. My steps:
  • Start with the initial platform/version/OS and dependencies used,
  • Specify exact install/launch/configure commands,
  • List URLs visited, user input (incl. example values/emails/passwords if needed),
  • Describe all options and toggles enabled or changed,
  • Include any files or environmental changes,
  • Identify the expected and actual result at each stage,
  • Ensure any reasonably skilled user can follow and hit the same issue.

Expected Behavior

In a conversation containing two parallel models, with at least one being an Anthropic or Google external model, clicking the Merge Responses button with the Google/Anthropic model selected should use that model to generate a merged response.

Actual Behavior

Clicking the Merge Responses button produces no response output in the UI. Browser console shows ERR_CONTENT_DECODING_FAILED 200 on the /api/v1/tasks/moa/completions request.

Steps to Reproduce

The description below was AI-generated; I personally inspected it for accuracy:

Prerequisites:

  • Docker installed
  • An API key for Anthropic or Google Gemini (or both)
  • An API key for OpenAI (to demonstrate the working case for comparison)

Steps to reproduce:

  1. Install Open WebUI using the dev Docker image:
     docker run -d -p 3000:8080 -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:dev

  2. Open http://localhost:3000 in Chrome. Create an admin account on first launch.

  3. Go to Admin Panel → Settings → Connections. Add an OpenAI API connection with your API key. Add an Anthropic and/or Google Gemini API connection with your API key. Click Save.

  4. Start a new chat. Click the + icon in the model selector to add a second model. Select at least one model from Google or Anthropic, and ensure you have at least two models total so parallel responses are generated.

  5. Open Chrome DevTools (F12) → Network tab.

  6. Send any prompt (e.g., "hello") and wait for both models to finish responding.

  7. Click the Merge Responses button.

Expected behavior: A merged response is generated and displayed.

Actual behavior: No response appears. The Network tab shows a request to /api/v1/tasks/moa/completions returning 200 OK but Chrome reports ERR_CONTENT_DECODING_FAILED.

[Image: screenshot of the failing request in the Network tab]

  8. To confirm the issue is provider-specific: repeat steps 5-7 using two OpenAI models (e.g., two instances of gpt-4o). The merge succeeds. Alternatively, just select an OpenAI model response before clicking the Merge Responses button.

[Image: screenshot of the successful merge with OpenAI models]

Logs & Screenshots

Browser

Relevant log:
POST http://localhost:3000/api/v1/tasks/moa/completions net::ERR_CONTENT_DECODING_FAILED 200 (OK)

Relevant response headers:
HTTP/1.1 200 OK
date: Fri, 17 Apr 2026 21:00:24 GMT
server: uvicorn
date: Fri, 17 Apr 2026 21:00:26 GMT
content-type: text/event-stream
transfer-encoding: chunked
content-encoding: gzip
server: cloudflare
x-process-time: 1
access-control-allow-origin: http://localhost:3000
access-control-allow-credentials: true

Docker

2026-04-17 21:19:50.563 | INFO | uvicorn.protocols.http.httptools_impl:send:483 - 172.66.0.243:31213 - "POST /api/v1/tasks/moa/completions HTTP/1.1" 200

Additional Information

I investigated the code and I'd like to propose a fix. In summary, I believe this is caused by header handling in openai.py, which implements part of /api/v1/tasks/moa/completions. aiohttp automatically decodes the response body but leaves the headers intact. The current logic does not strip those headers, so the browser receives a Content-Encoding header that no longer matches the already-decoded body and fails to parse the response. Anthropic/Google APIs return responses with a content-encoding header, which is why only those providers are affected. My changes mirror logic I observed in terminals.py that already strips this header.
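To make the failure mode concrete, here is a minimal, self-contained sketch (plain stdlib Python, not Open WebUI code) of what the browser effectively does: it receives an already-decoded body alongside a stale Content-Encoding: gzip header, tries to gunzip plaintext, and fails:

```python
import gzip

# Upstream (Anthropic/Google) sends a gzip-compressed SSE body.
upstream_body = gzip.compress(b'data: {"choices": []}\n\n')

# aiohttp (auto_decompress=True, the default) hands the proxy the
# DECODED body...
decoded_body = gzip.decompress(upstream_body)

# ...but the proxy forwards the ORIGINAL headers, including
# content-encoding: gzip. The browser trusts that header, attempts to
# gunzip plaintext, and fails -- Chrome surfaces this as
# ERR_CONTENT_DECODING_FAILED even though the status was 200.
try:
    gzip.decompress(decoded_body)
    browser_decodes_ok = True
except gzip.BadGzipFile:
    browser_decodes_ok = False

print(browser_decodes_ok)  # False
```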

I can verify that the following changes fixed the behavior on my end. The proposal below is AI-generated and reviewed by me:


Proposed fix

Surgical fix (validated locally)

Two hunks in backend/open_webui/routers/openai.py.

Hunk 1 — add the constant near the top of the module (around line 69):

@@ -67,6 +67,8 @@ from open_webui.utils.anthropic import is_anthropic_url, get_anthropic_models
 
 log = logging.getLogger(__name__)
 
+STRIPPED_RESPONSE_HEADERS = frozenset(('transfer-encoding', 'connection', 'content-encoding', 'content-length'))
+
 
 ##########################################

Hunk 2 — filter the forwarded headers on the SSE return inside generate_chat_completion (around line 1222):

@@ -1219,7 +1221,11 @@ async def generate_chat_completion(
             return StreamingResponse(
                 stream_wrapper(r, content_handler=stream_chunks_handler),
                 status_code=r.status,
-                headers=dict(r.headers),
+                headers={
+                    key: value
+                    for key, value in r.headers.items()
+                    if key.lower() not in STRIPPED_RESPONSE_HEADERS
+                },
             )
         else:
             try:

Why this works. The shared aiohttp session uses the default auto_decompress=True, so r.content yields already-decoded plaintext. Forwarding the upstream Content-Encoding: gzip along with a decoded body is what makes the browser fail. Stripping that header (plus the three hop-by-hop / transport headers that Starlette/uvicorn will regenerate for the outgoing hop anyway) aligns the forwarded headers with the actual body the browser will see.
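The Hunk 2 comprehension can be exercised standalone. Below is a sketch applying it to header names taken from the captured response above (values abbreviated); matching is done on key.lower() because HTTP header names are case-insensitive and aiohttp may preserve upstream casing:

```python
# Same constant and comprehension as the diff above, run in isolation.
STRIPPED_RESPONSE_HEADERS = frozenset(
    ("transfer-encoding", "connection", "content-encoding", "content-length")
)

# Header names from the captured /api/v1/tasks/moa/completions response.
upstream_headers = {
    "Content-Type": "text/event-stream",
    "Transfer-Encoding": "chunked",
    "Content-Encoding": "gzip",
    "Access-Control-Allow-Origin": "http://localhost:3000",
}

forwarded = {
    key: value
    for key, value in upstream_headers.items()
    if key.lower() not in STRIPPED_RESPONSE_HEADERS
}

print(forwarded)
# {'Content-Type': 'text/event-stream',
#  'Access-Control-Allow-Origin': 'http://localhost:3000'}
```

Only the content-describing headers survive; the transport/encoding headers that no longer describe the outgoing hop are dropped.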

Verified end-to-end against Gemini and Anthropic OpenAI-compat connections — ERR_CONTENT_DECODING_FAILED is gone, streaming renders normally, and OpenAI-backed paths are unchanged (they never emitted the offending header to begin with).

Precedent in the codebase

This pattern is already present at backend/open_webui/routers/terminals.py — the terminal-server reverse proxy defines the same four-element frozenset and applies the same case-insensitive filter before building its StreamingResponse. The diff above is a deliberate copy of that pattern, so the surgical fix doesn't introduce a new idea, it just propagates an existing one to where it's also needed.

Broader cleanup proposal (concept only)

Once the surgical fix lands, the same headers=dict(r.headers) shape exists at three more spots in openai.py (the embeddings passthrough, the Responses API proxy, and the generic /{path:path} passthrough gated by ENABLE_OPENAI_API_PASSTHROUGH) and once in ollama.py's send_request. All share the identical latent defect — they're just less frequently exercised than MOA / chat completions.

Rather than repeat the constant and the filter comprehension at each site (which would then be a third, fourth, fifth copy of what terminals.py already has), I'd propose extracting a small shared helper — a filter_upstream_headers(headers) utility — and colocating it with stream_wrapper in backend/open_webui/utils/session_pool.py.

The rationale for that location: session_pool.py already owns both ends of the "upstream aiohttp → downstream StreamingResponse" boundary. It creates the session that performs the decompression, and it owns stream_wrapper which yields the decoded body. Header filtering is the missing third leg of the same invariant — decoded body requires decoded headers — and having the helper sit next to its siblings means future proxy sites naturally reach for it.

Concretely, the broader cleanup would:

  • Add the helper (and the shared constant) once, next to stream_wrapper.
  • Replace the four remaining dict(r.headers) call sites with calls to the helper.
  • Consider incorporating this change into terminals.py.
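A sketch of what that shared helper could look like (the name filter_upstream_headers comes from this proposal, not from existing code, and the exact signature is an assumption):

```python
from typing import Mapping

# Headers describing the transport/encoding of the UPSTREAM hop. They no
# longer match the auto-decompressed body aiohttp yields, and the ASGI
# server regenerates the transport headers for the downstream hop anyway.
STRIPPED_RESPONSE_HEADERS = frozenset(
    ("transfer-encoding", "connection", "content-encoding", "content-length")
)


def filter_upstream_headers(headers: Mapping[str, str]) -> dict:
    """Drop hop-by-hop/encoding headers before forwarding upstream
    headers on a StreamingResponse built from a decoded aiohttp body."""
    return {
        key: value
        for key, value in headers.items()
        if key.lower() not in STRIPPED_RESPONSE_HEADERS
    }
```

Each call site would then shrink to headers=filter_upstream_headers(r.headers), keeping the invariant in one place.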

Happy to keep the surgical fix isolated as its own PR and follow up with the cleanup in a second PR, or roll them together — whichever you prefer.
