issue: 504 Gateway Timeout Errors with Long-Running LLM Requests #6154

Closed
opened 2025-11-11 16:46:15 -06:00 by GiteaMirror · 1 comment
Owner

Originally created by @dieu-bis on GitHub (Aug 21, 2025).

Check Existing Issues

  • I have searched the existing issues and discussions.
  • I am using the latest version of Open WebUI.

Installation Method

Other

Open WebUI Version

v0.6.22

Ollama Version (if applicable)

N/A (using LiteLLM proxy)

Operating System

Kubernetes on GKE (Google Kubernetes Engine) - Ubuntu nodes

Browser (if applicable)

Chrome 139.0.7258.128

Confirmation

  • I have read and followed all instructions in README.md.
  • I am using the latest version of both Open WebUI and Ollama.
  • I have included the browser console logs.
  • I have included the Docker container logs.
  • I have provided every relevant configuration, setting, and environment variable used in my setup.
  • I have clearly listed every relevant configuration, custom setting, environment variable, and command-line option that influences my setup (such as Docker Compose overrides, .env values, browser settings, authentication configurations, etc).
  • I have documented step-by-step reproduction instructions that are precise, sequential, and leave nothing to interpretation. My steps:
  • Start with the initial platform/version/OS and dependencies used,
  • Specify exact install/launch/configure commands,
  • List URLs visited, user input (incl. example values/emails/passwords if needed),
  • Describe all options and toggles enabled or changed,
  • Include any files or environmental changes,
  • Identify the expected and actual result at each stage,
  • Ensure any reasonably skilled user can follow and hit the same issue.

Expected Behavior

When sending a chat request to any LLM model through OpenWebUI, the interface should wait for the complete response without timing out, showing appropriate loading indicators while the request is being processed.

Actual Behavior

Users receive a "504 Gateway Timeout" error in the browser console after approximately 60 seconds when using certain LLM models. However, the request continues processing on the backend, and the chat response eventually appears in the UI after a few additional seconds. This creates a confusing user experience where an error is shown but the operation actually succeeds.

Steps to Reproduce

  1. Environment Setup:
    - OpenWebUI v0.6.22 deployed on Kubernetes (GKE)
    - LiteLLM proxy configured as backend at http://litellm.corporate-ai.svc.cluster.local:4000
    - Kong ingress controller with extended timeouts (36000000ms configured)
    - Redis Sentinel cluster for caching
  2. Reproduction Steps:
    - Open Chrome/Firefox/Safari browser
    - Navigate to https://[your-openwebui-domain]/
    - Log in with valid credentials
    - Select any model from the model dropdown
    - Type a complex query that requires extended processing time (e.g., "Analyze the following 10-page document and provide detailed insights...")
    - Click send or press Enter
    - Wait approximately 60 seconds
    - Observe browser console showing: Failed to load resource: the server responded with a status of 504 ()
    - Wait an additional 3-5 seconds
    - Observe that the chat response appears successfully despite the error
  3. Configuration Details:

# Kong Ingress configuration
proxy:
  connect_timeout: 60000
  read_timeout: 36000000
  write_timeout: 36000000

# OpenWebUI deployment
image: ghcr.io/open-webui/open-webui:0.6.22
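For scale: Kong proxy timeouts are specified in milliseconds, so the values above convert as follows (a quick arithmetic check, nothing Kong-specific beyond the unit):

```python
# Kong proxy timeouts are given in milliseconds.
connect_timeout_ms = 60_000       # connect_timeout from the config above
read_timeout_ms = 36_000_000      # read_timeout (write_timeout is the same)

print(connect_timeout_ms / 1_000)     # seconds -> 60.0
print(read_timeout_ms / 3_600_000)    # hours   -> 10.0
```

That is a 60-second connect timeout and 10-hour read/write timeouts, consistent with the analysis below that this Kong configuration's read path is not the 60-second culprit.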

Logs & Screenshots

Browser Console Error:
Failed to load resource: the server responded with a status of 504 ()
GET https://[domain]/api/chat/[id] 504 (Gateway Timeout)

LiteLLM Backend Logs (showing successful completion):
INFO: 10.10.22.10:44874 - "POST /chat/completions HTTP/1.1" 200 OK
INFO: 10.10.22.10:51992 - "POST /chat/completions HTTP/1.1" 200 OK

OpenWebUI Logs:
2025-08-21 09:52:01.504 | INFO | httpx._client:_send_single_request:1025 - HTTP Request: POST http://litellm:4000/chat/completions "HTTP/1.1 200 OK"

Additional Information

Analysis:

  • The 504 is returned by some intermediary in the request path: 504 is a gateway status code, so it cannot originate in the browser itself
  • Backend services (OpenWebUI, LiteLLM, Kong) all have long timeouts explicitly configured
  • The request completes successfully on the backend, but some hop in the chain stops waiting after ~60 seconds and returns the 504
  • This primarily affects models with longer response times

Current Workaround:
Users can ignore the 504 error and wait a few seconds for the response to appear, or refresh the page to see the completed response.

Suggested Fix:
Consider implementing:

  1. Streaming responses to keep the connection alive during long-running requests
  2. WebSocket connections for real-time updates
  3. Configurable client-side timeout values
  4. Progress indicators for long-running requests
  5. Async request pattern with polling for models known to have longer response times
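Suggestion 1 above amounts to keeping bytes flowing so no intermediary ever sees an idle read. As a minimal sketch of the client side, here is a decoder for OpenAI-style server-sent-event chat chunks, the stream format an OpenAI-compatible backend such as LiteLLM emits when called with `stream: true` (the helper name and sample payloads are illustrative, not Open WebUI internals):

```python
import json

def iter_content_deltas(sse_lines):
    """Yield content fragments from OpenAI-style SSE event lines.

    Each event arrives as a line of the form 'data: {...}', and the
    stream is terminated by the sentinel 'data: [DONE]'.
    """
    for line in sse_lines:
        if not line.startswith("data:"):
            continue  # skip blank separators and keep-alive comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        delta = chunk["choices"][0].get("delta", {})
        if delta.get("content"):
            yield delta["content"]

if __name__ == "__main__":
    sample = [
        'data: {"choices": [{"delta": {"content": "Hel"}}]}',
        'data: {"choices": [{"delta": {"content": "lo"}}]}',
        "data: [DONE]",
    ]
    print("".join(iter_content_deltas(sample)))  # Hello
```

Because each chunk reaches the proxy within seconds of the previous one, the gateway's read timeout is never approached regardless of total generation time.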

Environment Variables Currently Set:

  • ENABLE_WEBSOCKET_SUPPORT: True
  • ENABLE_OPENAI_API: True
  • DATABASE_URL: [PostgreSQL connection]
  • GLOBAL_LOG_LEVEL: debug

This issue significantly degrades the user experience: it surfaces errors for operations that actually succeed, causing confusion and leading users to retry requests unnecessarily.

GiteaMirror added the bug label 2025-11-11 16:46:15 -06:00
Author
Owner

@tjbck commented on GitHub (Aug 21, 2025):

Reverse proxy config issue, with that being said this should be addressed in dev.


Reference: github-starred/open-webui#6154