Mirror of https://github.com/open-webui/open-webui.git (synced 2026-03-10 15:54:15 -05:00)
“/api/chat/completions” Times Out (504) When Model Thinking Exceeds 30 Seconds #3962
Originally created by @zsy5172 on GitHub (Feb 19, 2025).
Bug Report
Installation Method
I installed the latest image via Docker. (Example command: docker pull …)
Environment
Confirmation:
Expected Behavior
When using a model with a longer thinking time (e.g., the “o1” series), the /api/chat/completions endpoint should successfully return the final response and allow continued conversation afterward.
Actual Behavior
For some models where the thinking time exceeds 30 seconds, the request to /api/chat/completions times out due to the CDN’s 30-second limit, resulting in a 504 Gateway Timeout. Even though /api/chat/completed eventually returns the full response, the 504 error prevents continued chatting on the frontend.
Description
Bug Summary:
When a model thinks for more than 30 seconds, the /api/chat/completions request times out and triggers a 504 error.
Reproduction Details
Steps to Reproduce:
Logs and Screenshots
Browser Console Logs:
No special logs except the 504 timeout error.
Docker Container Logs:
Backend logs show the ongoing inference and final output, but the request times out on the frontend.
Screenshots/Screen Recordings (if applicable):
Additional Information
A recommended solution is to avoid having /api/chat/completions block for the full inference time. Instead, implement a short-polling mechanism in which each request returns within 20-30 seconds, so the CDN's non-adjustable timeout is never exceeded. This way, the client can still receive the response once the model finishes, avoiding 504 errors that disrupt subsequent conversation.
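The short-polling pattern described above can be sketched as follows. This is a minimal, self-contained illustration with hypothetical function names (`submit_completion`, `poll`), not the actual Open WebUI API: the client submits a job and then repeatedly issues cheap status checks, each of which completes well inside any proxy or CDN timeout.

```python
import threading
import time
import uuid

# job_id -> {"done": bool, "result": str | None}
_jobs = {}

def submit_completion(prompt: str) -> str:
    """Start inference in the background and return a job id immediately."""
    job_id = uuid.uuid4().hex
    _jobs[job_id] = {"done": False, "result": None}

    def run():
        time.sleep(0.1)  # stands in for a model that thinks for > 30 s
        _jobs[job_id] = {"done": True, "result": f"answer to: {prompt}"}

    threading.Thread(target=run, daemon=True).start()
    return job_id

def poll(job_id: str) -> dict:
    """Cheap status check; each call returns instantly, never near a timeout."""
    return _jobs[job_id]

# Client loop: many short requests instead of one long-blocking one.
job = submit_completion("why is the sky blue?")
while not poll(job)["done"]:
    time.sleep(0.05)
print(poll(job)["result"])
```

In a real deployment the two functions would be HTTP endpoints, and the polling interval would be a few seconds rather than milliseconds; the key property is that no single request outlives the 30-second CDN limit.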
Note
I will update this issue with any additional details (e.g., specific logs, further config info) if needed. Thank you for the hard work, and I hope this suggestion helps those with extended inference requirements avoid timeouts that prevent further interactions.
@tjbck commented on GitHub (Feb 19, 2025):
This has to do with your reverse proxy configuration.
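For a self-hosted reverse proxy, the usual fix is to raise the proxy read timeout. A minimal nginx sketch (the backend address and the 300s value are illustrative, not project defaults):

```nginx
location / {
    proxy_pass http://127.0.0.1:8080;        # assumed Open WebUI backend address
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;  # keep WebSocket upgrades working
    proxy_set_header Connection "upgrade";
    proxy_read_timeout 300s;                 # allow long model "thinking" phases
    proxy_send_timeout 300s;
}
```

Note this only helps when you control the proxy; a CDN with a fixed timeout (as in the original report) cannot be tuned this way.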
@eric2788 commented on GitHub (Feb 22, 2025):
The Cloudflare proxy also causes this issue, but without it I cannot protect my website and may expose my origin IP, so I am kind of struggling 😢
@procoprobocop commented on GitHub (Apr 10, 2025):
I had a similar problem with HAProxy. It is solved very simply by increasing the "Connection timeout" and "Server timeout" values.
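For reference, the equivalent raw haproxy.cfg settings look roughly like this (the 5m value is an example, not a recommendation):

```haproxy
defaults
    timeout connect 10s   # "Connection timeout": time to establish the backend connection
    timeout client  5m    # idle time allowed on the client side
    timeout server  5m    # "Server timeout": covers long inference waits before the backend responds
```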