“/api/chat/completions” Times Out (504) When Model Thinking Exceeds 30 Seconds #3962

Closed
opened 2025-11-11 15:43:21 -06:00 by GiteaMirror · 3 comments
Owner

Originally created by @zsy5172 on GitHub (Feb 19, 2025).

Bug Report

Installation Method

I installed the latest image via Docker. (Example command: docker pull …)

Environment

  • Open WebUI Version: v0.5.14
  • Operating System: Windows 11
  • Browser: Chrome 133

Confirmation:

  • I have read and followed all instructions provided in the README.md.
  • I am on the latest version of both Open WebUI and Ollama.
  • I have included browser console logs.
  • I have included Docker container logs.
  • I have provided the exact steps to reproduce the bug in the “Steps to Reproduce” section below.

Expected Behavior

When using a model with a longer thinking time (e.g., the “o1” series), the /api/chat/completions endpoint should successfully return the final response and allow continued conversation afterward.

Actual Behavior

For some models where the thinking time exceeds 30 seconds, the request to /api/chat/completions times out due to the CDN’s 30-second limit, resulting in a 504 Gateway Timeout. Even though /api/chat/completed eventually returns the full response, the 504 error prevents continued chatting on the frontend.

Description

Bug Summary:
When a model thinks for more than 30 seconds, the /api/chat/completions request times out and triggers a 504 error.

Reproduction Details

Steps to Reproduce:

  1. Deploy Open WebUI (e.g., via Docker).
  2. Launch the frontend and select a model with a long thinking time (like the “o1” series).
  3. Enter a prompt that causes extended reasoning.
  4. After ~30 seconds, the frontend shows "SyntaxError: Unexpected end of JSON input".
  5. The frontend shows that /api/chat/completed still has the full output, but the conversation is stuck because of the 504.

Logs and Screenshots

Browser Console Logs:
No special logs except the 504 timeout error.

Docker Container Logs:
Backend logs show the ongoing inference and final output, but the request times out on the frontend.

Screenshots/Screen Recordings (if applicable):

![Image](https://github.com/user-attachments/assets/3cc5b552-ca39-4932-9e8c-99ca6bb8d0b2)

![Image](https://github.com/user-attachments/assets/a83c8976-8c5f-4c6d-a86f-9894ff2abc89)

Additional Information

A recommended solution is to stop /api/chat/completions from blocking for the entire generation. Instead, implement a short-polling mechanism in which each request completes within 20-30 seconds, so the CDN's non-adjustable timeout is never exceeded. The client can then still receive the response once the model finishes, avoiding 504 errors that disrupt subsequent conversation.
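The short-polling idea can be sketched generically. This is only an illustration, not Open WebUI code: `fetch_status` stands in for a hypothetical status endpoint that does not exist in the current API.

```python
import time

def poll_until_done(fetch_status, interval=2.0, timeout=300.0):
    """Call fetch_status() repeatedly until it reports completion.

    Each individual call is expected to return well within the CDN's
    ~30-second limit; only the overall loop waits for the model to finish.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        # One short request per iteration, e.g. a hypothetical GET .../status
        status = fetch_status()
        if status.get("done"):
            return status.get("result")
        time.sleep(interval)
    raise TimeoutError("model did not finish within the polling window")
```

Because every HTTP round trip is short, no single request can hit the proxy's 30-second ceiling, regardless of how long the model thinks.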

Note

I will update this issue with any additional details (e.g., specific logs, further config info) if needed. Thank you for the hard work, and I hope this suggestion helps those with extended inference requirements avoid timeouts that prevent further interactions.


@tjbck commented on GitHub (Feb 19, 2025):

This has to do with your reverse proxy configuration.

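As a concrete illustration of the proxy-side fix, the read timeout can be raised in the reverse proxy. This sketch assumes nginx sitting in front of Open WebUI on port 8080; paths, port, and values are placeholders, not an official configuration.

```nginx
# Hedged example: assumes nginx proxying Open WebUI on 127.0.0.1:8080.
location / {
    proxy_pass http://127.0.0.1:8080;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;   # keep WebSocket upgrades working
    proxy_set_header Connection "upgrade";
    proxy_read_timeout 300s;   # allow long "thinking" phases before first byte
    proxy_send_timeout 300s;
    proxy_buffering off;       # pass streamed tokens through instead of buffering
}
```

Note this only helps for proxies you control; a CDN with a fixed 30-second limit (as described above) cannot be tuned this way.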

@eric2788 commented on GitHub (Feb 22, 2025):

This has to do with your reverse proxy configuration.

Cloudflare Proxy also causes this issue, but without the Cloudflare proxy I can't protect my website or hide my origin IP. I'm kind of struggling 😢


@procoprobocop commented on GitHub (Apr 10, 2025):

I had a similar problem with HAProxy. It was solved very simply by increasing the values of the "Connection timeout" and "Server timeout" parameters.

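In a plain haproxy.cfg, those GUI settings correspond to the `timeout connect` and `timeout server` directives. The values below are illustrative only; pick limits that exceed your model's longest thinking time.

```
# Hedged example: illustrative haproxy.cfg values, not a recommendation.
defaults
    mode http
    timeout connect 10s    # "Connection timeout" in the GUI wording above
    timeout client  300s
    timeout server  300s   # "Server timeout": must outlast the model's thinking
```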
Reference: github-starred/open-webui#3962