504 Gateway Timeout with Concurrent Requests to Ollama #5858

Closed
opened 2025-11-12 13:13:56 -06:00 by GiteaMirror · 2 comments

Originally created by @Mascinissa on GitHub (Feb 11, 2025).

I'm experiencing timeout issues when running multiple concurrent processes that send requests to the Ollama server (v0.5.7).

Environment:

  • Hardware: NVIDIA A100-SXM4-80GB
  • Model: Llama 3.3 70B
  • Ollama Version: 0.5.7
  • Client: OpenAI Python SDK
  • Setup: Multiple concurrent processes sending requests to the same Ollama server

Code:

Here is what the code of each process looks like:

from openai import OpenAI

client = OpenAI(
    base_url='http://localhost:11434/v1',
    api_key='ollama',
    timeout=120.0  # Has no effect if > 60
)

for messages in prep_messages():  # messages generated programmatically
    response = client.chat.completions.create(
        model="llama3.3",
        messages=messages,
        timeout=120.0  # Has no effect if > 60
    )
    do_smth(response)

Issue:

When the number of concurrent processes increases, requests start timing out after around 1 minute. Here are a few log lines showing a timed-out request:

...
[GIN] 2025/02/11 - 06:13:02 | 200 |  45.023583657s | 127.0.0.1 | POST "/v1/chat/completions"
[GIN] 2025/02/11 - 06:13:28 | 500 |  59.794127092s | 127.0.0.1 | POST "/v1/chat/completions"
[GIN] 2025/02/11 - 06:13:58 | 200 |  51.127991889s | 127.0.0.1 | POST "/v1/chat/completions"
...

Failed requests receive the following 504 Gateway Timeout response:

<html>
<head><title>504 Gateway Time-out</title></head>
<body>
<center><h1>504 Gateway Time-out</h1></center>
</body>
</html>

Attempted Solutions:

  • Client-side timeout settings:

    • Setting timeout > 60 seconds (e.g., 120.0) has no effect; requests still time out at 1 minute.
    • Setting timeout < 60 seconds works as expected, with requests timing out earlier.
    • This suggests the 1-minute timeout is enforced server-side and cannot be overridden by client settings (see the sketch after this list).
  • Environment variables:

    • Found references to OLLAMA_RUN_TIMEOUT and OLLAMA_REQUEST_TIMEOUT online. These environment variables appear to be outdated and no longer functional in the current version.
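
One way to double-check where the cutoff lives is to bypass the OpenAI SDK entirely and time a raw HTTP request against the same endpoint. A minimal sketch, assuming Ollama on the default local port (the prompt is just a placeholder chosen to produce a long generation):

import time
import requests

t0 = time.monotonic()
try:
    r = requests.post(
        "http://localhost:11434/v1/chat/completions",
        json={
            "model": "llama3.3",
            "messages": [{"role": "user", "content": "Write a long story."}],
        },
        timeout=300,  # generous client-side limit, well above the observed 60s
    )
    print(r.status_code, f"{time.monotonic() - t0:.1f}s")
except requests.exceptions.Timeout:
    print(f"client-side timeout after {time.monotonic() - t0:.1f}s")

If this still returns a 504 at roughly 60 seconds despite the 300-second client limit, the cutoff is not in the client.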

Expected Behavior:

  • Ability to configure request timeout duration.
  • Server should wait longer for model responses when under concurrent load.

Questions:

  1. Is there a way to increase the server-side timeout for requests?
  2. Are there any environment variables or configuration options available for timeout management?
  3. If not, could this feature be considered for future releases?

Let me know if you need any additional information.

GiteaMirror added the feature request label 2025-11-12 13:13:56 -06:00

@rick-github commented on GitHub (Feb 11, 2025):

The ollama server doesn't have a timeout. You have a proxy which has a 60 second timeout.
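
A quick way to confirm this, assuming the requests reach Ollama through some tunnel or reverse proxy, is to send the same slow request both directly to the local port and through the proxied URL. A sketch in which the hypothetical PROXY_URL stands in for whatever base URL the tunnel exposes:

import requests

payload = {
    "model": "llama3.3",
    "messages": [{"role": "user", "content": "Write a long story."}],
}

# "PROXY_URL" is a placeholder; substitute the tunnel/proxy base URL in use.
for base in ("http://localhost:11434", "PROXY_URL"):
    r = requests.post(f"{base}/v1/chat/completions", json=payload, timeout=300)
    print(base, r.status_code)

A 504 that appears only on the proxied path points at the proxy, not at Ollama.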


@Mascinissa commented on GitHub (Feb 11, 2025):

You're absolutely right! The timeout wasn't from ollama but from zrok. I was tunneling the requests through zrok and it seems that it drops the connections after 1 minute of waiting. I just tried to run things locally and it's not timing out. So I guess I have to look for solutions from the proxy side.
Thanks a lot!
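
If the proxy cannot be reconfigured, one common mitigation for idle-connection timeouts is to stream the completion, so tokens flow continuously instead of the connection sitting silent until the full response is ready. A sketch with the same SDK; whether this helps with zrok's specific timeout behavior is an assumption:

from openai import OpenAI

client = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')

stream = client.chat.completions.create(
    model="llama3.3",
    messages=[{"role": "user", "content": "Write a long story."}],
    stream=True,  # chunks arrive as tokens are generated, so the connection never idles
)
# Reassemble the streamed deltas into the full response text.
text = "".join(chunk.choices[0].delta.content or "" for chunk in stream)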
