[GH-ISSUE #4810] "Server disconnected without sending a response" after ~60 seconds. #49547

Closed
opened 2026-04-28 12:14:18 -05:00 by GiteaMirror · 11 comments

Originally created by @michaelgloeckner on GitHub (Jun 4, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/4810

What is the issue?

I run the mixtral model via api/generate. If I send a bigger prompt, it returns "Server disconnected without sending a response."

I checked the ollama logs and see:

```
[GIN] 2024/06/04 - 09:36:43 | 500 | 59.693208463s | 10.0.101.220 | POST "/api/generate"
```

Is there a way to increase this kind of internal timeout?

I already tried setting a timeout, but it has no impact on this:

```python
from ollama import Client

client = Client(host='OllamaServer', timeout=120)
```
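For what it's worth: as far as I can tell, the ollama Python client forwards extra keyword arguments to the underlying httpx client, so a finer-grained timeout can be expressed. A minimal sketch, reusing the placeholder host from above; note that a client-side timeout can only lower the client's patience, it cannot raise a limit enforced by the server or by a proxy in between:

```python
# Sketch only: ollama's Client forwards extra keyword arguments to httpx,
# so per-phase timeouts can be set. This governs the client side only; it
# cannot extend a cutoff imposed by the server or an intermediary proxy.
import httpx
from ollama import Client

client = Client(
    host='OllamaServer',  # placeholder host, as in the snippet above
    timeout=httpx.Timeout(connect=5.0, read=600.0, write=60.0, pool=60.0),
)
```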

OS

Linux, Containerd, eks

GPU

Nvidia

CPU

AMD

Ollama version

0.1.40

GiteaMirror added the bug label 2026-04-28 12:14:18 -05:00

@MrSuddenJoy commented on GitHub (Jun 4, 2024):

Don't containerize. Install on a bare system.


@michaelgloeckner commented on GitHub (Jun 4, 2024):

> Don't containerize. Install on a bare system.

How is Docker causing this problem? I am running ollama on EKS with a g5.4xlarge instance and GPU support.


@MrSuddenJoy commented on GitHub (Jun 4, 2024):

@michaelgloeckner in your initial post you stated that you run `Docker` on `Linux`. Now you say you run on `eks`, which is `Kubernetes`... make up your mind, please.


@michaelgloeckner commented on GitHub (Jun 4, 2024):

I use an EKS cluster that has an Amazon Linux node, and the ollama image runs inside it. In the end it is an ollama container running on Linux (Amazon Linux), right?
BTW, Kubernetes is not pickable from the OS selection menu...


@MrSuddenJoy commented on GitHub (Jun 4, 2024):

@michaelgloeckner Yes, it's an `ollama` container inside `Linux`.
Kubernetes may (and should) not be pickable, because it's not an OS, it's software, so you can install it after you have chosen an OS.


@michaelgloeckner commented on GitHub (Jun 5, 2024):

I enabled debugging and got the attached logs: one with an HTTP 200 result, and one with an HTTP 500 "Server disconnected without sending a response" result after 60 seconds.
[ollama_logs.txt](https://github.com/user-attachments/files/15574903/ollama_logs.txt)

Is there a way to adjust these 60 seconds?

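One way to narrow down where the ~60-second cutoff comes from is to time a non-streaming request against the REST API directly, bypassing the Python client. A minimal probe; the host URL and prompt below are placeholders:

```python
# Minimal diagnostic probe: time a non-streaming /api/generate call. If it
# consistently fails at ~60 s regardless of the client timeout, the limit
# is likely imposed by something in between (proxy, load balancer, ingress)
# rather than by the client. Host and prompt are placeholders.
import time
import httpx

url = 'http://OllamaServer:11434/api/generate'
payload = {'model': 'mixtral', 'prompt': 'a long prompt here', 'stream': False}

start = time.monotonic()
try:
    r = httpx.post(url, json=payload, timeout=300.0)
    print(f'{r.status_code} after {time.monotonic() - start:.1f}s')
except httpx.HTTPError as exc:
    print(f'failed after {time.monotonic() - start:.1f}s: {exc!r}')
```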

@michaelgloeckner commented on GitHub (Jun 5, 2024):

As a temporary workaround, I am using the stream option now.

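A plausible reason the stream option helps: with streaming enabled, /api/generate emits a newline-delimited JSON chunk per token, so the connection never sits idle long enough for an intermediary's idle timeout to fire. A raw-HTTP sketch of the same workaround, again with placeholder host and prompt:

```python
# Streaming variant against the raw REST API: each line is a JSON object
# carrying a partial 'response'; the final object has 'done': true.
import json
import httpx

url = 'http://OllamaServer:11434/api/generate'
payload = {'model': 'mixtral', 'prompt': 'a long prompt here', 'stream': True}

text = ''
with httpx.stream('POST', url, json=payload, timeout=None) as r:
    for line in r.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)
        text += chunk.get('response', '')
        if chunk.get('done'):
            break
print(text)
```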

@amida47 commented on GitHub (Jun 19, 2024):

@michaelgloeckner I got the same problem while running ollama in a Codespace; no API request goes above 60 s. How did you fix this?
I used `curl -fsSL https://ollama.com/install.sh | sh` for the installation.


@amida47 commented on GitHub (Jun 19, 2024):

> Don't containerize. Install on a bare system.

@MrSuddenJoy why is that?


@michaelgloeckner commented on GitHub (Jun 19, 2024):

@amida47 I am using the **stream=True** option, as in the code below. This avoids the timeout in my case:

```python
from typing import Sequence

from ollama import Client, Options

client = Client(host='OllamaServer', timeout=120)

def generateAsStream(prompt, system=None, temperature=0.01):
    response = ""

    # Stop sequences for mixtral's instruction format.
    stop: Sequence[str] = ["[INST]", "[/INST]"]
    options = Options(temperature=temperature, stop=stop, num_ctx=4096)

    # stream=True yields the answer chunk by chunk, so the connection
    # never sits idle long enough to hit the 60-second cutoff.
    stream_response = client.generate('mixtral', prompt=prompt, options=options, system=system, stream=True)

    for chunk in stream_response:
        response += chunk['response']

    return response
```
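For completeness, a hypothetical call to the helper above, assuming the `client` configured earlier in the thread; the prompt and system message are made up:

```python
# Made-up prompt and system message, purely to illustrate the call shape.
answer = generateAsStream(
    'Explain why streaming avoids the 60-second disconnect.',
    system='Answer briefly.',
    temperature=0.01,
)
print(answer)
```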

@ChenYiJing025 commented on GitHub (May 11, 2025):

I’ve tried the method above, but the issue persists. Any other suggestions or solutions would be greatly appreciated.
