[GH-ISSUE #4350] Configurable model loading timeout #2713

Closed
opened 2026-04-12 13:01:45 -05:00 by GiteaMirror · 10 comments

Originally created by @ProjectMoon on GitHub (May 11, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/4350

The model loading timeout, the time to wait for the llama runner, is hard coded. It would be nice to be able to configure this to increase or decrease it (for me, mostly increase). This would allow experimenting with big models that take forever to load, but might run fine once loaded.

GiteaMirror added the feature request label 2026-04-12 13:01:45 -05:00

@lengrongfu commented on GitHub (May 13, 2024):

I can try to add this feature.


@lengrongfu commented on GitHub (May 13, 2024):

/assign


@lengrongfu commented on GitHub (May 13, 2024):

@ProjectMoon Is your problem that when ollama runs a model that does not exist locally, pulling it from the registry times out?


@ProjectMoon commented on GitHub (May 13, 2024):

No, it's the loading of the model from disk. There's a hard coded timeout of 10 minutes. I forget the exact file where this is, but the comment above the line says something like "be generous, as long models can take a long time to load."


@ProjectMoon commented on GitHub (May 13, 2024):

An example of why this would be useful (to me): I can load Mixtral 8x7b using the Q2_K quant (smallest available file). It loads in 565 seconds, just under the 10 minute timeout limit. But once it's loaded, it generates at 16 tokens/second. It would be lovely if I could try the higher quants and see what happens.


@ProjectMoon commented on GitHub (Jun 14, 2024):

I am noticing some flakiness on slow model loads. I have one model that gets to 80% and then gives up. Should I reopen this issue or make a new issue to change the stall time?


@dhiltgen commented on GitHub (Jun 14, 2024):

@ProjectMoon can you try disabling mmap to see if that significantly changes the load times for your setup?

```
curl http://localhost:11434/api/generate -d '{
  "model": "llama3:70b",
  "prompt": "Why is the sky blue?",
  "stream": false,
  "options": {"use_mmap": false}
}'
```

@ProjectMoon commented on GitHub (Jun 14, 2024):

@dhiltgen Disabling mmap makes a world of difference. All the models actually load now, properly split across CPU and GPU. They also load a billion times faster. It's like magic. But mmap doesn't seem to be globally toggleable, and OpenWebUI seems to have only "on" or "default" as options for mmap, instead of also having an "off" value (this isn't part of the Ollama project, but it is odd).

I will now be making some new modelfiles with mmap disabled.

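Baking the option into a Modelfile might look like the sketch below. This is a hedged example: whether `use_mmap` is accepted as a Modelfile `PARAMETER` may depend on your ollama version, and `mixtral:8x7b` stands in for whatever base model you use; verify against your installation.

```shell
# Hypothetical sketch: write a Modelfile that disables mmap for a
# derived model. PARAMETER support for use_mmap may vary by version.
cat > Modelfile <<'EOF'
FROM mixtral:8x7b
PARAMETER use_mmap false
EOF

# The variant would then be built with:
#   ollama create mixtral-nommap -f Modelfile
```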

@ProjectMoon commented on GitHub (Jun 14, 2024):

Have noticed, though, that some models have issues loading with mmap disabled if I increase the context size. They crash for various reasons, and not always out of memory issues.


@dhiltgen commented on GitHub (Jun 14, 2024):

We'll track this via #3940


Reference: github-starred/ollama#2713