[GH-ISSUE #3090] How can I modify the model's existence duration on the GPU? #1898

Closed
opened 2026-04-12 12:00:36 -05:00 by GiteaMirror · 1 comment

Originally created by @papandadj on GitHub (Mar 13, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/3090

Recently, I used Ollama to build my application. When I run a model, it automatically loads onto my GPU. However, after a few minutes, the model seems to be unloaded. How can I force the model to always remain loaded on the GPU?


@jmorganca commented on GitHub (Mar 13, 2024):

Hi @papandadj, with the API you can set `"keep_alive": -1` and it will keep the model loaded indefinitely. Let me know if this helps!
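For anyone landing here later, this is a minimal sketch of what that request could look like from Python, using only the standard library. It assumes Ollama is listening on its default local address (`http://localhost:11434`) and uses the `/api/generate` endpoint. The model name `llama2` and the helper name `build_keep_alive_request` are placeholders for illustration, not part of Ollama. As far as I understand, a request without a `prompt` just loads the model, but treat that as an assumption and check it against your version.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_keep_alive_request(model, keep_alive=-1, url=OLLAMA_URL):
    """Build a POST request that sets how long `model` stays loaded.

    keep_alive=-1 asks the server to keep the model loaded indefinitely,
    as described in the comment above. This helper is hypothetical and
    exists only for this example.
    """
    payload = {"model": model, "keep_alive": keep_alive}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # "llama2" is a placeholder; use a model you have already pulled.
    request = build_keep_alive_request("llama2")
    with urllib.request.urlopen(request) as response:
        print(response.status)
```

The same `"keep_alive"` field can be added to the JSON body of your normal generate requests, so the setting travels with each call your application makes.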


Reference: github-starred/ollama#1898