[GH-ISSUE #2974] ollama free GPU memory itself #27587

Closed
opened 2026-04-22 05:03:13 -05:00 by GiteaMirror · 2 comments

Originally created by @ly0303521 on GitHub (Mar 7, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/2974

Originally assigned to: @dhiltgen on GitHub.

Hello, nice work. Below is my problem:
If Ollama receives no requests for a few minutes, it frees the GPU memory. When a new request arrives, it has to reload the model before responding, which takes about 5 seconds. How can I make Ollama keep the model in GPU memory?


@dhiltgen commented on GitHub (Mar 7, 2024):

See https://github.com/ollama/ollama/blob/main/docs/faq.md#how-do-i-keep-a-model-loaded-in-memory-or-make-it-unload-immediately for instructions on using keep_alive
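The linked FAQ covers the `keep_alive` parameter on the generate/chat API, which controls how long a model stays resident after a request. As a minimal sketch (the model name and prompt here are placeholders; the default server address `http://localhost:11434` is assumed), the request body might look like:

```python
import json

def build_generate_request(model, prompt, keep_alive="5m"):
    """Build a payload for Ollama's /api/generate endpoint.

    keep_alive controls how long the model stays loaded after the
    request: a duration string like "10m", -1 to keep it loaded
    indefinitely, or 0 to unload immediately.
    """
    return {"model": model, "prompt": prompt, "keep_alive": keep_alive}

# Keep the model resident indefinitely, so follow-up requests skip
# the ~5 s reload described in this issue.
payload = build_generate_request("llama2", "Why is the sky blue?", keep_alive=-1)
print(json.dumps(payload))
```

Per the FAQ, the same thing can be done from the command line, e.g. `curl http://localhost:11434/api/generate -d '{"model": "llama2", "keep_alive": -1}'`, which preloads the model and keeps it in memory.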


@ly0303521 commented on GitHub (Mar 8, 2024):

@dhiltgen My problem is solved, thanks very much.


Reference: github-starred/ollama#27587