[GH-ISSUE #931] How do we stop a model so it releases GPU memory? (not the ollama server). #26217

Closed
opened 2026-04-22 02:16:24 -05:00 by GiteaMirror · 11 comments

Originally created by @riskk21 on GitHub (Oct 27, 2023).
Original GitHub issue: https://github.com/ollama/ollama/issues/931

How do we stop a model so it releases GPU memory? (not the ollama server).


@technovangelist commented on GitHub (Oct 27, 2023):

The memory will be released about 5 minutes after the last time you use it.


@riskk21 commented on GitHub (Oct 27, 2023):

Is there a special command?


@technovangelist commented on GitHub (Oct 27, 2023):

It's automatic at this time. But we are looking into other options.


@erick1337 commented on GitHub (Dec 4, 2023):

@technovangelist can I modify the offloading time somewhere in the code?


@erick1337 commented on GitHub (Dec 4, 2023):

> @technovangelist can I modify the offloading time somewhere in the code?

Figured it out: in `ollama/server/routes.go`:

```go
var defaultSessionDuration = 5 * time.Minute
```
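The idle-unload behavior this constant controls amounts to a reset-on-request timer: every request rearms the timer, and when it expires the model is unloaded. A minimal sketch of that pattern (the `SessionKeepAlive` class is hypothetical, for illustration only, not the actual ollama code):

```python
import threading

class SessionKeepAlive:
    """Illustrative reset-on-request idle timer: each request rearms
    a countdown; when it expires, the unload callback runs."""

    def __init__(self, duration_s, unload):
        self.duration_s = duration_s
        self.unload = unload
        self._timer = None

    def touch(self):
        # Called on every request: cancel the pending unload and rearm it.
        if self._timer is not None:
            self._timer.cancel()
        self._timer = threading.Timer(self.duration_s, self.unload)
        self._timer.daemon = True
        self._timer.start()
```

With a 5-minute duration this reproduces the "released about 5 minutes after last use" behavior described earlier in the thread.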


@technovangelist commented on GitHub (Dec 8, 2023):

That’s great to hear. There is an interesting PR using environment variables that may solve this for some folks.


@sigkill commented on GitHub (Jan 1, 2024):

Yeah, it would be great to have it as an environment variable, especially when using langchain from another host.


@skrew commented on GitHub (Jan 9, 2024):

I can confirm that it's annoying to have to wait for the model to reload (it takes a long time for me) when you're waiting for the answer. :)
A setting would be fine.


@jmorganca commented on GitHub (Feb 20, 2024):

This can be done with:

```
curl http://localhost:11434/api/generate -d '{"model": "llama2", "keep_alive": 0}'
```

to effectively unload a model (assuming `llama2` is loaded).

See this doc for more info: https://github.com/ollama/ollama/blob/main/docs/faq.md#how-do-i-keep-a-model-loaded-in-memory-or-make-it-unload-immediately
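For scripted use, the same call can be made from Python's standard library. A sketch assuming the default endpoint and the payload from the curl command above (the `unload_request` and `unload` helper names are ours, not part of any ollama client library):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default ollama endpoint

def unload_request(model):
    # Build the request body from the curl example:
    # "keep_alive": 0 asks the server to unload the model immediately.
    return json.dumps({"model": model, "keep_alive": 0}).encode()

def unload(model, url=OLLAMA_URL):
    # POST the body; assumes a reachable ollama server at `url`.
    req = urllib.request.Request(
        url,
        data=unload_request(model),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```

Calling `unload("llama2")` then sends the same request as the curl one-liner.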


@leoterry-ulrica commented on GitHub (Mar 13, 2024):

> This can be done with:
>
> ```
> curl http://localhost:11434/api/generate -d '{"model": "llama2", "keep_alive": 0}'
> ```
>
> to effectively unload a model (assuming `llama2` is loaded).
>
> See this doc for more info: https://github.com/ollama/ollama/blob/main/docs/faq.md#how-do-i-keep-a-model-loaded-in-memory-or-make-it-unload-immediately

It is best to use an environment variable setting to change how long the server keeps a model alive.


@sistemasici commented on GitHub (Apr 23, 2024):

Set the environment value in `/etc/systemd/system/ollama.service`:

```
Environment="OLLAMA_KEEP_ALIVE=-1"
```

then reload and restart:

```
systemctl daemon-reload
systemctl restart ollama.service
```
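Per the FAQ linked earlier in the thread, `OLLAMA_KEEP_ALIVE` also accepts other values besides `-1`; a few examples as a systemd unit fragment (comments on their own lines, since systemd does not support inline comments):

```
# Keep models loaded indefinitely:
Environment="OLLAMA_KEEP_ALIVE=-1"
# Unload immediately after each request:
Environment="OLLAMA_KEEP_ALIVE=0"
# Unload after 10 minutes of idle time:
Environment="OLLAMA_KEEP_ALIVE=10m"
```

Use only one `Environment=` line for the variable; the last one set wins.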

Reference: github-starred/ollama#26217