[GH-ISSUE #2121] Feature request: control session duration of loaded models #1212

Closed
opened 2026-04-12 10:59:14 -05:00 by GiteaMirror · 1 comment
Owner

Originally created by @nperez on GitHub (Jan 21, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/2121

I have a use case where multiple processes (stable diffusion, whisper, ollama, etc.) are competing for limited GPU resources and I need to share the GPU. Unfortunately, there doesn't appear to be a way to manage the session lifetime of loaded models in ollama. It would be great to have a model option to control the session lifetime (e.g. unload after each request), or a new endpoint to unconditionally unload whatever model is loaded. Without this feature, I have to manage (kill, then restart) the ollama process, or wait the five minutes of the current `defaultSessionDuration` in routes.go. Before v0.1.18, I probably would have just killed the separate runner process, which left the API server intact, but now that it is integrated, that isn't really an option any more.

GiteaMirror added the feature request label 2026-04-12 10:59:14 -05:00

@pdevine commented on GitHub (Jan 27, 2024):

You will be able to use the new `keep_alive` parameter, which was just merged in #2146. You can set it to `0` and the model will automatically be unloaded after inference is completed.
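As a sketch of how that parameter is used, the request below sets `keep_alive` to `0` on a call to Ollama's `/api/generate` endpoint at its default port; the model name `llama2` is just a stand-in for whatever model you have installed:

```python
import json
import urllib.request

# Build a generate request that asks the server to unload the model
# immediately after inference by setting keep_alive to 0.
payload = {
    "model": "llama2",            # stand-in: use any installed model
    "prompt": "Why is the sky blue?",
    "stream": False,
    "keep_alive": 0,              # unload right after this response
}

req = urllib.request.Request(
    "http://localhost:11434/api/generate",  # default Ollama address
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
# With a running server, send the request like so:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["response"])
```

A positive `keep_alive` value instead keeps the model loaded for that duration after the request completes.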


Reference: github-starred/ollama#1212