[GH-ISSUE #596] feat: Support ollama's keep_alive request parameter #12136

Closed
opened 2026-04-19 18:57:01 -05:00 by GiteaMirror · 9 comments

Originally created by @zehDonut on GitHub (Jan 29, 2024).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/596

Is your feature request related to a problem? Please describe.
Ollama unloads models after 5 minutes by default. A new parameter, keep_alive, allows the user to set a custom value.
It would be nice to have an option in the UI where a value for this parameter can be set.

Here is the relevant PR: ollama/ollama#2146 (https://github.com/ollama/ollama/pull/2146)
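
For reference, keep_alive is passed per-request to Ollama's /api/generate endpoint. A minimal sketch (the model name and duration are placeholders): it accepts a duration string such as "10m" or "24h", a number of seconds, a negative value to keep the model loaded indefinitely, or 0 to unload immediately after the response.

curl http://localhost:11434/api/generate -d '{
  "model": "tinyllama",
  "prompt": "Why is the sky blue?",
  "keep_alive": "10m"
}'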

GiteaMirror added the enhancement and good first issue labels 2026-04-19 18:57:01 -05:00

@justinh-rahb commented on GitHub (Jan 29, 2024):

That would be useful, indeed.


@tjthejuggler commented on GitHub (Feb 2, 2024):

> Is your feature request related to a problem? Please describe. Ollama unloads models after 5 minutes by default. A new parameter, keep_alive, allows the user to set a custom value. It would be nice to have an option in the UI where a value for this parameter can be set.
>
> Here is the relevant PR: ollama/ollama#2146

Do you know how to use keep_alive? Nothing I try seems to affect the time it takes for the model to unload:

curl http://localhost:11434/api/generate -d '{ "model": "tinyllama", "prompt": "Why is the sky blue??", "keep_alive": 0 }'
curl http://localhost:11434/api/generate -d '{ "model": "tinyllama", "prompt": "Why is the sky blue??", "keep_alive": "0s" }'
curl http://localhost:11434/api/generate -d '{ "model": "tinyllama", "prompt": "Why is the sky blue??", "keep_alive": "0m" }'
curl http://localhost:11434/api/generate -d '{ "model": "tinyllama", "prompt": "Why is the sky blue??", "keep_alive": "1s" }'
curl http://localhost:11434/api/generate -d '{ "model": "tinyllama", "prompt": "Why is the sky blue??", "keep_alive": "1m" }'
curl http://localhost:11434/api/generate -d '{ "model": "tinyllama", "prompt": "Why is the sky blue??", "keep_alive": "0" }'

These all behave the same for me: the model continues to use memory indefinitely. Any idea why?


@zehDonut commented on GitHub (Feb 2, 2024):

> Do you know how to use keep_alive? Nothing I try seems to affect the time it takes for the model to unload:
>
> curl http://localhost:11434/api/generate -d '{ "model": "tinyllama", "prompt": "Why is the sky blue??", "keep_alive": 0 }'
> curl http://localhost:11434/api/generate -d '{ "model": "tinyllama", "prompt": "Why is the sky blue??", "keep_alive": "0s" }'
> curl http://localhost:11434/api/generate -d '{ "model": "tinyllama", "prompt": "Why is the sky blue??", "keep_alive": "0m" }'
> curl http://localhost:11434/api/generate -d '{ "model": "tinyllama", "prompt": "Why is the sky blue??", "keep_alive": "1s" }'
> curl http://localhost:11434/api/generate -d '{ "model": "tinyllama", "prompt": "Why is the sky blue??", "keep_alive": "1m" }'
> curl http://localhost:11434/api/generate -d '{ "model": "tinyllama", "prompt": "Why is the sky blue??", "keep_alive": "0" }'
>
> These all behave the same for me: the model continues to use memory indefinitely. Any idea why?

Are you running the pre-release version, v0.1.23 (https://github.com/ollama/ollama/releases/tag/v0.1.23)?
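
A quick way to confirm which server version is actually running (assuming a default local install on port 11434; the HTTP version endpoint may not exist on older builds):

ollama --version
# newer builds also expose the version over HTTP:
curl http://localhost:11434/api/version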


@tjthejuggler commented on GitHub (Feb 2, 2024):

> > Do you know how to use keep_alive? Nothing I try seems to affect the time it takes for the model to unload:
> >
> > curl http://localhost:11434/api/generate -d '{ "model": "tinyllama", "prompt": "Why is the sky blue??", "keep_alive": 0 }'
> > curl http://localhost:11434/api/generate -d '{ "model": "tinyllama", "prompt": "Why is the sky blue??", "keep_alive": "0s" }'
> > curl http://localhost:11434/api/generate -d '{ "model": "tinyllama", "prompt": "Why is the sky blue??", "keep_alive": "0m" }'
> > curl http://localhost:11434/api/generate -d '{ "model": "tinyllama", "prompt": "Why is the sky blue??", "keep_alive": "1s" }'
> > curl http://localhost:11434/api/generate -d '{ "model": "tinyllama", "prompt": "Why is the sky blue??", "keep_alive": "1m" }'
> > curl http://localhost:11434/api/generate -d '{ "model": "tinyllama", "prompt": "Why is the sky blue??", "keep_alive": "0" }'
> >
> > These all behave the same for me: the model continues to use memory indefinitely. Any idea why?
>
> Are you running the pre-release version, v0.1.23 (https://github.com/ollama/ollama/releases/tag/v0.1.23)?

Thanks so much! I was on 0.1.22, but I got it now. It definitely releases the model based on the keep_alive value I set, but it doesn't release it completely: if I use a model that takes about 1500 MB, memory only drops to around 500 MB after the timeout. Why doesn't it go down to 0? Is this a bug, or is it not expected to? Whenever I use Ollama on my computer, I have to either force-kill it or restart to get it completely out of memory.

Also, even with keep_alive = 0, if I loop through a few different models that each run fine on their own, after a few cycles the usage builds up too much and I get an out-of-memory crash.

There must be some way to completely remove a model from memory after it has been used, isn't there?
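
One way to force an unload, per Ollama's FAQ (requires v0.1.23+ where keep_alive is supported): send a request with no prompt and keep_alive set to 0, which tells the server to evict the model immediately.

curl http://localhost:11434/api/generate -d '{
  "model": "tinyllama",
  "keep_alive": 0
}'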


@zehDonut commented on GitHub (Feb 2, 2024):

> Why doesn't it go down to 0? Is this a bug, or is it not expected to? Whenever I use Ollama on my computer, I have to either force-kill it or restart to get it completely out of memory.

For me it goes down to ~200 MB, and it goes up by a few MB for each additional model, but not by as much as yours. Unfortunately I don't know what is going on here.

Are you running large models?


@justinh-rahb commented on GitHub (Feb 2, 2024):

That's most likely down to how your host system is managing memory caching. Some OSes try to be helpful by keeping things you've just closed resident in memory... y'know, just in case you closed it by mistake, or you're going to need it again in a few minutes. There are ways to tune this behaviour, but that's out of scope for this project, and probably out of scope for Ollama too.
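
On Linux, for example, free can tell the two apart: the buff/cache column is reclaimable page cache, not memory a process is still holding, and "available" is the figure that matters (numbers below are illustrative):

free -h
#                total   used   free   shared   buff/cache   available
# Mem:            31Gi  4.2Gi  2.1Gi    1.0Gi         25Gi        26Gi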


@tjthejuggler commented on GitHub (Feb 2, 2024):

@zehDonut @justinh-rahb Thanks so much, it is helpful to know that I was barking up the wrong tree. I found a solution that works for me.

Here is the exact situation, for anyone else with the same issue: I am bouncing between generating images with Stable Diffusion, using llava to describe them, and using another model to write new image prompts. I need to occasionally kill Ollama or the RAM usage climbs too high and crashes, but the ollama processes need sudo to kill them. So now I feed my sudo password to the program at runtime so it can occasionally kill the ollama process, which I find with a combination of nvidia-smi and a regex.
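
For anyone after the same workaround, nvidia-smi can emit machine-readable output directly, which avoids scraping the human-readable table. A rough sketch (assumes the Ollama GPU runner shows up as a compute process with "ollama" in its name, which may vary by setup):

# List GPU compute processes as CSV, pick out PIDs whose process name
# matches "ollama", and kill them. Illustrative only; adjust the match.
nvidia-smi --query-compute-apps=pid,process_name --format=csv,noheader \
  | awk -F', ' '/ollama/ {print $1}' \
  | xargs -r sudo kill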


@zabirauf commented on GitHub (Feb 13, 2024):

Created PR #721 to support this.


@ALIENvsROBOT commented on GitHub (May 4, 2025):

I personally think this feature should also be in the advanced parameters of the model section, not just in Settings. Settings are per-user, so the value keeps fluctuating between different users of the same Ollama instance.

Reference: github-starred/open-webui#12136