Mirror of https://github.com/open-webui/open-webui.git (synced 2026-05-07 19:38:46 -05:00)
[GH-ISSUE #596] feat: Support ollama's keep_alive request parameter #27664
Originally created by @zehDonut on GitHub (Jan 29, 2024).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/596
Is your feature request related to a problem? Please describe.
Ollama unloads models after 5 minutes by default. A new parameter, keep_alive, lets the user set a custom value. It would be nice to have an option in the UI where a value for this parameter can be set.
Here is the relevant PR: #2146
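For reference, a minimal sketch of how a client could send keep_alive with a generate request. The endpoint and field names follow the Ollama HTTP API; the accepted keep_alive forms (duration string, seconds, 0 to unload immediately, negative to keep loaded) are as documented by Ollama, but treat the exact request shape as an assumption. The network call is left commented so the sketch runs without a local server.

```python
import json
import urllib.request


def build_generate_request(prompt, model="tinyllama", keep_alive="5m",
                           host="http://localhost:11434"):
    """Build a /api/generate request carrying an explicit keep_alive.

    Per the Ollama docs, keep_alive accepts a duration string ("10m",
    "24h"), a number of seconds, 0 to unload the model immediately
    after the response, or a negative value to keep it loaded forever.
    """
    body = json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,
        "keep_alive": keep_alive,
    }).encode()
    return urllib.request.Request(
        f"{host}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )


# Ask for the model to be unloaded as soon as the response returns:
req = build_generate_request("Why is the sky blue?", keep_alive=0)
# urllib.request.urlopen(req) would actually send it.
```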
@justinh-rahb commented on GitHub (Jan 29, 2024):
That would be useful, indeed.
@tjthejuggler commented on GitHub (Feb 2, 2024):
Do you know how to use keep_alive? Nothing I try seems to affect the time it takes for the model to unload:
curl http://localhost:11434/api/generate -d '{ "model": "tinyllama", "prompt": "Why is the sky blue??", "keep_alive": 0 }'
curl http://localhost:11434/api/generate -d '{ "model": "tinyllama", "prompt": "Why is the sky blue??", "keep_alive": "0s" }'
curl http://localhost:11434/api/generate -d '{ "model": "tinyllama", "prompt": "Why is the sky blue??", "keep_alive": "0m" }'
curl http://localhost:11434/api/generate -d '{ "model": "tinyllama", "prompt": "Why is the sky blue??", "keep_alive": "1s" }'
curl http://localhost:11434/api/generate -d '{ "model": "tinyllama", "prompt": "Why is the sky blue??", "keep_alive": "1m" }'
curl http://localhost:11434/api/generate -d '{ "model": "tinyllama", "prompt": "Why is the sky blue??", "keep_alive": "0" }'
These all behave the same for me: the model continues to use memory indefinitely. Any idea why?
@zehDonut commented on GitHub (Feb 2, 2024):
Are you running the pre-release version? v0.1.23?
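Since keep_alive support shipped in v0.1.23, checking the server version is the natural first step. A small sketch of a client-side check, assuming the /api/version endpoint of the Ollama HTTP API (the request itself is left commented so the snippet runs offline):

```python
import json
import urllib.request


def parse_version(v: str) -> tuple[int, ...]:
    """Turn 'v0.1.23' or '0.1.23' into a comparable tuple of ints."""
    return tuple(int(part) for part in v.lstrip("v").split("."))


def supports_keep_alive(version: str) -> bool:
    """keep_alive landed in Ollama v0.1.23 (per this thread)."""
    return parse_version(version) >= (0, 1, 23)


# version = json.load(urllib.request.urlopen(
#     "http://localhost:11434/api/version"))["version"]
# print(supports_keep_alive(version))
```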
@tjthejuggler commented on GitHub (Feb 2, 2024):
Thanks so much! I was on 0.1.22, but I got it now. It is definitely releasing the model based on the timing I set with keep_alive, but it doesn't completely release it: if I use a model that takes 1500 MB, it only drops to roughly 500 MB after the timeout. Why doesn't it go down to 0? Is it a bug, or is that not expected? After I use Ollama on my computer, I either have to force-kill it or restart to get it completely out of memory.
Also, even with keep_alive = 0, if I loop through a few different models that each run fine on their own, after a few cycles the usage builds up too much and I get an out-of-memory crash.
There must be some way to completely remove it from memory after it has been used, isn't there?
@zehDonut commented on GitHub (Feb 2, 2024):
For me it goes down to ~200 MB; it goes up by a few MB for each additional model, but not as much as yours. Unfortunately I don't know what is going on here.
Are you running large models?
@justinh-rahb commented on GitHub (Feb 2, 2024):
That's most likely down to how your host system is managing memory caching. Some OSes try to be helpful by keeping things you've just closed resident in memory... y'know, just in case you closed it by mistake, or you're going to need it again in a few minutes. There are ways to tune this behaviour, but that's out of scope for this project, and probably out of scope for Ollama too.
@tjthejuggler commented on GitHub (Feb 2, 2024):
@zehDonut @justinh-rahb Thanks so much, it is helpful to know that I was barking up the wrong tree. I found a solution that works for me.
Here is the exact situation for anyone else with the same issue: I am bouncing between generating images with Stable Diffusion, using llava to describe them, and then using another model to make new image prompts. I need to occasionally kill Ollama or the RAM usage climbs too high and crashes, but the ollama processes need sudo to kill. So now I am entering my sudo password into the program at runtime so it can occasionally kill the ollama process, which I find with a combination of nvidia-smi and a regex.
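The workaround described above can be sketched roughly like this. The nvidia-smi process-table layout and the "ollama" process-name match are assumptions (the real output varies by driver version, and `nvidia-smi --query-compute-apps=pid,process_name --format=csv` would be more robust than regex-scraping the table). The kill step is destructive and only shown for completeness:

```python
import re
import subprocess

# Example of nvidia-smi's process table (format is an assumption;
# real output varies by driver version).
SAMPLE = """\
|    0   N/A  N/A      1234      C   /usr/local/bin/ollama        1500MiB |
|    0   N/A  N/A      5678      C   python3                       800MiB |
"""

# Match compute ("C") processes whose name contains "ollama"; capture the PID.
OLLAMA_PID = re.compile(
    r"^\|\s+\d+\s+\S+\s+\S+\s+(\d+)\s+C\s+\S*ollama\S*", re.M)


def find_ollama_pids(smi_output: str) -> list[int]:
    """Extract PIDs of ollama compute processes from nvidia-smi output."""
    return [int(pid) for pid in OLLAMA_PID.findall(smi_output)]


def kill_ollama(sudo_password: str) -> None:
    """Kill ollama GPU processes via sudo. Destructive; use with care."""
    smi = subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout
    for pid in find_ollama_pids(smi):
        subprocess.run(["sudo", "-S", "kill", "-9", str(pid)],
                       input=sudo_password + "\n", text=True)
```

Feeding a password to `sudo -S` from a running program is a last resort; a passwordless sudoers rule scoped to `kill`, or running Ollama under a user-level service manager that can stop it cleanly, would avoid holding the password in memory.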
@zabirauf commented on GitHub (Feb 13, 2024):
Created PR #721 to support this.
@ALIENvsROBOT commented on GitHub (May 4, 2025):
I personally think this feature should also be in the advanced parameters of the model section, not just in settings. Settings are per-user, so the value keeps fluctuating between the different users of a shared Ollama instance.