[GH-ISSUE #3987] Preload Ollama models upon user login #13456

Closed
opened 2026-04-19 20:11:18 -05:00 by GiteaMirror · 0 comments

Originally created by @knguyen298 on GitHub (Jul 18, 2024).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/3987

Is your feature request related to a problem? Please describe.

After periods of no activity, models are unloaded to save memory. Re-loading those models can take quite a long time, and the UI will sometimes get stuck showing the "loading" animation until you refresh.

Describe the solution you'd like

Upon a user logging in, pre-load the default model. This default model can be set per-user, or as a global setting, or both, with admin settings to enable per-user setup.

This would be accomplished using the [documented method of sending the Ollama server an empty request](https://github.com/ollama/ollama/blob/main/docs/faq.md#how-can-i-preload-a-model-into-ollama-to-get-faster-response-times) to preload a model.
For example:
`curl http://localhost:11434/api/chat -d '{"model": "mistral"}'`

It should be possible to disable this feature, and to limit how many models are preloaded at a time. The [documented](https://github.com/ollama/ollama/blob/main/docs/api.md#list-running-models) `GET /api/ps` endpoint returns a JSON object containing an array of loaded models, which can be used to count how many models are already loaded before preloading another one.
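Putting the two pieces together, a login hook could check `/api/ps` before preloading. The sketch below is a minimal illustration, not existing Open WebUI code: the hook name, the `MAX_PRELOADED` limit, and the default server address are all assumptions; only the `/api/ps` and empty `/api/chat` calls come from the Ollama docs.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumed default Ollama address
MAX_PRELOADED = 2                      # hypothetical admin-configured limit


def count_loaded_models(ps_response: dict) -> int:
    """Count loaded models from a parsed GET /api/ps response."""
    return len(ps_response.get("models", []))


def should_preload(ps_response: dict, limit: int = MAX_PRELOADED) -> bool:
    """Only preload while we are below the configured limit."""
    return count_loaded_models(ps_response) < limit


def preload_model(model: str) -> None:
    """Send an empty /api/chat request, which loads the model without generating."""
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/chat",
        data=json.dumps({"model": model}).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)


def on_user_login(default_model: str) -> None:
    """Hypothetical hook: preload the user's default model on login."""
    with urllib.request.urlopen(f"{OLLAMA_URL}/api/ps") as resp:
        ps = json.load(resp)
    if should_preload(ps):
        preload_model(default_model)
```

The count/limit check is kept as pure functions so the preload policy can be configured and tested independently of the HTTP calls.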

Describe alternatives you've considered
Models can be configured to expire after a longer period, but that isn't ideal when usage isn't continuous, since it keeps memory occupied unnecessarily.
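For reference, that alternative corresponds to Ollama's documented `keep_alive` request parameter, which controls how long a model stays loaded after a request (the default is about five minutes). A minimal sketch, where the model name and `"1h"` duration are placeholder values:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumed default Ollama address


def keep_alive_payload(model: str, keep_alive: str = "1h") -> dict:
    """Empty chat request that also extends how long the model stays loaded."""
    return {"model": model, "keep_alive": keep_alive}


def preload_with_keep_alive(model: str, keep_alive: str = "1h") -> None:
    """POST to /api/chat; loads the model and keeps it resident for `keep_alive`."""
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/chat",
        data=json.dumps(keep_alive_payload(model, keep_alive)).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)
```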
