[GH-ISSUE #1045] [Question] Open-Webui, how is it releasing memory? #12311

Closed
opened 2026-04-19 19:12:55 -05:00 by GiteaMirror · 0 comments
Owner

Originally created by @tilllt on GitHub (Mar 5, 2024).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/1045

Hey there,

I have several AI services running (Whisper-API, Auto111/StableDiffusion, TTS-Web-GUI), which all compete for the limited 12 GB of VRAM on our GPU. Open-Webui / Ollama are used most frequently, and it happens quite regularly that, even when no one is actively using Ollama, there is no VRAM available for the other services.

It seems that Open-Webui's detection of whether a session is active (keep the model loaded) or inactive (free the VRAM) is not working reliably. How is this handled at the moment?
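For context, model residency is governed by Ollama itself rather than Open-Webui: Ollama's `keep_alive` setting controls how long a model stays in VRAM after a request (by default roughly five minutes). A minimal sketch of asking Ollama to unload a model immediately via its REST API, assuming the default local endpoint and a hypothetical model name:

```python
import json
import urllib.request

# Default Ollama endpoint; adjust host/port to your setup.
OLLAMA_URL = "http://localhost:11434/api/generate"


def unload_payload(model: str) -> dict:
    """Build a request body that asks Ollama to unload `model`.

    An empty prompt combined with keep_alive=0 tells Ollama to free
    the model's VRAM right away instead of keeping it resident.
    """
    return {"model": model, "prompt": "", "keep_alive": 0}


def unload(model: str) -> None:
    """Send the unload request to a running Ollama instance."""
    body = json.dumps(unload_payload(model)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        resp.read()  # drain the (empty) generation response


if __name__ == "__main__":
    unload("llama2")  # hypothetical model name; use one you have pulled
```

Setting the `OLLAMA_KEEP_ALIVE` environment variable on the Ollama server changes the default for all requests, which may be the simpler fix when several services share one GPU.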

Cheers

Reference: github-starred/open-webui#12311