[GH-ISSUE #14809] feat: unload STT model from memory #56036

Closed
opened 2026-05-05 18:34:59 -05:00 by GiteaMirror · 2 comments

Originally created by @lrnd1 on GitHub (Jun 9, 2025).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/14809

Check Existing Issues

  • I have searched the existing issues and discussions.

Problem Description

I am running Open-WebUI in Docker on a Mac mini.
I've been playing around with different local STT models and noticed that they don't get unloaded from RAM unless I restart the container.
The models are only loaded when the voice function is used, which is good. However, a loaded model seemingly never gets unloaded afterwards and keeps its RAM reserved.

I also noticed what seems to be a minor bug: when switching between STT models, the previous model still doesn't get unloaded from RAM.
Also, when I switch from a working model to one that should not work (e.g. because its vocabulary file is missing), the previous model is used instead of an error being thrown.
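
To make the expected switching behavior concrete, here is a minimal sketch. It is not Open WebUI's actual code: `ModelSlot` and the `loader` callable are hypothetical names used only for illustration. The idea is to drop the old model before loading the new one, and to let a failed load surface as an error instead of silently falling back.

```python
import gc
from typing import Any, Callable, Optional


class ModelSlot:
    """Holds at most one loaded model (hypothetical helper, not Open WebUI code)."""

    def __init__(self, loader: Callable[[str], Any]) -> None:
        self._loader = loader  # assumed: loads a model by name, raises on failure
        self.name: Optional[str] = None
        self.model: Optional[Any] = None

    def switch(self, name: str) -> Any:
        # Drop the previous model first, so a failed load cannot silently reuse it.
        self.model = None
        self.name = None
        gc.collect()

        # Any exception from the loader propagates to the caller unchanged.
        model = self._loader(name)
        self.model, self.name = model, name
        return model
```

With this shape, switching to a model whose files are missing leaves the slot empty and raises, so the caller can report the error rather than transcribing with the old model.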

Desired Solution you'd like

It would be convenient if the model simply released its memory once the voice or dictation functions are no longer in use.
An even better solution would be an option to set how long the model remains loaded after use. This would be particularly useful when multiple dictations are processed in a row, since loading and unloading the model for each one would be impractical.
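
The keep-alive idea could look roughly like the sketch below. This is an illustration under assumptions, not how Open WebUI implements it: `IdleUnloadingModel`, `loader`, and `idle_seconds` are made-up names. Each use restarts a countdown, and the model reference is dropped only once the countdown expires without another use.

```python
import gc
import threading
from typing import Any, Callable, Optional


class IdleUnloadingModel:
    """Lazily loads a model and drops it after a period of inactivity."""

    def __init__(self, loader: Callable[[], Any], idle_seconds: float) -> None:
        self._loader = loader              # assumed: returns a ready-to-use model
        self._idle_seconds = idle_seconds  # assumed: user-configurable keep-alive
        self._lock = threading.Lock()
        self._model: Optional[Any] = None
        self._timer: Optional[threading.Timer] = None
        self._generation = 0

    def get(self) -> Any:
        """Return the model, loading it if needed, and restart the countdown."""
        with self._lock:
            if self._model is None:
                self._model = self._loader()
            self._restart_timer()
            return self._model

    def is_loaded(self) -> bool:
        with self._lock:
            return self._model is not None

    def _restart_timer(self) -> None:
        # Caller must hold the lock.
        if self._timer is not None:
            self._timer.cancel()
        self._generation += 1
        timer = threading.Timer(
            self._idle_seconds, self._unload, args=(self._generation,)
        )
        timer.daemon = True  # do not keep the process alive just for the timer
        self._timer = timer
        timer.start()

    def _unload(self, generation: int) -> None:
        with self._lock:
            if generation != self._generation:
                return  # a newer get() call restarted the countdown
            self._model = None
            self._timer = None
        gc.collect()
```

Dropping the reference only makes the memory reclaimable by Python. Whether the process's resident memory actually shrinks afterwards depends on the model backend and the allocator, so this would need to be verified against the real STT engine.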

Alternatives Considered

No response

Additional Context

No response


@tjbck commented on GitHub (Jun 16, 2025):

Should be addressed with 72df23ed79b74036480ea45735350a04b6c6456b

@lrnd1 commented on GitHub (Jun 18, 2025):

> Should be addressed with 72df23ed79

v0.6.15 did not solve the issue.
After using STT, CPU usage goes back to normal, but memory consumption stays high.

Image: https://github.com/user-attachments/assets/db3988dd-b8a8-48ae-b0da-2f25139aa958
Image: https://github.com/user-attachments/assets/c352de9a-b9b2-4a09-b903-47b7d4957452

Reference: github-starred/open-webui#56036