[GH-ISSUE #8814] Show in UI GPU status (models loaded, VRAM available) #53941

Closed
opened 2026-05-05 15:35:59 -05:00 by GiteaMirror · 1 comment
Owner

Originally created by @JusefPol on GitHub (Jan 23, 2025).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/8814

Feature Request

Hi guys, I have checked the existing issues and haven't found this feature request:

One thing I have noticed is that when I use WebUI and switch between models, I end up connecting to my Ollama server to check whether the previous model has already been dropped, or whether I have enough VRAM available to run another one in parallel. I know I can configure the model to unload immediately, but letting it linger for a while gives me time to write prompts without having to reload the model.

Still, it would be nice to know directly from the UI whether I have enough VRAM available to run another model. A couple of friends also access my UI sometimes, and they have no way of knowing which model is loaded or whether there is capacity on the GPUs to run the one they want. With the outputs of nvidia-smi and ollama ps you can pretty much get that information, but I have no idea whether that is possible from the UI. (Since the UI only talks to the API, I guess the API would have to support it first, but I thought I would drop the idea here in case it gains traction.)
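For context, a minimal sketch of how a backend could already gather this information, assuming a reachable Ollama server and `nvidia-smi` on the PATH: Ollama's `GET /api/ps` endpoint reports the models currently kept in memory (including their VRAM footprint), and `nvidia-smi` reports per-GPU memory usage. The URL and helper names below are hypothetical, not part of Open WebUI.

```python
# Sketch only: collect loaded models (via Ollama) and GPU memory (via nvidia-smi).
import json
import subprocess
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # hypothetical default; adjust to your setup


def loaded_models() -> list[dict]:
    """Query Ollama's /api/ps endpoint for models currently loaded in memory."""
    with urllib.request.urlopen(f"{OLLAMA_URL}/api/ps") as resp:
        data = json.load(resp)
    # Each entry includes the model name and how many bytes of it sit in VRAM.
    return [
        {"name": m["name"], "size_vram": m.get("size_vram", 0)}
        for m in data.get("models", [])
    ]


def gpu_memory() -> list[dict]:
    """Ask nvidia-smi for used/total memory per GPU, in MiB."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    gpus = []
    for line in out.strip().splitlines():
        used, total = (int(x) for x in line.split(","))
        gpus.append({"used_mib": used, "total_mib": total})
    return gpus


if __name__ == "__main__":
    print("Loaded models:", loaded_models())
    print("GPU memory:", gpu_memory())
```

Exposing something like this through the Open WebUI API, and then surfacing it in the model selector, would cover the use case described above.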

Thanks.


@panda44312 commented on GitHub (Jan 23, 2025):

#8176

Reference: github-starred/open-webui#53941