[GH-ISSUE #4443] Models remain resident in VRAM after deletion #28534

Closed
opened 2026-04-22 06:47:12 -05:00 by GiteaMirror · 2 comments

Originally created by @coder543 on GitHub (May 15, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/4443

Originally assigned to: @dhiltgen on GitHub.

What is the issue?

I downloaded the wrong model, ran it, realized my mistake, then deleted it, and noticed it was still listed as being present in VRAM according to ollama ps.

$ ollama ps
NAME            ID              SIZE    PROCESSOR       UNTIL
yi:9b-v1.5-q8_0 6ea05582d5ca    10 GB   100% GPU        4 minutes from now
$ ollama rm yi:9b-v1.5-q8_0
deleted 'yi:9b-v1.5-q8_0'
$ ollama ps
NAME            ID              SIZE    PROCESSOR       UNTIL
yi:9b-v1.5-q8_0 6ea05582d5ca    10 GB   100% GPU        4 minutes from now
$ nvidia-smi
Wed May 15 02:48:11 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.171.04             Driver Version: 535.171.04   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3090        Off | 00000000:01:00.0 Off |                  N/A |
|  0%   50C    P8              24W / 420W |   9588MiB / 24576MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A     63185      C   ...unners/cuda_v11/ollama_llama_server     9582MiB |
+---------------------------------------------------------------------------------------+

OS

Linux

GPU

Nvidia

CPU

AMD

Ollama version

0.1.38

GiteaMirror added the bug label 2026-04-22 06:47:12 -05:00

@dhiltgen commented on GitHub (May 21, 2024):

As noted in the output of ps, the model will unload after 5 minutes by default (it looks like you had about 4 minutes remaining). We'll also automatically unload an idle model if we need the VRAM to load other models, so you can safely pull and run the model you actually wanted to load.

If you upgrade to the latest version, you can use ollama.exe run yi:9b-v1.5-q8_0 --keepalive 0 "" to quickly trigger an unload.
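For reference, a minimal sketch of two other ways to force or shorten the idle timeout, assuming a recent Ollama build that supports the keep_alive API parameter and the OLLAMA_KEEP_ALIVE server setting, and the default API address:

# Ask the server to unload an installed model right away by sending an
# empty generate request with keep_alive set to 0 (a tag that has already
# been removed will simply return an error).
$ curl http://localhost:11434/api/generate -d '{"model": "yi:9b-v1.5-q8_0", "keep_alive": 0}'

# Or shorten the default 5-minute idle timeout for all models when
# starting the server:
$ OLLAMA_KEEP_ALIVE=1m ollama serve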


@coder543 commented on GitHub (May 21, 2024):

I agree with what you said in general, but it is still surprising behavior, and if you're using the GPU for other things besides ollama, there is a window in which it can cause an OOM for whatever other application is trying to load a model into VRAM.

There is no valid use case for a deleted model to remain in VRAM, since you cannot use the deleted model. (ollama will either complain or start downloading it again, rather than just using it from VRAM.)

But, it's fine, I guess.
