[GH-ISSUE #15055] 0.18.x idle VRAM usage and power consumption #9662

Open
opened 2026-04-12 22:33:09 -05:00 by GiteaMirror · 2 comments
Owner

Originally created by @fluxlinkage on GitHub (Mar 25, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/15055

What is the issue?

I was using Ollama 0.17.7 under Windows 11 and everything was fine.
However, after I updated to 0.18.2, my fans became noisy even when idle.
The output of nvidia-smi shows that an ollama process is using 262 MiB of VRAM even when Ollama is idle (no model running, only the system tray icon).

Wed Mar 25 10:32:19 2026
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 591.59                 Driver Version: 591.59         CUDA Version: 13.1     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                  Driver-Model | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A40                   TCC   |   00000000:01:00.0 Off |                  Off |
|  0%   58C    P0             85W /  300W |     272MiB /  49140MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A           15276      C   ...al\Programs\Ollama\ollama.exe        262MiB |
+-----------------------------------------------------------------------------------------+

At the same time, ollama ps says no model is running.

Downgrading to 0.18.0 gives the same problem.

Downgrading to 0.17.7, everything is OK again and the output of nvidia-smi is back to normal.

Wed Mar 25 11:02:20 2026
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 591.59                 Driver Version: 591.59         CUDA Version: 13.1     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                  Driver-Model | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A40                   TCC   |   00000000:01:00.0 Off |                  Off |
|  0%   45C    P8             14W /  300W |      10MiB /  49140MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

So why is VRAM being used in 0.18.x when idle? Is this a new feature (and if so, can I turn it off manually) or just a bug? I can't accept 70 watts of additional idle power!
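For reference, here is a minimal sketch of how the idle allocation can be cross-checked (these are standard nvidia-smi query flags; the 5-second polling interval is arbitrary):

```shell
rem Confirm that no model is loaded, then see which process holds the VRAM
ollama ps
nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv

rem Watch power state, power draw and memory use every 5 seconds while idle
nvidia-smi --query-gpu=pstate,power.draw,memory.used --format=csv -l 5
```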

Relevant log output

I don't know whether it is relevant, but the following error only appears in the 0.18.x log (about 3 seconds after server start, reproducible). In 0.17.7 there is no such error.

Error #01: write tcp 127.0.0.1:11434->127.0.0.1:54305: wsasend: An established connection was aborted by the software in your host machine.

OS

Windows

GPU

Nvidia

CPU

AMD

Ollama version

0.18.2

GiteaMirror added the bug label 2026-04-12 22:33:09 -05:00
Author
Owner

@solidaxelproject commented on GitHub (Mar 25, 2026):

Same root cause — the runner process in 0.18.x doesn't release memory properly. In my case it manifests as progressive 404 errors under sustained load, with runner RSS growing ~2 MB per request. See full benchmark data and reproduction script here:
https://github.com/ollama/ollama/issues/15027
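A rough reproduction sketch along those lines (hypothetical: the model name "llama3" and the request count are placeholders, and the actual runner process name may differ from plain ollama.exe), runnable from an interactive cmd prompt:

```shell
rem Send repeated non-streaming generate requests and print the working set of
rem the ollama processes after each one (interactive cmd syntax, so %i not %%i)
for /L %i in (1,1,50) do (curl -s http://127.0.0.1:11434/api/generate -d "{\"model\":\"llama3\",\"prompt\":\"hi\",\"stream\":false}" > NUL & tasklist /fi "imagename eq ollama.exe")
```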

Author
Owner

@fluxlinkage commented on GitHub (Mar 31, 2026):

Updated to 0.19.0, still the same issue.
I think it is related to MLX, so I deleted the mlx_cuda_v13 folder under %USERPROFILE%\AppData\Local\Programs\Ollama\lib\ollama, and everything is OK (it falls back to the CUDA backend)!
Why use MLX on Windows?
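For others hitting this, a sketch of the same workaround that renames the folder instead of deleting it, so it can be restored later (the path is the one given above; renaming library folders is not an officially supported configuration and an update may recreate the folder):

```shell
rem Rename the MLX backend folder so Ollama falls back to the CUDA backend;
rem rename it back (or reinstall) to restore the original layout
ren "%USERPROFILE%\AppData\Local\Programs\Ollama\lib\ollama\mlx_cuda_v13" mlx_cuda_v13.disabled
```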

Reference: github-starred/ollama#9662