[GH-ISSUE #10597] Unloading model doesn't free all GPU memory (v0.6.8, Radeon RX 7900 XTX, Windows 11) #6971

Open
opened 2026-04-12 18:52:00 -05:00 by GiteaMirror · 17 comments

Originally created by @8forty on GitHub (May 6, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/10597

What is the issue?

I've boiled down my code to a pretty simple (and obviously contrived) example, and I'm watching GPU memory in both GPU-Z and Task Manager. Each iteration of the loop starts with about 300MB more GPU memory in use than the previous one, before the model is even loaded. ollama ps always shows no models running after the code finishes.

I start by quitting ollama, at which point GPU-Z shows 2MB Memory Used (Dedicated). Starting ollama brings it up to 153MB. Memory used after the unload in each loop (these numbers are the same on every restart):

1: 453MB
2: 751MB
3: 1051MB
...
10: 3092MB
...
20: 5835MB
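
As a rough check of the "about 300MB" figure, the readings above can be turned into a per-iteration growth rate. This is only a sketch over the numbers quoted in this report. Treating the 153MB reading as iteration 0 is an assumption made for the calculation, and `growth_per_iteration` is a hypothetical helper name.

```python
# GPU-Z "Memory Used (Dedicated)" readings in MB, copied from the report above.
# Key 0 is the baseline right after starting ollama (assumed to be "iteration 0").
readings_mb = {0: 153, 1: 453, 2: 751, 3: 1051, 10: 3092, 20: 5835}


def growth_per_iteration(readings):
    """Return (start, end, MB per iteration) for each pair of consecutive readings."""
    points = sorted(readings.items())
    return [
        (i0, i1, (mb1 - mb0) / (i1 - i0))
        for (i0, mb0), (i1, mb1) in zip(points, points[1:])
    ]


rates = growth_per_iteration(readings_mb)
for start, end, rate in rates:
    print(f"iterations {start:>2} -> {end:>2}: {rate:6.1f} MB per iteration")

# Average growth over the whole run, from the baseline to loop 20.
overall = (readings_mb[20] - readings_mb[0]) / 20
print(f"overall: {overall:.1f} MB per iteration")
```

On these readings the growth is close to 300MB per loop for the first three iterations and averages roughly 284MB per loop over all 20.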

I also tried a few other models ('gemma3:1b', 'gemma3:12b', 'gemma3:27b'), with the same results.

I see that AMD support is relatively new so maybe this kind of issue is expected? Is there something else I can do (short of killing ollama) to ensure that all GPU memory is released between model loads?

I've got the server log with OLLAMA_DEBUG=1 for just the first 2 iterations of the loop, but I'm not sure what to look for that might be helpful. Below are the lines I grep'd out that match amd/gpu/memory/hip.

import ollama
from ollama import ChatResponse

model_name = 'llama3.2:3b'
messages = [
    {'role': 'system', 'content': 'You are a helpful chatbot that talks in a professional manner.'},
    {'role': 'user', 'content': 'where is paris?'},
]
print(f'model: {model_name}')
client = ollama.Client(host='http://localhost:11434')

for i in range(10):
    print(f'--- loop {i}')
    chat_response: ChatResponse = client.chat(
        model=model_name,
        messages=messages,
        stream=False,
    )
    print(f'response: {chat_response.message.content}')
    client.generate(model=model_name, keep_alive=0.0)  # unload the model
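
The script above unloads the model through the Python client. A minimal stdlib-only sketch of the same unload, followed by a check of what the server still reports as loaded, might look like the following. It assumes the Ollama REST endpoints `POST /api/generate` (with `keep_alive` set to 0) and `GET /api/ps`. The function names are hypothetical, and nothing here contacts the server until `unload_and_list` is called.

```python
import json
import urllib.request

OLLAMA_HOST = "http://localhost:11434"  # same host as in the script above


def build_unload_request(model, host=OLLAMA_HOST):
    """Build a POST to /api/generate with keep_alive=0, asking the server to unload the model."""
    body = json.dumps({"model": model, "keep_alive": 0}).encode("utf-8")
    return urllib.request.Request(
        f"{host}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def loaded_model_names(ps_payload):
    """Extract model names from a decoded /api/ps response body."""
    return [m.get("name", "") for m in ps_payload.get("models", [])]


def unload_and_list(model, host=OLLAMA_HOST, timeout=30):
    """Unload `model`, then return the names the server still reports as loaded."""
    with urllib.request.urlopen(build_unload_request(model, host), timeout=timeout):
        pass  # the response body is not needed for an unload request
    with urllib.request.urlopen(f"{host}/api/ps", timeout=timeout) as resp:
        return loaded_model_names(json.load(resp))
```

With a server running, `unload_and_list('llama3.2:3b')` would be expected to return an empty list once the unload has completed. An empty list only says the scheduler holds no models. It says nothing about whether the GPU memory was actually returned, which is the gap this report is about.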

Relevant log output

time=2025-05-06T16:51:34.791-07:00 level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
time=2025-05-06T16:51:34.792-07:00 level=INFO source=gpu_windows.go:167 msg=packages count=1
time=2025-05-06T16:51:34.792-07:00 level=INFO source=gpu_windows.go:214 msg="" package=0 cores=18 efficiency=0 threads=36
time=2025-05-06T16:51:34.792-07:00 level=DEBUG source=gpu.go:98 msg="searching for GPU discovery libraries for NVIDIA"
time=2025-05-06T16:51:34.792-07:00 level=DEBUG source=gpu.go:501 msg="Searching for GPU library" name=nvml.dll
time=2025-05-06T16:51:34.792-07:00 level=DEBUG source=gpu.go:525 msg="gpu library search" globs="[C:\\Users\\User\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\nvml.dll C:\\java\\jdk1.8.0_144\\bin\\nvml.dll C:\\java\\jdk1.8.0_144\\bin\\nvml.dll C:\\java\\jdk1.8.0_144\\bin\\nvml.dll C:\\anaconda3\\nvml.dll C:\\anaconda3\\Library\\mingw-w64\\bin\\nvml.dll C:\\anaconda3\\Library\\usr\\bin\\nvml.dll C:\\anaconda3\\Library\\bin\\nvml.dll C:\\anaconda3\\Scripts\\nvml.dll C:\\anaconda3\\bin\\nvml.dll C:\\anaconda3\\condabin\\nvml.dll C:\\java\\jdk1.8.0_144\\bin\\nvml.dll C:\\cygwin64\\usr\\local\\bin\\nvml.dll C:\\cygwin64\\bin\\nvml.dll C:\\WINDOWS\\system32\\nvml.dll C:\\WINDOWS\\nvml.dll C:\\WINDOWS\\System32\\Wbem\\nvml.dll C:\\WINDOWS\\System32\\WindowsPowerShell\\v1.0\\nvml.dll C:\\WINDOWS\\System32\\OpenSSH\\nvml.dll C:\\Program Files (x86)\\Intel\\Intel(R) Management Engine Components\\DAL\\nvml.dll C:\\Program Files\\Intel\\Intel(R) Management Engine Components\\DAL\\nvml.dll C:\\Program Files\\Git\\cmd\\nvml.dll C:\\Program Files\\dotnet\\nvml.dll C:\\Program Files (x86)\\Windows Kits\\10\\Windows Performance Toolkit\\nvml.dll C:\\Program Files\\CMake\\bin\\nvml.dll C:\\Program Files\\Microsoft Visual Studio\\2022\\Community\\VC\\Tools\\Llvm\\x64\\bin\\nvml.dll C:\\Program Files\\Go\\bin\\nvml.dll C:\\TDM-GCC-64\\bin\\nvml.dll C:\\Users\\User\\AppData\\Local\\Microsoft\\WindowsApps\\nvml.dll C:\\Users\\User\\AppData\\Local\\Programs\\Ollama\\nvml.dll C:\\Users\\User\\AppData\\Local\\JetBrains\\Toolbox\\scripts\\nvml.dll C:\\Users\\User\\go\\bin\\nvml.dll C:\\cygwin64\\home\\kf\\bin\\nvml.dll C:\\cygwin64\\home\\kf\\scala-2.10.6\\bin\\nvml.dll C:\\cygwin64\\home\\kf\\c\\nvml.dll C:\\anaconda3\\Scripts\\nvml.dll C:\\cygwin64\\home\\kf\\bin\\nvml.dll C:\\cygwin64\\home\\kf\\scala-2.10.6\\bin\\nvml.dll C:\\cygwin64\\home\\kf\\c\\nvml.dll C:\\anaconda3\\Scripts\\nvml.dll C:\\cygwin64\\home\\kf\\bin\\nvml.dll C:\\cygwin64\\home\\kf\\scala-2.10.6\\bin\\nvml.dll 
C:\\cygwin64\\home\\kf\\c\\nvml.dll C:\\anaconda3\\Scripts\\nvml.dll C:\\cygwin64\\home\\kf\\bin\\nvml.dll C:\\cygwin64\\home\\kf\\scala-2.10.6\\bin\\nvml.dll C:\\cygwin64\\home\\kf\\c\\nvml.dll C:\\anaconda3\\Scripts\\nvml.dll c:\\Windows\\System32\\nvml.dll]"
time=2025-05-06T16:51:34.794-07:00 level=DEBUG source=gpu.go:558 msg="discovered GPU libraries" paths=[]
time=2025-05-06T16:51:34.794-07:00 level=DEBUG source=gpu.go:501 msg="Searching for GPU library" name=nvcuda.dll
time=2025-05-06T16:51:34.794-07:00 level=DEBUG source=gpu.go:525 msg="gpu library search" globs="[C:\\Users\\User\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\nvcuda.dll C:\\java\\jdk1.8.0_144\\bin\\nvcuda.dll C:\\java\\jdk1.8.0_144\\bin\\nvcuda.dll C:\\java\\jdk1.8.0_144\\bin\\nvcuda.dll C:\\anaconda3\\nvcuda.dll C:\\anaconda3\\Library\\mingw-w64\\bin\\nvcuda.dll C:\\anaconda3\\Library\\usr\\bin\\nvcuda.dll C:\\anaconda3\\Library\\bin\\nvcuda.dll C:\\anaconda3\\Scripts\\nvcuda.dll C:\\anaconda3\\bin\\nvcuda.dll C:\\anaconda3\\condabin\\nvcuda.dll C:\\java\\jdk1.8.0_144\\bin\\nvcuda.dll C:\\cygwin64\\usr\\local\\bin\\nvcuda.dll C:\\cygwin64\\bin\\nvcuda.dll C:\\WINDOWS\\system32\\nvcuda.dll C:\\WINDOWS\\nvcuda.dll C:\\WINDOWS\\System32\\Wbem\\nvcuda.dll C:\\WINDOWS\\System32\\WindowsPowerShell\\v1.0\\nvcuda.dll C:\\WINDOWS\\System32\\OpenSSH\\nvcuda.dll C:\\Program Files (x86)\\Intel\\Intel(R) Management Engine Components\\DAL\\nvcuda.dll C:\\Program Files\\Intel\\Intel(R) Management Engine Components\\DAL\\nvcuda.dll C:\\Program Files\\Git\\cmd\\nvcuda.dll C:\\Program Files\\dotnet\\nvcuda.dll C:\\Program Files (x86)\\Windows Kits\\10\\Windows Performance Toolkit\\nvcuda.dll C:\\Program Files\\CMake\\bin\\nvcuda.dll C:\\Program Files\\Microsoft Visual Studio\\2022\\Community\\VC\\Tools\\Llvm\\x64\\bin\\nvcuda.dll C:\\Program Files\\Go\\bin\\nvcuda.dll C:\\TDM-GCC-64\\bin\\nvcuda.dll C:\\Users\\User\\AppData\\Local\\Microsoft\\WindowsApps\\nvcuda.dll C:\\Users\\User\\AppData\\Local\\Programs\\Ollama\\nvcuda.dll C:\\Users\\User\\AppData\\Local\\JetBrains\\Toolbox\\scripts\\nvcuda.dll C:\\Users\\User\\go\\bin\\nvcuda.dll C:\\cygwin64\\home\\kf\\bin\\nvcuda.dll C:\\cygwin64\\home\\kf\\scala-2.10.6\\bin\\nvcuda.dll C:\\cygwin64\\home\\kf\\c\\nvcuda.dll C:\\anaconda3\\Scripts\\nvcuda.dll C:\\cygwin64\\home\\kf\\bin\\nvcuda.dll C:\\cygwin64\\home\\kf\\scala-2.10.6\\bin\\nvcuda.dll C:\\cygwin64\\home\\kf\\c\\nvcuda.dll C:\\anaconda3\\Scripts\\nvcuda.dll 
C:\\cygwin64\\home\\kf\\bin\\nvcuda.dll C:\\cygwin64\\home\\kf\\scala-2.10.6\\bin\\nvcuda.dll C:\\cygwin64\\home\\kf\\c\\nvcuda.dll C:\\anaconda3\\Scripts\\nvcuda.dll C:\\cygwin64\\home\\kf\\bin\\nvcuda.dll C:\\cygwin64\\home\\kf\\scala-2.10.6\\bin\\nvcuda.dll C:\\cygwin64\\home\\kf\\c\\nvcuda.dll C:\\anaconda3\\Scripts\\nvcuda.dll c:\\windows\\system*\\nvcuda.dll]"
time=2025-05-06T16:51:34.796-07:00 level=DEBUG source=gpu.go:558 msg="discovered GPU libraries" paths=[]
time=2025-05-06T16:51:34.796-07:00 level=DEBUG source=gpu.go:501 msg="Searching for GPU library" name=cudart64_*.dll
time=2025-05-06T16:51:34.796-07:00 level=DEBUG source=gpu.go:525 msg="gpu library search" globs="[C:\\Users\\User\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\cudart64_*.dll C:\\java\\jdk1.8.0_144\\bin\\cudart64_*.dll C:\\java\\jdk1.8.0_144\\bin\\cudart64_*.dll C:\\java\\jdk1.8.0_144\\bin\\cudart64_*.dll C:\\anaconda3\\cudart64_*.dll C:\\anaconda3\\Library\\mingw-w64\\bin\\cudart64_*.dll C:\\anaconda3\\Library\\usr\\bin\\cudart64_*.dll C:\\anaconda3\\Library\\bin\\cudart64_*.dll C:\\anaconda3\\Scripts\\cudart64_*.dll C:\\anaconda3\\bin\\cudart64_*.dll C:\\anaconda3\\condabin\\cudart64_*.dll C:\\java\\jdk1.8.0_144\\bin\\cudart64_*.dll C:\\cygwin64\\usr\\local\\bin\\cudart64_*.dll C:\\cygwin64\\bin\\cudart64_*.dll C:\\WINDOWS\\system32\\cudart64_*.dll C:\\WINDOWS\\cudart64_*.dll C:\\WINDOWS\\System32\\Wbem\\cudart64_*.dll C:\\WINDOWS\\System32\\WindowsPowerShell\\v1.0\\cudart64_*.dll C:\\WINDOWS\\System32\\OpenSSH\\cudart64_*.dll C:\\Program Files (x86)\\Intel\\Intel(R) Management Engine Components\\DAL\\cudart64_*.dll C:\\Program Files\\Intel\\Intel(R) Management Engine Components\\DAL\\cudart64_*.dll C:\\Program Files\\Git\\cmd\\cudart64_*.dll C:\\Program Files\\dotnet\\cudart64_*.dll C:\\Program Files (x86)\\Windows Kits\\10\\Windows Performance Toolkit\\cudart64_*.dll C:\\Program Files\\CMake\\bin\\cudart64_*.dll C:\\Program Files\\Microsoft Visual Studio\\2022\\Community\\VC\\Tools\\Llvm\\x64\\bin\\cudart64_*.dll C:\\Program Files\\Go\\bin\\cudart64_*.dll C:\\TDM-GCC-64\\bin\\cudart64_*.dll C:\\Users\\User\\AppData\\Local\\Microsoft\\WindowsApps\\cudart64_*.dll C:\\Users\\User\\AppData\\Local\\Programs\\Ollama\\cudart64_*.dll C:\\Users\\User\\AppData\\Local\\JetBrains\\Toolbox\\scripts\\cudart64_*.dll C:\\Users\\User\\go\\bin\\cudart64_*.dll C:\\cygwin64\\home\\kf\\bin\\cudart64_*.dll C:\\cygwin64\\home\\kf\\scala-2.10.6\\bin\\cudart64_*.dll C:\\cygwin64\\home\\kf\\c\\cudart64_*.dll C:\\anaconda3\\Scripts\\cudart64_*.dll 
C:\\cygwin64\\home\\kf\\bin\\cudart64_*.dll C:\\cygwin64\\home\\kf\\scala-2.10.6\\bin\\cudart64_*.dll C:\\cygwin64\\home\\kf\\c\\cudart64_*.dll C:\\anaconda3\\Scripts\\cudart64_*.dll C:\\cygwin64\\home\\kf\\bin\\cudart64_*.dll C:\\cygwin64\\home\\kf\\scala-2.10.6\\bin\\cudart64_*.dll C:\\cygwin64\\home\\kf\\c\\cudart64_*.dll C:\\anaconda3\\Scripts\\cudart64_*.dll C:\\cygwin64\\home\\kf\\bin\\cudart64_*.dll C:\\cygwin64\\home\\kf\\scala-2.10.6\\bin\\cudart64_*.dll C:\\cygwin64\\home\\kf\\c\\cudart64_*.dll C:\\anaconda3\\Scripts\\cudart64_*.dll C:\\Users\\User\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\cuda_v*\\cudart64_*.dll c:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v*\\bin\\cudart64_*.dll]"
time=2025-05-06T16:51:34.810-07:00 level=DEBUG source=gpu.go:558 msg="discovered GPU libraries" paths="[C:\\Users\\User\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\cuda_v11\\cudart64_110.dll C:\\Users\\User\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\cuda_v12\\cudart64_12.dll]"
time=2025-05-06T16:51:34.811-07:00 level=DEBUG source=gpu.go:574 msg="Unable to load cudart library C:\\Users\\User\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\cuda_v11\\cudart64_110.dll: your nvidia driver is too old or missing.  If you have a CUDA GPU please upgrade to run ollama"
time=2025-05-06T16:51:34.812-07:00 level=DEBUG source=gpu.go:574 msg="Unable to load cudart library C:\\Users\\User\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\cuda_v12\\cudart64_12.dll: your nvidia driver is too old or missing.  If you have a CUDA GPU please upgrade to run ollama"
time=2025-05-06T16:51:34.870-07:00 level=DEBUG source=amd_hip_windows.go:88 msg=hipDriverGetVersion version=60241512
time=2025-05-06T16:51:34.870-07:00 level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir C:\\Users\\User\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\rocm"
time=2025-05-06T16:51:34.870-07:00 level=DEBUG source=amd_common.go:44 msg="detected ROCM next to ollama executable C:\\Users\\User\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\rocm"
time=2025-05-06T16:51:34.872-07:00 level=DEBUG source=amd_windows.go:73 msg="detected hip devices" count=2
time=2025-05-06T16:51:34.872-07:00 level=DEBUG source=amd_windows.go:93 msg="hip device" id=0 name="AMD Radeon Pro W5700" gfx=gfx1010:xnack-
time=2025-05-06T16:51:35.530-07:00 level=WARN source=amd_windows.go:138 msg="amdgpu is not supported (supported types:[gfx1030 gfx1100 gfx1101 gfx1102 gfx1151 gfx906])" gpu_type=gfx1010:xnack- gpu=0 library=C:\Users\User\AppData\Local\Programs\Ollama\lib\ollama\rocm
time=2025-05-06T16:51:35.531-07:00 level=DEBUG source=amd_windows.go:93 msg="hip device" id=1 name="AMD Radeon RX 7900 XTX" gfx=gfx1100
time=2025-05-06T16:51:36.175-07:00 level=DEBUG source=amd_windows.go:146 msg="amdgpu is supported" gpu=1 gpu_type=gfx1100
time=2025-05-06T16:51:36.175-07:00 level=DEBUG source=amd_windows.go:149 msg="amdgpu memory" gpu=1 total="24.0 GiB"
time=2025-05-06T16:51:36.175-07:00 level=DEBUG source=amd_windows.go:150 msg="amdgpu memory" gpu=1 available="23.8 GiB"
time=2025-05-06T16:52:02.945-07:00 level=DEBUG source=gpu.go:391 msg="updating system memory data" before.total="127.7 GiB" before.free="101.0 GiB" before.free_swap="101.9 GiB" now.total="127.7 GiB" now.free="100.7 GiB" now.free_swap="100.6 GiB"
time=2025-05-06T16:52:03.630-07:00 level=DEBUG source=amd_windows.go:197 msg="updating rocm free memory" gpu=1 name="AMD Radeon RX 7900 XTX" before="23.8 GiB" now="23.7 GiB"
time=2025-05-06T16:52:03.633-07:00 level=INFO source=sched.go:188 msg="one or more GPUs detected that are unable to accurately report free memory - disabling default concurrency"
time=2025-05-06T16:52:03.682-07:00 level=DEBUG source=memory.go:108 msg=evaluating library=rocm gpu_count=1 available="[23.7 GiB]"
time=2025-05-06T16:52:03.682-07:00 level=INFO source=sched.go:754 msg="new model will fit in available VRAM in single GPU, loading" model=C:\Users\User\.ollama\models\blobs\sha256-dde5aa3fc5ffc17176b5e8bdc82f587b24b2678c6c66101bf7da77af9f7ccdff gpu=1 parallel=2 available=25440878592 required="3.7 GiB"
time=2025-05-06T16:52:03.682-07:00 level=DEBUG source=gpu.go:391 msg="updating system memory data" before.total="127.7 GiB" before.free="100.7 GiB" before.free_swap="100.6 GiB" now.total="127.7 GiB" now.free="100.7 GiB" now.free_swap="100.4 GiB"
time=2025-05-06T16:52:04.345-07:00 level=DEBUG source=amd_windows.go:197 msg="updating rocm free memory" gpu=1 name="AMD Radeon RX 7900 XTX" before="23.7 GiB" now="23.5 GiB"
time=2025-05-06T16:52:04.348-07:00 level=INFO source=server.go:106 msg="system memory" total="127.7 GiB" free="100.7 GiB" free_swap="100.4 GiB"
time=2025-05-06T16:52:04.348-07:00 level=DEBUG source=memory.go:108 msg=evaluating library=rocm gpu_count=1 available="[23.7 GiB]"
time=2025-05-06T16:52:04.348-07:00 level=INFO source=server.go:139 msg=offload library=rocm layers.requested=-1 layers.model=29 layers.offload=29 layers.split="" memory.available="[23.7 GiB]" memory.gpu_overhead="0 B" memory.required.full="3.7 GiB" memory.required.partial="3.7 GiB" memory.required.kv="896.0 MiB" memory.required.allocations="[3.7 GiB]" memory.weights.total="1.9 GiB" memory.weights.repeating="1.6 GiB" memory.weights.nonrepeating="308.2 MiB" memory.graph.full="424.0 MiB" memory.graph.partial="570.7 MiB"
time=2025-05-06T16:52:04.349-07:00 level=DEBUG source=server.go:263 msg="compatible gpu libraries" compatible=[rocm]
time=2025-05-06T16:52:04.754-07:00 level=DEBUG source=server.go:339 msg="adding gpu library" path=C:\Users\User\AppData\Local\Programs\Ollama\lib\ollama\rocm
time=2025-05-06T16:52:04.754-07:00 level=DEBUG source=server.go:346 msg="adding gpu dependency paths" paths=[C:\Users\User\AppData\Local\Programs\Ollama\lib\ollama\rocm]
time=2025-05-06T16:52:04.754-07:00 level=INFO source=server.go:410 msg="starting llama server" cmd="C:\\Users\\User\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --model C:\\Users\\User\\.ollama\\models\\blobs\\sha256-dde5aa3fc5ffc17176b5e8bdc82f587b24b2678c6c66101bf7da77af9f7ccdff --ctx-size 8192 --batch-size 512 --n-gpu-layers 29 --verbose --threads 18 --parallel 2 --port 55195"
load_backend: loaded ROCm backend from C:\Users\User\AppData\Local\Programs\Ollama\lib\ollama\rocm\ggml-hip.dll
time=2025-05-06T16:52:11.904-07:00 level=DEBUG source=gpu.go:391 msg="updating system memory data" before.total="127.7 GiB" before.free="100.7 GiB" before.free_swap="100.4 GiB" now.total="127.7 GiB" now.free="100.7 GiB" now.free_swap="100.2 GiB"
time=2025-05-06T16:52:12.591-07:00 level=DEBUG source=amd_windows.go:197 msg="updating rocm free memory" gpu=1 name="AMD Radeon RX 7900 XTX" before="23.5 GiB" now="23.4 GiB"
time=2025-05-06T16:52:12.642-07:00 level=DEBUG source=memory.go:108 msg=evaluating library=rocm gpu_count=1 available="[23.4 GiB]"
time=2025-05-06T16:52:12.643-07:00 level=INFO source=sched.go:754 msg="new model will fit in available VRAM in single GPU, loading" model=C:\Users\User\.ollama\models\blobs\sha256-dde5aa3fc5ffc17176b5e8bdc82f587b24b2678c6c66101bf7da77af9f7ccdff gpu=1 parallel=2 available=25128730624 required="3.7 GiB"
time=2025-05-06T16:52:12.643-07:00 level=DEBUG source=gpu.go:391 msg="updating system memory data" before.total="127.7 GiB" before.free="100.7 GiB" before.free_swap="100.2 GiB" now.total="127.7 GiB" now.free="100.7 GiB" now.free_swap="100.1 GiB"
time=2025-05-06T16:52:13.324-07:00 level=DEBUG source=amd_windows.go:197 msg="updating rocm free memory" gpu=1 name="AMD Radeon RX 7900 XTX" before="23.4 GiB" now="23.3 GiB"
time=2025-05-06T16:52:13.327-07:00 level=INFO source=server.go:106 msg="system memory" total="127.7 GiB" free="100.7 GiB" free_swap="100.1 GiB"
time=2025-05-06T16:52:13.327-07:00 level=DEBUG source=memory.go:108 msg=evaluating library=rocm gpu_count=1 available="[23.4 GiB]"
time=2025-05-06T16:52:13.327-07:00 level=INFO source=server.go:139 msg=offload library=rocm layers.requested=-1 layers.model=29 layers.offload=29 layers.split="" memory.available="[23.4 GiB]" memory.gpu_overhead="0 B" memory.required.full="3.7 GiB" memory.required.partial="3.7 GiB" memory.required.kv="896.0 MiB" memory.required.allocations="[3.7 GiB]" memory.weights.total="1.9 GiB" memory.weights.repeating="1.6 GiB" memory.weights.nonrepeating="308.2 MiB" memory.graph.full="424.0 MiB" memory.graph.partial="570.7 MiB"
time=2025-05-06T16:52:13.327-07:00 level=DEBUG source=server.go:263 msg="compatible gpu libraries" compatible=[rocm]
time=2025-05-06T16:52:13.718-07:00 level=DEBUG source=server.go:339 msg="adding gpu library" path=C:\Users\User\AppData\Local\Programs\Ollama\lib\ollama\rocm
time=2025-05-06T16:52:13.718-07:00 level=DEBUG source=server.go:346 msg="adding gpu dependency paths" paths=[C:\Users\User\AppData\Local\Programs\Ollama\lib\ollama\rocm]
time=2025-05-06T16:52:13.718-07:00 level=INFO source=server.go:410 msg="starting llama server" cmd="C:\\Users\\User\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --model C:\\Users\\User\\.ollama\\models\\blobs\\sha256-dde5aa3fc5ffc17176b5e8bdc82f587b24b2678c6c66101bf7da77af9f7ccdff --ctx-size 8192 --batch-size 512 --n-gpu-layers 29 --verbose --threads 18 --parallel 2 --port 55202"
load_backend: loaded ROCm backend from C:\Users\User\AppData\Local\Programs\Ollama\lib\ollama\rocm\ggml-hip.dll
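
One way to read the log above is to compare the `available=` byte counts on the two sched.go "new model will fit" lines, one per model load. The sketch below does that with a regular expression. The two log lines are copied from the log but shortened (the `model=` field is dropped), and the helper name `available_bytes` is hypothetical. It is an illustration of how to read the log, not a diagnosis.

```python
import re

# The two scheduler lines from the log above, shortened to the fields used here.
LOG_LINES = [
    'time=2025-05-06T16:52:03.682-07:00 level=INFO source=sched.go:754 '
    'msg="new model will fit in available VRAM in single GPU, loading" '
    'gpu=1 parallel=2 available=25440878592 required="3.7 GiB"',
    'time=2025-05-06T16:52:12.643-07:00 level=INFO source=sched.go:754 '
    'msg="new model will fit in available VRAM in single GPU, loading" '
    'gpu=1 parallel=2 available=25128730624 required="3.7 GiB"',
]

# Match the unquoted integer byte count on sched.go lines only; other lines
# in the log use quoted values such as available="[23.7 GiB]".
AVAILABLE_RE = re.compile(r'source=sched\.go:\d+ .*\bavailable=(\d+)\b')


def available_bytes(lines):
    """Return the available-VRAM byte counts found on scheduler lines, in order."""
    values = []
    for line in lines:
        match = AVAILABLE_RE.search(line)
        if match:
            values.append(int(match.group(1)))
    return values


values = available_bytes(LOG_LINES)
# Drop in reported free VRAM between consecutive loads, in MiB.
drops_mib = [(a - b) / 2**20 for a, b in zip(values, values[1:])]
print(drops_mib)
```

For the two loads captured here, the reported available VRAM falls by 312147968 bytes (about 298 MiB) between the first and second load, which is in line with the roughly 300MB per loop seen in GPU-Z.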

OS

Windows 11 Pro

GPU

AMD Radeon RX 7900 XTX 24GB

CPU

Xeon Gold 6154

Ollama version

0.6.8

GiteaMirror added the needs more info, bug, windows labels 2026-04-12 18:52:00 -05:00

@Cownjackson commented on GitHub (May 7, 2025):

I'm also dealing with this. It's got my entire server locked up and is preventing me from even connecting to terminate ollama.


@thbeh commented on GitHub (May 8, 2025):

Same issue here .. the only time Ollama fully clears the GPU's memory is when it's closed.


@Cownjackson commented on GitHub (May 8, 2025):

> Same issue here .. the only time Ollama fully clears the GPU's memory is when it's closed.

Mine isn't even freeing the VRAM after closing ollama... I was completely locked out of my remote server until I force restarted it. Can't use ollama until this is resolved.


@Cownjackson commented on GitHub (May 8, 2025):

> > Same issue here .. the only time Ollama fully clears the GPU's memory is when it's closed.
>
> Mine isn't even freeing the VRAM after closing ollama... I was completely locked out of my remote server until I force restarted it. Can't use ollama until this is resolved.

Might be worth noting that I'm running ollama on Windows Server 2022 with an Nvidia L40S.


@Kbeff commented on GitHub (May 11, 2025):

Same exact problem. This looks like a critical issue; I hope it gets solved soon.


@8forty commented on GitHub (May 11, 2025):

@Cownjackson I've had success clearing the GPU memory by killing ollama every time I switch models. In Python I'm using `psutil` to get the pids of all processes named "ollama.exe", then killing them with `os.kill(pid, signal.SIGTERM)`.

Unfortunately I had to add a bunch of error handling for these, since sometimes the kill fails (but it appears the process is always dead when that happens so... success?), and sometimes ollama takes a while to restart on the next model load via the ollama API's generate function: `generate(model=model_name, keep_alive='5m')`
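
For anyone wanting to try the same workaround, here is a rough sketch of the approach described above. It is not the exact code used in this comment: the helper names (`find_pids`, `kill_all`, `kill_ollama`) are made up for illustration, and it assumes the third-party `psutil` package is installed.

```python
import os
import signal


def find_pids(name, procs):
    """Return the PIDs of every process whose name matches `name` (case-insensitive).

    `procs` is any iterable of objects with an `.info` dict, such as the
    items yielded by psutil.process_iter(["pid", "name"]).
    """
    wanted = name.lower()
    return [p.info["pid"] for p in procs
            if (p.info.get("name") or "").lower() == wanted]


def kill_all(pids, kill=os.kill):
    """Send SIGTERM to each PID and return the PIDs whose kill call raised."""
    failed = []
    for pid in pids:
        try:
            kill(pid, signal.SIGTERM)
        except OSError:
            # The process may already have exited between listing and killing.
            failed.append(pid)
    return failed


def kill_ollama(name="ollama.exe"):
    """Hypothetical convenience wrapper: find and kill all matching processes."""
    import psutil  # third-party; imported here so the helpers above work without it
    return kill_all(find_pids(name, psutil.process_iter(["pid", "name"])))
```

Calling `kill_ollama()` before switching models returns the PIDs for which the kill call failed, which matches the "kill fails but the process is already dead" case mentioned above. The next API request may then take longer while ollama restarts.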


@madwax commented on GitHub (May 13, 2025):

Same problem under Ubuntu 24.04


@rick-github commented on GitHub (May 16, 2025):

There are some fixes in 0.7.0 for memory leaks and orphaned processes. Does this problem still occur with 0.7.0?


@8forty commented on GitHub (May 16, 2025):

I just updated to 0.7.0 and ran the same code as above, with exactly the same results: GPU memory usage grows by about 300MB in each iteration of the loop (153MB, 751MB, 1051MB, ...).
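
To make the "about 300MB per iteration" figure concrete, here is a small worked calculation. It uses the readings from the original report at the top of this issue (153MB after starting ollama, then 453MB, 751MB and 1051MB after the first three unloads); the variable names are only illustrative.

```python
# Dedicated GPU memory readings in MB from the original report:
# baseline after starting ollama, then after the unload in loops 1 to 3.
readings_mb = [153, 453, 751, 1051]

# Growth between each pair of consecutive readings.
deltas = [after - before for before, after in zip(readings_mb, readings_mb[1:])]
average = sum(deltas) / len(deltas)

print(deltas)             # [300, 298, 300]
print(round(average, 1))  # 299.3
```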


@rick-github commented on GitHub (May 16, 2025):

When the loop is running, can you confirm that the runner process id (the one with --model on the command line) changes? The model unload should cause the runner to exit, and a new one to start for the next iteration of the loop. When the runner exits, all VRAM should be released. If the runner is exiting but VRAM is not released, that would indicate a leak in the GPU driver.
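
One possible way to check this from Python is sketched below. It assumes the third-party `psutil` package is available, and it treats any process with both `runner` and `--model` in its arguments as a runner, based on the `ollama.exe runner --model ...` command line in the server log. The function names are hypothetical.

```python
def is_runner(cmdline):
    """True if the argument list looks like an ollama runner process."""
    return "runner" in cmdline and "--model" in cmdline


def runner_pids(procs):
    """Return the sorted PIDs of processes whose command line marks them as runners.

    `procs` is any iterable of objects with an `.info` dict, such as the
    items yielded by psutil.process_iter(["pid", "cmdline"]).
    """
    return sorted(p.info["pid"] for p in procs
                  if is_runner(p.info.get("cmdline") or []))


def snapshot():
    """Hypothetical helper: list the runner PIDs that exist right now."""
    import psutil  # third-party; imported here so the helpers above work without it
    return runner_pids(psutil.process_iter(["pid", "cmdline"]))
```

Calling `snapshot()` while each chat request is in flight and comparing the results across iterations would show whether a new runner PID appears each time.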


@8forty commented on GitHub (May 16, 2025):

When I quit ollama and start a new one, an ollama.exe shows up in task manager, and that's the one that keeps growing (PID 35372 in the screenshots below). Each iteration, another ollama.exe shows up temporarily then disappears. "ollama app.exe" never changes.

I added a sleep at the end of each iteration (after the ChatResponse is printed) so I could capture the difference between the state while the chat is executing (2 ollama.exe processes running) and the state after it finishes (1 ollama.exe).

Fresh ollama start:
<img width="809" alt="Image" src="https://github.com/user-attachments/assets/077c756d-e3ff-4ae6-975c-59e77221b0ce" />

During loop 0:
<img width="815" alt="Image" src="https://github.com/user-attachments/assets/2b9f98cd-6db1-4519-b434-66d39bad357e" />

End of loop 0 sleep:
<img width="821" alt="Image" src="https://github.com/user-attachments/assets/00daed85-3801-4ffa-886c-e1a6899036c2" />

During loop 1:
<img width="813" alt="Image" src="https://github.com/user-attachments/assets/fa48ab98-cf1a-4e74-9c81-052f93f31475" />

End of loop 1 sleep:
<img width="813" alt="Image" src="https://github.com/user-attachments/assets/ba622289-df23-4bcc-bdd8-7546a3e9b12f" />

@rick-github commented on GitHub (May 16, 2025):

Ah, so it's the server that's accumulating VRAM. It shouldn't, because it's the runner's job to allocate it. All the server does is probe the GPUs to find out how much VRAM is available. I've tested this on Linux and the server doesn't grow, so this looks like a Windows issue. Your system has an AMD GPU and Cownjackson's is Nvidia, so it's perhaps not related to the type of GPU either. I'll see if I can replicate.


@BloodyIron commented on GitHub (Jun 26, 2025):

Worth keeping this open, or time to close?


@rick-github commented on GitHub (Jun 26, 2025):

I've been unable to replicate. If somebody can demonstrate the problem with a recent version of ollama, then the ticket should remain open, otherwise close as stale.


@Kaehvaman commented on GitHub (Aug 26, 2025):

I've replicated this by running and stopping one model several times; the server process accumulated 1.8 GB of video memory. I am using an AMD GPU. Ollama version is 0.11.6.
The rightmost column is dedicated GPU memory.

<img width="2014" height="565" alt="Image" src="https://github.com/user-attachments/assets/5cb9100d-0d2f-4c1b-96f5-820398630097" />

@sikhness commented on GitHub (Sep 9, 2025):

I'm having the same issue on Windows with Ollama 0.11.6 and using an AMD GPU. I'm going to have to force reboot the ollama server periodically until this gets resolved or my VRAM will blow up.


@germgerm commented on GitHub (Oct 2, 2025):

Windows 10, Ollama 0.12.3.
Stopping ollama and issuing a simple _ollama list_ causes VRAM usage to increase by 149MB; stopping ollama frees it up. Is this expected behaviour?
I'm also running into VRAM not being released even though _ps_ reports no models loaded.

Reference: github-starred/ollama#6971