[GH-ISSUE #10597] Unloading model doesn't free all GPU memory (v0.6.8, Radeon RX 7900 XTX, Windows 11) #6971

GiteaMirror · 2026-04-12T18:52:00-05:00

Originally created by @8forty on GitHub (May 6, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/10597

What is the issue?

I've boiled down my code to a pretty simple (and obviously contrived) example, and I'm watching GPU memory in both GPU-Z and Task Manager. Each iteration of the loop starts with about 300MB more GPU memory in use than the previous one, before the model is even loaded. ollama ps always shows no models running after the code finishes.

I start by quitting ollama, at which point GPU-Z shows 2MB Memory Used (Dedicated). Starting ollama brings it up to 153MB. Then, after the unload in each loop, I see the following (the numbers are the same after every restart):

1: 453MB
2: 751MB
3: 1051MB
...
10: 3092MB
...
20: 5835MB
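
As a rough check of the "about 300MB" figure, the readings above can be turned into an average amount retained per loop. This is only a sketch of the arithmetic: the 153MB baseline and the per-loop values are copied from the numbers above, while the variable and function names are made up for illustration.

```python
# GPU-Z "Memory Used (Dedicated)" readings in MB, copied from the list above.
baseline_mb = 153  # after starting ollama, before any model load
readings_mb = {1: 453, 2: 751, 3: 1051, 10: 3092, 20: 5835}


def average_growth(readings, baseline):
    """Return the average MB retained per loop at each reported loop count."""
    return {loop: (used - baseline) / loop for loop, used in readings.items()}


growth = average_growth(readings_mb, baseline_mb)
for loop, mb in growth.items():
    print(f'loop {loop}: ~{mb:.0f} MB retained per iteration')
```

The per-loop average stays close to 300MB across all reported loops, which suggests a roughly constant amount is left behind by each load/unload cycle.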

I also tried a few other models ('gemma3:1b', 'gemma3:12b', 'gemma3:27b') and got the same results.

I see that AMD support is relatively new so maybe this kind of issue is expected? Is there something else I can do (short of killing ollama) to ensure that all GPU memory is released between model loads?

I've got the server log with OLLAMA_DEBUG=1 for just the first 2 iterations of the loop, but I'm not sure what to look for that might be helpful. Below are the lines matching amd/gpu/memory/hip, extracted with grep.

import ollama
from ollama import ChatResponse

model_name = 'llama3.2:3b'
messages = [
    {'role': 'system', 'content': 'You are a helpful chatbot that talks in a professional manner.'},
    {'role': 'user', 'content': 'where is paris?'},
]
print(f'model: {model_name}')
client = ollama.Client(host='http://localhost:11434')

for i in range(10):
    print(f'--- loop {i}')
    chat_response: ChatResponse = client.chat(
        model=model_name,
        messages=messages,
        stream=False,
    )
    print(f'response: {chat_response.message.content}')
    client.generate(model=model_name, keep_alive=0.0)  # unload the model
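
The unload and the "no models running" check from the script above can also be expressed directly against the REST API, which takes the Python client out of the picture. This is a minimal sketch, assuming the server is on the same host and port as above; `build_unload_request` and `loaded_models` are hypothetical helper names, and only the standard library is used.

```python
import json
import urllib.request

OLLAMA_HOST = 'http://localhost:11434'  # same host as in the script above


def build_unload_request(model, host=OLLAMA_HOST):
    """Build a POST /api/generate request with keep_alive=0 (asks ollama to unload the model)."""
    body = json.dumps({'model': model, 'keep_alive': 0}).encode('utf-8')
    return urllib.request.Request(
        f'{host}/api/generate',
        data=body,
        headers={'Content-Type': 'application/json'},
        method='POST',
    )


def loaded_models(host=OLLAMA_HOST):
    """Return the names of the models the server reports as loaded (GET /api/ps)."""
    with urllib.request.urlopen(f'{host}/api/ps') as resp:
        return [m['name'] for m in json.load(resp).get('models', [])]
```

With the server running, calling `urllib.request.urlopen(build_unload_request('llama3.2:3b'))` and then `loaded_models()` should give the same information as `ollama ps`. It only confirms that the server considers the model unloaded, not that the GPU memory was actually returned.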

Relevant log output

time=2025-05-06T16:51:34.791-07:00 level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
time=2025-05-06T16:51:34.792-07:00 level=INFO source=gpu_windows.go:167 msg=packages count=1
time=2025-05-06T16:51:34.792-07:00 level=INFO source=gpu_windows.go:214 msg="" package=0 cores=18 efficiency=0 threads=36
time=2025-05-06T16:51:34.792-07:00 level=DEBUG source=gpu.go:98 msg="searching for GPU discovery libraries for NVIDIA"
time=2025-05-06T16:51:34.792-07:00 level=DEBUG source=gpu.go:501 msg="Searching for GPU library" name=nvml.dll
time=2025-05-06T16:51:34.792-07:00 level=DEBUG source=gpu.go:525 msg="gpu library search" globs="[C:\\Users\\User\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\nvml.dll C:\\java\\jdk1.8.0_144\\bin\\nvml.dll C:\\java\\jdk1.8.0_144\\bin\\nvml.dll C:\\java\\jdk1.8.0_144\\bin\\nvml.dll C:\\anaconda3\\nvml.dll C:\\anaconda3\\Library\\mingw-w64\\bin\\nvml.dll C:\\anaconda3\\Library\\usr\\bin\\nvml.dll C:\\anaconda3\\Library\\bin\\nvml.dll C:\\anaconda3\\Scripts\\nvml.dll C:\\anaconda3\\bin\\nvml.dll C:\\anaconda3\\condabin\\nvml.dll C:\\java\\jdk1.8.0_144\\bin\\nvml.dll C:\\cygwin64\\usr\\local\\bin\\nvml.dll C:\\cygwin64\\bin\\nvml.dll C:\\WINDOWS\\system32\\nvml.dll C:\\WINDOWS\\nvml.dll C:\\WINDOWS\\System32\\Wbem\\nvml.dll C:\\WINDOWS\\System32\\WindowsPowerShell\\v1.0\\nvml.dll C:\\WINDOWS\\System32\\OpenSSH\\nvml.dll C:\\Program Files (x86)\\Intel\\Intel(R) Management Engine Components\\DAL\\nvml.dll C:\\Program Files\\Intel\\Intel(R) Management Engine Components\\DAL\\nvml.dll C:\\Program Files\\Git\\cmd\\nvml.dll C:\\Program Files\\dotnet\\nvml.dll C:\\Program Files (x86)\\Windows Kits\\10\\Windows Performance Toolkit\\nvml.dll C:\\Program Files\\CMake\\bin\\nvml.dll C:\\Program Files\\Microsoft Visual Studio\\2022\\Community\\VC\\Tools\\Llvm\\x64\\bin\\nvml.dll C:\\Program Files\\Go\\bin\\nvml.dll C:\\TDM-GCC-64\\bin\\nvml.dll C:\\Users\\User\\AppData\\Local\\Microsoft\\WindowsApps\\nvml.dll C:\\Users\\User\\AppData\\Local\\Programs\\Ollama\\nvml.dll C:\\Users\\User\\AppData\\Local\\JetBrains\\Toolbox\\scripts\\nvml.dll C:\\Users\\User\\go\\bin\\nvml.dll C:\\cygwin64\\home\\kf\\bin\\nvml.dll C:\\cygwin64\\home\\kf\\scala-2.10.6\\bin\\nvml.dll C:\\cygwin64\\home\\kf\\c\\nvml.dll C:\\anaconda3\\Scripts\\nvml.dll C:\\cygwin64\\home\\kf\\bin\\nvml.dll C:\\cygwin64\\home\\kf\\scala-2.10.6\\bin\\nvml.dll C:\\cygwin64\\home\\kf\\c\\nvml.dll C:\\anaconda3\\Scripts\\nvml.dll C:\\cygwin64\\home\\kf\\bin\\nvml.dll C:\\cygwin64\\home\\kf\\scala-2.10.6\\bin\\nvml.dll 
C:\\cygwin64\\home\\kf\\c\\nvml.dll C:\\anaconda3\\Scripts\\nvml.dll C:\\cygwin64\\home\\kf\\bin\\nvml.dll C:\\cygwin64\\home\\kf\\scala-2.10.6\\bin\\nvml.dll C:\\cygwin64\\home\\kf\\c\\nvml.dll C:\\anaconda3\\Scripts\\nvml.dll c:\\Windows\\System32\\nvml.dll]"
time=2025-05-06T16:51:34.794-07:00 level=DEBUG source=gpu.go:558 msg="discovered GPU libraries" paths=[]
time=2025-05-06T16:51:34.794-07:00 level=DEBUG source=gpu.go:501 msg="Searching for GPU library" name=nvcuda.dll
time=2025-05-06T16:51:34.794-07:00 level=DEBUG source=gpu.go:525 msg="gpu library search" globs="[C:\\Users\\User\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\nvcuda.dll C:\\java\\jdk1.8.0_144\\bin\\nvcuda.dll C:\\java\\jdk1.8.0_144\\bin\\nvcuda.dll C:\\java\\jdk1.8.0_144\\bin\\nvcuda.dll C:\\anaconda3\\nvcuda.dll C:\\anaconda3\\Library\\mingw-w64\\bin\\nvcuda.dll C:\\anaconda3\\Library\\usr\\bin\\nvcuda.dll C:\\anaconda3\\Library\\bin\\nvcuda.dll C:\\anaconda3\\Scripts\\nvcuda.dll C:\\anaconda3\\bin\\nvcuda.dll C:\\anaconda3\\condabin\\nvcuda.dll C:\\java\\jdk1.8.0_144\\bin\\nvcuda.dll C:\\cygwin64\\usr\\local\\bin\\nvcuda.dll C:\\cygwin64\\bin\\nvcuda.dll C:\\WINDOWS\\system32\\nvcuda.dll C:\\WINDOWS\\nvcuda.dll C:\\WINDOWS\\System32\\Wbem\\nvcuda.dll C:\\WINDOWS\\System32\\WindowsPowerShell\\v1.0\\nvcuda.dll C:\\WINDOWS\\System32\\OpenSSH\\nvcuda.dll C:\\Program Files (x86)\\Intel\\Intel(R) Management Engine Components\\DAL\\nvcuda.dll C:\\Program Files\\Intel\\Intel(R) Management Engine Components\\DAL\\nvcuda.dll C:\\Program Files\\Git\\cmd\\nvcuda.dll C:\\Program Files\\dotnet\\nvcuda.dll C:\\Program Files (x86)\\Windows Kits\\10\\Windows Performance Toolkit\\nvcuda.dll C:\\Program Files\\CMake\\bin\\nvcuda.dll C:\\Program Files\\Microsoft Visual Studio\\2022\\Community\\VC\\Tools\\Llvm\\x64\\bin\\nvcuda.dll C:\\Program Files\\Go\\bin\\nvcuda.dll C:\\TDM-GCC-64\\bin\\nvcuda.dll C:\\Users\\User\\AppData\\Local\\Microsoft\\WindowsApps\\nvcuda.dll C:\\Users\\User\\AppData\\Local\\Programs\\Ollama\\nvcuda.dll C:\\Users\\User\\AppData\\Local\\JetBrains\\Toolbox\\scripts\\nvcuda.dll C:\\Users\\User\\go\\bin\\nvcuda.dll C:\\cygwin64\\home\\kf\\bin\\nvcuda.dll C:\\cygwin64\\home\\kf\\scala-2.10.6\\bin\\nvcuda.dll C:\\cygwin64\\home\\kf\\c\\nvcuda.dll C:\\anaconda3\\Scripts\\nvcuda.dll C:\\cygwin64\\home\\kf\\bin\\nvcuda.dll C:\\cygwin64\\home\\kf\\scala-2.10.6\\bin\\nvcuda.dll C:\\cygwin64\\home\\kf\\c\\nvcuda.dll C:\\anaconda3\\Scripts\\nvcuda.dll 
C:\\cygwin64\\home\\kf\\bin\\nvcuda.dll C:\\cygwin64\\home\\kf\\scala-2.10.6\\bin\\nvcuda.dll C:\\cygwin64\\home\\kf\\c\\nvcuda.dll C:\\anaconda3\\Scripts\\nvcuda.dll C:\\cygwin64\\home\\kf\\bin\\nvcuda.dll C:\\cygwin64\\home\\kf\\scala-2.10.6\\bin\\nvcuda.dll C:\\cygwin64\\home\\kf\\c\\nvcuda.dll C:\\anaconda3\\Scripts\\nvcuda.dll c:\\windows\\system*\\nvcuda.dll]"
time=2025-05-06T16:51:34.796-07:00 level=DEBUG source=gpu.go:558 msg="discovered GPU libraries" paths=[]
time=2025-05-06T16:51:34.796-07:00 level=DEBUG source=gpu.go:501 msg="Searching for GPU library" name=cudart64_*.dll
time=2025-05-06T16:51:34.796-07:00 level=DEBUG source=gpu.go:525 msg="gpu library search" globs="[C:\\Users\\User\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\cudart64_*.dll C:\\java\\jdk1.8.0_144\\bin\\cudart64_*.dll C:\\java\\jdk1.8.0_144\\bin\\cudart64_*.dll C:\\java\\jdk1.8.0_144\\bin\\cudart64_*.dll C:\\anaconda3\\cudart64_*.dll C:\\anaconda3\\Library\\mingw-w64\\bin\\cudart64_*.dll C:\\anaconda3\\Library\\usr\\bin\\cudart64_*.dll C:\\anaconda3\\Library\\bin\\cudart64_*.dll C:\\anaconda3\\Scripts\\cudart64_*.dll C:\\anaconda3\\bin\\cudart64_*.dll C:\\anaconda3\\condabin\\cudart64_*.dll C:\\java\\jdk1.8.0_144\\bin\\cudart64_*.dll C:\\cygwin64\\usr\\local\\bin\\cudart64_*.dll C:\\cygwin64\\bin\\cudart64_*.dll C:\\WINDOWS\\system32\\cudart64_*.dll C:\\WINDOWS\\cudart64_*.dll C:\\WINDOWS\\System32\\Wbem\\cudart64_*.dll C:\\WINDOWS\\System32\\WindowsPowerShell\\v1.0\\cudart64_*.dll C:\\WINDOWS\\System32\\OpenSSH\\cudart64_*.dll C:\\Program Files (x86)\\Intel\\Intel(R) Management Engine Components\\DAL\\cudart64_*.dll C:\\Program Files\\Intel\\Intel(R) Management Engine Components\\DAL\\cudart64_*.dll C:\\Program Files\\Git\\cmd\\cudart64_*.dll C:\\Program Files\\dotnet\\cudart64_*.dll C:\\Program Files (x86)\\Windows Kits\\10\\Windows Performance Toolkit\\cudart64_*.dll C:\\Program Files\\CMake\\bin\\cudart64_*.dll C:\\Program Files\\Microsoft Visual Studio\\2022\\Community\\VC\\Tools\\Llvm\\x64\\bin\\cudart64_*.dll C:\\Program Files\\Go\\bin\\cudart64_*.dll C:\\TDM-GCC-64\\bin\\cudart64_*.dll C:\\Users\\User\\AppData\\Local\\Microsoft\\WindowsApps\\cudart64_*.dll C:\\Users\\User\\AppData\\Local\\Programs\\Ollama\\cudart64_*.dll C:\\Users\\User\\AppData\\Local\\JetBrains\\Toolbox\\scripts\\cudart64_*.dll C:\\Users\\User\\go\\bin\\cudart64_*.dll C:\\cygwin64\\home\\kf\\bin\\cudart64_*.dll C:\\cygwin64\\home\\kf\\scala-2.10.6\\bin\\cudart64_*.dll C:\\cygwin64\\home\\kf\\c\\cudart64_*.dll C:\\anaconda3\\Scripts\\cudart64_*.dll 
C:\\cygwin64\\home\\kf\\bin\\cudart64_*.dll C:\\cygwin64\\home\\kf\\scala-2.10.6\\bin\\cudart64_*.dll C:\\cygwin64\\home\\kf\\c\\cudart64_*.dll C:\\anaconda3\\Scripts\\cudart64_*.dll C:\\cygwin64\\home\\kf\\bin\\cudart64_*.dll C:\\cygwin64\\home\\kf\\scala-2.10.6\\bin\\cudart64_*.dll C:\\cygwin64\\home\\kf\\c\\cudart64_*.dll C:\\anaconda3\\Scripts\\cudart64_*.dll C:\\cygwin64\\home\\kf\\bin\\cudart64_*.dll C:\\cygwin64\\home\\kf\\scala-2.10.6\\bin\\cudart64_*.dll C:\\cygwin64\\home\\kf\\c\\cudart64_*.dll C:\\anaconda3\\Scripts\\cudart64_*.dll C:\\Users\\User\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\cuda_v*\\cudart64_*.dll c:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v*\\bin\\cudart64_*.dll]"
time=2025-05-06T16:51:34.810-07:00 level=DEBUG source=gpu.go:558 msg="discovered GPU libraries" paths="[C:\\Users\\User\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\cuda_v11\\cudart64_110.dll C:\\Users\\User\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\cuda_v12\\cudart64_12.dll]"
time=2025-05-06T16:51:34.811-07:00 level=DEBUG source=gpu.go:574 msg="Unable to load cudart library C:\\Users\\User\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\cuda_v11\\cudart64_110.dll: your nvidia driver is too old or missing.  If you have a CUDA GPU please upgrade to run ollama"
time=2025-05-06T16:51:34.812-07:00 level=DEBUG source=gpu.go:574 msg="Unable to load cudart library C:\\Users\\User\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\cuda_v12\\cudart64_12.dll: your nvidia driver is too old or missing.  If you have a CUDA GPU please upgrade to run ollama"
time=2025-05-06T16:51:34.870-07:00 level=DEBUG source=amd_hip_windows.go:88 msg=hipDriverGetVersion version=60241512
time=2025-05-06T16:51:34.870-07:00 level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir C:\\Users\\User\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\rocm"
time=2025-05-06T16:51:34.870-07:00 level=DEBUG source=amd_common.go:44 msg="detected ROCM next to ollama executable C:\\Users\\User\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\rocm"
time=2025-05-06T16:51:34.872-07:00 level=DEBUG source=amd_windows.go:73 msg="detected hip devices" count=2
time=2025-05-06T16:51:34.872-07:00 level=DEBUG source=amd_windows.go:93 msg="hip device" id=0 name="AMD Radeon Pro W5700" gfx=gfx1010:xnack-
time=2025-05-06T16:51:35.530-07:00 level=WARN source=amd_windows.go:138 msg="amdgpu is not supported (supported types:[gfx1030 gfx1100 gfx1101 gfx1102 gfx1151 gfx906])" gpu_type=gfx1010:xnack- gpu=0 library=C:\Users\User\AppData\Local\Programs\Ollama\lib\ollama\rocm
time=2025-05-06T16:51:35.531-07:00 level=DEBUG source=amd_windows.go:93 msg="hip device" id=1 name="AMD Radeon RX 7900 XTX" gfx=gfx1100
time=2025-05-06T16:51:36.175-07:00 level=DEBUG source=amd_windows.go:146 msg="amdgpu is supported" gpu=1 gpu_type=gfx1100
time=2025-05-06T16:51:36.175-07:00 level=DEBUG source=amd_windows.go:149 msg="amdgpu memory" gpu=1 total="24.0 GiB"
time=2025-05-06T16:51:36.175-07:00 level=DEBUG source=amd_windows.go:150 msg="amdgpu memory" gpu=1 available="23.8 GiB"
time=2025-05-06T16:52:02.945-07:00 level=DEBUG source=gpu.go:391 msg="updating system memory data" before.total="127.7 GiB" before.free="101.0 GiB" before.free_swap="101.9 GiB" now.total="127.7 GiB" now.free="100.7 GiB" now.free_swap="100.6 GiB"
time=2025-05-06T16:52:03.630-07:00 level=DEBUG source=amd_windows.go:197 msg="updating rocm free memory" gpu=1 name="AMD Radeon RX 7900 XTX" before="23.8 GiB" now="23.7 GiB"
time=2025-05-06T16:52:03.633-07:00 level=INFO source=sched.go:188 msg="one or more GPUs detected that are unable to accurately report free memory - disabling default concurrency"
time=2025-05-06T16:52:03.682-07:00 level=DEBUG source=memory.go:108 msg=evaluating library=rocm gpu_count=1 available="[23.7 GiB]"
time=2025-05-06T16:52:03.682-07:00 level=INFO source=sched.go:754 msg="new model will fit in available VRAM in single GPU, loading" model=C:\Users\User\.ollama\models\blobs\sha256-dde5aa3fc5ffc17176b5e8bdc82f587b24b2678c6c66101bf7da77af9f7ccdff gpu=1 parallel=2 available=25440878592 required="3.7 GiB"
time=2025-05-06T16:52:03.682-07:00 level=DEBUG source=gpu.go:391 msg="updating system memory data" before.total="127.7 GiB" before.free="100.7 GiB" before.free_swap="100.6 GiB" now.total="127.7 GiB" now.free="100.7 GiB" now.free_swap="100.4 GiB"
time=2025-05-06T16:52:04.345-07:00 level=DEBUG source=amd_windows.go:197 msg="updating rocm free memory" gpu=1 name="AMD Radeon RX 7900 XTX" before="23.7 GiB" now="23.5 GiB"
time=2025-05-06T16:52:04.348-07:00 level=INFO source=server.go:106 msg="system memory" total="127.7 GiB" free="100.7 GiB" free_swap="100.4 GiB"
time=2025-05-06T16:52:04.348-07:00 level=DEBUG source=memory.go:108 msg=evaluating library=rocm gpu_count=1 available="[23.7 GiB]"
time=2025-05-06T16:52:04.348-07:00 level=INFO source=server.go:139 msg=offload library=rocm layers.requested=-1 layers.model=29 layers.offload=29 layers.split="" memory.available="[23.7 GiB]" memory.gpu_overhead="0 B" memory.required.full="3.7 GiB" memory.required.partial="3.7 GiB" memory.required.kv="896.0 MiB" memory.required.allocations="[3.7 GiB]" memory.weights.total="1.9 GiB" memory.weights.repeating="1.6 GiB" memory.weights.nonrepeating="308.2 MiB" memory.graph.full="424.0 MiB" memory.graph.partial="570.7 MiB"
time=2025-05-06T16:52:04.349-07:00 level=DEBUG source=server.go:263 msg="compatible gpu libraries" compatible=[rocm]
time=2025-05-06T16:52:04.754-07:00 level=DEBUG source=server.go:339 msg="adding gpu library" path=C:\Users\User\AppData\Local\Programs\Ollama\lib\ollama\rocm
time=2025-05-06T16:52:04.754-07:00 level=DEBUG source=server.go:346 msg="adding gpu dependency paths" paths=[C:\Users\User\AppData\Local\Programs\Ollama\lib\ollama\rocm]
time=2025-05-06T16:52:04.754-07:00 level=INFO source=server.go:410 msg="starting llama server" cmd="C:\\Users\\User\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --model C:\\Users\\User\\.ollama\\models\\blobs\\sha256-dde5aa3fc5ffc17176b5e8bdc82f587b24b2678c6c66101bf7da77af9f7ccdff --ctx-size 8192 --batch-size 512 --n-gpu-layers 29 --verbose --threads 18 --parallel 2 --port 55195"
load_backend: loaded ROCm backend from C:\Users\User\AppData\Local\Programs\Ollama\lib\ollama\rocm\ggml-hip.dll
time=2025-05-06T16:52:11.904-07:00 level=DEBUG source=gpu.go:391 msg="updating system memory data" before.total="127.7 GiB" before.free="100.7 GiB" before.free_swap="100.4 GiB" now.total="127.7 GiB" now.free="100.7 GiB" now.free_swap="100.2 GiB"
time=2025-05-06T16:52:12.591-07:00 level=DEBUG source=amd_windows.go:197 msg="updating rocm free memory" gpu=1 name="AMD Radeon RX 7900 XTX" before="23.5 GiB" now="23.4 GiB"
time=2025-05-06T16:52:12.642-07:00 level=DEBUG source=memory.go:108 msg=evaluating library=rocm gpu_count=1 available="[23.4 GiB]"
time=2025-05-06T16:52:12.643-07:00 level=INFO source=sched.go:754 msg="new model will fit in available VRAM in single GPU, loading" model=C:\Users\User\.ollama\models\blobs\sha256-dde5aa3fc5ffc17176b5e8bdc82f587b24b2678c6c66101bf7da77af9f7ccdff gpu=1 parallel=2 available=25128730624 required="3.7 GiB"
time=2025-05-06T16:52:12.643-07:00 level=DEBUG source=gpu.go:391 msg="updating system memory data" before.total="127.7 GiB" before.free="100.7 GiB" before.free_swap="100.2 GiB" now.total="127.7 GiB" now.free="100.7 GiB" now.free_swap="100.1 GiB"
time=2025-05-06T16:52:13.324-07:00 level=DEBUG source=amd_windows.go:197 msg="updating rocm free memory" gpu=1 name="AMD Radeon RX 7900 XTX" before="23.4 GiB" now="23.3 GiB"
time=2025-05-06T16:52:13.327-07:00 level=INFO source=server.go:106 msg="system memory" total="127.7 GiB" free="100.7 GiB" free_swap="100.1 GiB"
time=2025-05-06T16:52:13.327-07:00 level=DEBUG source=memory.go:108 msg=evaluating library=rocm gpu_count=1 available="[23.4 GiB]"
time=2025-05-06T16:52:13.327-07:00 level=INFO source=server.go:139 msg=offload library=rocm layers.requested=-1 layers.model=29 layers.offload=29 layers.split="" memory.available="[23.4 GiB]" memory.gpu_overhead="0 B" memory.required.full="3.7 GiB" memory.required.partial="3.7 GiB" memory.required.kv="896.0 MiB" memory.required.allocations="[3.7 GiB]" memory.weights.total="1.9 GiB" memory.weights.repeating="1.6 GiB" memory.weights.nonrepeating="308.2 MiB" memory.graph.full="424.0 MiB" memory.graph.partial="570.7 MiB"
time=2025-05-06T16:52:13.327-07:00 level=DEBUG source=server.go:263 msg="compatible gpu libraries" compatible=[rocm]
time=2025-05-06T16:52:13.718-07:00 level=DEBUG source=server.go:339 msg="adding gpu library" path=C:\Users\User\AppData\Local\Programs\Ollama\lib\ollama\rocm
time=2025-05-06T16:52:13.718-07:00 level=DEBUG source=server.go:346 msg="adding gpu dependency paths" paths=[C:\Users\User\AppData\Local\Programs\Ollama\lib\ollama\rocm]
time=2025-05-06T16:52:13.718-07:00 level=INFO source=server.go:410 msg="starting llama server" cmd="C:\\Users\\User\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --model C:\\Users\\User\\.ollama\\models\\blobs\\sha256-dde5aa3fc5ffc17176b5e8bdc82f587b24b2678c6c66101bf7da77af9f7ccdff --ctx-size 8192 --batch-size 512 --n-gpu-layers 29 --verbose --threads 18 --parallel 2 --port 55202"
load_backend: loaded ROCm backend from C:\Users\User\AppData\Local\Programs\Ollama\lib\ollama\rocm\ggml-hip.dll
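
One thing that may be worth pulling out of a log like this is the sequence of "updating rocm free memory" lines, since in the excerpt above the reported free value drops from 23.8 GiB to 23.3 GiB over two loads. Below is a minimal sketch, assuming the log format shown above; the regex and the `free_memory_samples` name are illustrative and not part of ollama.

```python
import re

# Matches lines such as:
#   time=... msg="updating rocm free memory" gpu=1 name="..." before="23.8 GiB" now="23.7 GiB"
FREE_MEM_RE = re.compile(
    r'time=(?P<time>\S+) .*msg="updating rocm free memory".*'
    r'before="(?P<before>[\d.]+) GiB" now="(?P<now>[\d.]+) GiB"'
)


def free_memory_samples(log_text):
    """Return (timestamp, before_gib, now_gib) for each rocm free-memory update line."""
    samples = []
    for line in log_text.splitlines():
        match = FREE_MEM_RE.search(line)
        if match:
            samples.append((match['time'], float(match['before']), float(match['now'])))
    return samples


# Two lines copied from the log above.
sample_log = '''time=2025-05-06T16:52:03.630-07:00 level=DEBUG source=amd_windows.go:197 msg="updating rocm free memory" gpu=1 name="AMD Radeon RX 7900 XTX" before="23.8 GiB" now="23.7 GiB"
time=2025-05-06T16:52:12.591-07:00 level=DEBUG source=amd_windows.go:197 msg="updating rocm free memory" gpu=1 name="AMD Radeon RX 7900 XTX" before="23.5 GiB" now="23.4 GiB"
'''

samples = free_memory_samples(sample_log)
for ts, before, now in samples:
    print(f'{ts}: {before} GiB -> {now} GiB')
```

Running the same function over a full server log (read with `open(path).read()`) would show whether the free value reported at the start of each load keeps shrinking across iterations.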

OS

Windows 11 Pro

GPU

AMD Radeon 7900 XTX 24G

CPU

Xeon Gold 6154

Ollama version

0.6.8


GiteaMirror added the `needs more info`, `bug`, and `windows` labels 2026-04-12 18:52:00 -05:00

GiteaMirror commented

2026-04-12 18:52:01 -05:00

@Cownjackson commented on GitHub (May 7, 2025):

I'm also dealing with this. It's got my entire server locked up and is preventing me from even connecting to terminate ollama.


GiteaMirror commented

2026-04-12 18:52:02 -05:00

@thbeh commented on GitHub (May 8, 2025):

Same issue here .. the only time Ollama fully clears the GPU's memory is when it's closed.


GiteaMirror commented

2026-04-12 18:52:03 -05:00

@Cownjackson commented on GitHub (May 8, 2025):

> Same issue here .. the only time Ollama fully clears the GPU's memory is when it's closed.

Mine isn't even freeing the VRAM after closing ollama... I was completely locked out of my remote server until I force restarted it. Can't use ollama until this is resolved.

GiteaMirror commented

2026-04-12 18:52:04 -05:00

@Cownjackson commented on GitHub (May 8, 2025):

> > Same issue here .. the only time Ollama fully clears the GPU's memory is when it's closed.
>
> Mine isn't even freeing the VRAM after closing ollama... I was completely locked out of my remote server until I force restarted it. Can't use ollama until this is resolved.

Might be worth noting that I'm running ollama on Windows Server 2022 with an Nvidia L40S.

GiteaMirror commented

2026-04-12 18:52:04 -05:00

@Kbeff commented on GitHub (May 11, 2025):

Same exact problem. This looks like a critical issue; I hope it gets solved soon.


GiteaMirror commented

2026-04-12 18:52:04 -05:00

@8forty commented on GitHub (May 11, 2025):

@Cownjackson I've had success clearing the GPU memory by killing ollama every time I switch models. In Python I'm using `psutil` to get the PIDs of all processes named "ollama.exe", then killing them with `os.kill(pid, signal.SIGTERM)`.

Unfortunately I had to add a bunch of error handling for these, since sometimes the kill fails (but it appears the process is always dead when that happens so... success?), and sometimes ollama takes a while to restart on the next model load via the ollama API's generate function: `generate(model=model_name, keep_alive='5m')`
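
The workaround described above could be sketched roughly as follows. This is not the commenter's actual code: it assumes the third-party `psutil` package is installed, swaps `os.kill` for psutil's own `terminate()` and `wait_procs()`, and the helper names `is_ollama` and `kill_ollama` (plus the `"ollama"` name for non-Windows systems) are made up for illustration. A process that has already exited by the time it is signalled is treated as success, which matches the "kill fails but the process is dead" behaviour mentioned in the comment.

```python
try:
    import psutil  # third-party; optional so the pure helper below works without it
except ImportError:
    psutil = None

# Assumption: the server and runner both appear under one of these names.
OLLAMA_NAMES = {"ollama.exe", "ollama"}


def is_ollama(name):
    """Return True if a process name matches the ollama binary (case-insensitive)."""
    return (name or "").lower() in OLLAMA_NAMES


def kill_ollama(timeout=10.0):
    """Terminate all ollama processes; return PIDs still alive after `timeout` seconds."""
    if psutil is None:
        raise RuntimeError("psutil is required for kill_ollama()")
    procs = [p for p in psutil.process_iter(["pid", "name"]) if is_ollama(p.info["name"])]
    for proc in procs:
        try:
            proc.terminate()
        except psutil.NoSuchProcess:
            pass  # exited between listing and terminating: nothing left to do
        except psutil.AccessDenied:
            print(f"no permission to terminate pid {proc.pid}")
    # Wait for the processes to actually disappear before loading the next model.
    _gone, alive = psutil.wait_procs(procs, timeout=timeout)
    return [p.pid for p in alive]
```

Calling `kill_ollama()` between model switches and checking that it returns an empty list would be one way to confirm that nothing is left holding VRAM before the next load.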

GiteaMirror commented

2026-04-12 18:52:05 -05:00

@madwax commented on GitHub (May 13, 2025):

Same problem under Ubuntu 24.04


GiteaMirror commented

2026-04-12 18:52:06 -05:00

@rick-github commented on GitHub (May 16, 2025):

There are some fixes in 0.7.0 for memory leaks and orphaned processes. Does this problem still occur with 0.7.0?


GiteaMirror commented

2026-04-12 18:52:06 -05:00

@8forty commented on GitHub (May 16, 2025):

I just updated to 0.7.0 and ran the same code above. The results are exactly the same: GPU memory usage grows by about 300 MB in each iteration of the loop (153MB, 751MB, 1051MB, ...).


GiteaMirror commented

2026-04-12 18:52:07 -05:00

@rick-github commented on GitHub (May 16, 2025):

When the loop is running, can you confirm that the runner process id (the one with --model on the command line) changes? The model unload should cause the runner to exit, and a new one to start for the next iteration of the loop. When the runner exits, all VRAM should be released. If the runner is exiting but VRAM is not released, that would indicate a leak in the GPU driver.
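
One way to run this check from a script is sketched below, as an illustration only. It assumes the third-party `psutil` package and relies on the runner command line seen in the server log above (`ollama.exe runner --model ...`); the helper names `is_runner` and `runner_pids` are hypothetical.

```python
try:
    import psutil  # third-party; optional so the pure helper below works without it
except ImportError:
    psutil = None


def is_runner(cmdline):
    """Return True if a command line looks like an ollama runner (has `runner` and `--model`)."""
    args = cmdline or []  # psutil reports None when the command line is not readable
    return "runner" in args and "--model" in args


def runner_pids():
    """Return the sorted PIDs of all processes that look like ollama runners."""
    if psutil is None:
        raise RuntimeError("psutil is required for runner_pids()")
    return sorted(
        p.info["pid"]
        for p in psutil.process_iter(["pid", "cmdline"])
        if is_runner(p.info["cmdline"])
    )
```

Printing `runner_pids()` while each chat request is in flight, and again after the unload, should show a different PID per iteration and an empty list in between if the runner is exiting as expected.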


GiteaMirror commented

2026-04-12 18:52:07 -05:00

@8forty commented on GitHub (May 16, 2025):

When I quit ollama and start a new one, an ollama.exe shows up in Task Manager, and that's the one that keeps growing (PID 35372 in the screenshots below). Each iteration, another ollama.exe shows up temporarily then disappears. "ollama app.exe" never changes.

I added a sleep at the end of each iteration (after the ChatResponse is printed) so I could capture the difference between while the chat is executing (2 ollama.exe running) and after it finishes (1 ollama.exe).

Fresh ollama start:

<img width="809" alt="Image" src="https://github.com/user-attachments/assets/077c756d-e3ff-4ae6-975c-59e77221b0ce" />

During loop 0:

<img width="815" alt="Image" src="https://github.com/user-attachments/assets/2b9f98cd-6db1-4519-b434-66d39bad357e" />

End of loop 0 sleep:

<img width="821" alt="Image" src="https://github.com/user-attachments/assets/00daed85-3801-4ffa-886c-e1a6899036c2" />

During loop 1:

<img width="813" alt="Image" src="https://github.com/user-attachments/assets/fa48ab98-cf1a-4e74-9c81-052f93f31475" />

End of loop 1 sleep:

<img width="813" alt="Image" src="https://github.com/user-attachments/assets/ba622289-df23-4bcc-bdd8-7546a3e9b12f" />

GiteaMirror commented

2026-04-12 18:52:08 -05:00

@rick-github commented on GitHub (May 16, 2025):

Ah, so it's the server that's accumulating VRAM. It shouldn't, because it's the runner's job to allocate it. All the server does is probe the GPUs to find out how much VRAM is available. I've tested this on Linux and the server doesn't grow, so this looks like a Windows issue. Your system has an AMD GPU and Cownjackson's is Nvidia, so it's perhaps not related to the type of GPU either. I'll see if I can replicate.


GiteaMirror commented

2026-04-12 18:52:08 -05:00

@BloodyIron commented on GitHub (Jun 26, 2025):

Worth keeping this open, or time to close?


GiteaMirror commented

2026-04-12 18:52:09 -05:00

@rick-github commented on GitHub (Jun 26, 2025):

I've been unable to replicate. If somebody can demonstrate the problem with a recent version of ollama, then the ticket should remain open, otherwise close as stale.


GiteaMirror commented

2026-04-12 18:52:10 -05:00

@Kaehvaman commented on GitHub (Aug 26, 2025):

I've replicated this by running and stopping one model several times; the server process accumulated 1.8 GB of video memory. I am using an AMD GPU. The Ollama version is 0.11.6.
The rightmost column is dedicated GPU memory.

<img width="2014" height="565" alt="Image" src="https://github.com/user-attachments/assets/5cb9100d-0d2f-4c1b-96f5-820398630097" />

GiteaMirror commented

2026-04-12 18:52:10 -05:00

@sikhness commented on GitHub (Sep 9, 2025):

I'm having the same issue on Windows with Ollama 0.11.6 and using an AMD GPU. I'm going to have to force reboot the ollama server periodically until this gets resolved or my VRAM will blow up.


GiteaMirror commented

2026-04-12 18:52:10 -05:00

@germgerm commented on GitHub (Oct 2, 2025):

Windows 10, ollama 0.12.3.
Stopping ollama and issuing a simple _ollama list_ causes VRAM usage to increase by 149 MB; stopping ollama frees it up. Is this expected behaviour?
I'm also running into VRAM not being released even though _ps_ reports no models loaded.
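
To pair a VRAM reading with what the server itself reports, the "no models loaded" state can also be checked from a script. The sketch below uses only the Python standard library and queries the server's `/api/ps` endpoint, which backs `ollama ps`. The helper names are hypothetical, and the host is the default `http://localhost:11434` used in the original report.

```python
import json
import urllib.request


def loaded_model_names(ps_payload):
    """Extract model names from a decoded /api/ps response body."""
    return [m.get("name", "") for m in ps_payload.get("models") or []]


def fetch_loaded_models(host="http://localhost:11434"):
    """Ask a running ollama server which models it currently has loaded."""
    with urllib.request.urlopen(f"{host}/api/ps", timeout=5) as resp:
        return loaded_model_names(json.load(resp))
```

If `fetch_loaded_models()` returns an empty list while GPU-Z or Task Manager still shows dedicated memory held by the server process, that matches the behaviour reported in this thread.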

GiteaMirror referenced this issue

2026-04-12 23:54:05 -05:00

[PR #6971] [CLOSED] draft: mllama vision encoder #12279

Reference: github-starred/ollama#6971