[GH-ISSUE #13419] Unable to utilize GTTSize instead of Bios VRAM limit for full RAM utilization #70919

Open
opened 2026-05-04 23:27:51 -05:00 by GiteaMirror · 6 comments
Owner

Originally created by @BarachielFallen on GitHub (Dec 11, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/13419

What is the issue?

I am able to utilize the full 128 GB of system RAM as VRAM using Ollama 0.12.11 on
ROCm 7.0.2, Linux kernel 6.14, Ubuntu 24.04.3 LTS. I am using the ROCm backend (not Vulkan), fully containerized in a Docker Compose stack on a GMKTec Evo-X2 (AMD 395). I was hoping you had a
solution for versions of Ollama newer than 0.12.11 that preserves this functionality, since I can't use newer models
like MiniMax and Qwen Next past 0.12 with full VRAM. GTTSize /
AMDGTTSize handling is completely broken in the newer builds: they only report the
BIOS VRAM setting, which I have minimized in order to take advantage of GTT page sizes in the kernel. I even
tried the new Ollama engine with no luck. Are there plans to fix AMDGTT/GTTSize in the ROCm builds of Ollama?
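
For context, this setup relies on the kernel rather than the BIOS carve-out to decide how much system RAM the iGPU can map. A minimal sketch of that kind of configuration, assuming a 128 GB machine and Ubuntu's modprobe.d/initramfs conventions (the 120 GiB figure and file name are illustrative, not taken from this report):

```shell
# Sketch only: expose ~120 GiB of system RAM to the iGPU as GTT.
# ttm.pages_limit and ttm.page_pool_size are counted in 4 KiB pages:
# 120 GiB / 4 KiB = 31457280 pages.
sudo tee /etc/modprobe.d/amdgpu-gtt.conf >/dev/null <<'EOF'
options ttm pages_limit=31457280 page_pool_size=31457280
EOF
sudo update-initramfs -u   # rebuild initramfs so the options load at boot
sudo reboot
```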

Relevant log output

"time=2025-12-02T04:37:28.716Z level=INFO source=server.go:392 msg="starting runner" cmd="/usr/bin/ollama runner --model /root/.ollama/models/blobs                                                                                                                                            /sha256-090a1569019c7b110eaaf505792bdbed02b7ea5b54783e18a38a1747e4de96eb --port 34237"
time=2025-12-02T04:37:28.717Z level=INFO source=sched.go:443 msg="system memory" total="124.9 GiB" free="122.5 GiB" free_swap="7.7 GiB"
time=2025-12-02T04:37:28.717Z level=INFO source=sched.go:450 msg="gpu memory" id=0 library=ROCm available="121.0 GiB" free="121.5 GiB" minimum="45                                                                                                                                            7.0 MiB" overhead="0 B"
time=2025-12-02T04:37:28.717Z level=INFO source=server.go:459 msg="loading model" "model layers"=48 requested=-1
time=2025-12-02T04:37:28.717Z level=INFO source=device.go:240 msg="model weights" device=ROCm0 size="67.2 GiB"
time=2025-12-02T04:37:28.717Z level=INFO source=device.go:251 msg="kv cache" device=ROCm0 size="17.2 GiB"
time=2025-12-02T04:37:28.717Z level=INFO source=device.go:262 msg="compute graph" device=ROCm0 size="34.4 GiB"
time=2025-12-02T04:37:28.717Z level=INFO source=device.go:272 msg="total memory" size="118.8 GiB"
time=2025-12-02T04:37:28.725Z level=INFO source=runner.go:963 msg="starting go runner"
load_backend: loaded CPU backend from /usr/lib/ollama/libggml-cpu-icelake.so
/opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32, ID: 0
load_backend: loaded ROCm backend from /usr/lib/ollama/rocm/libggml-hip.so
time=2025-12-02T04:37:29.385Z level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.AVX512=1 CPU.0.AVX512_VBMI=1 CPU.0.AVX512_VNNI=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 ROCm.0.NO_VMM=1 ROCm.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc)
time=2025-12-02T04:37:29.385Z level=INFO source=runner.go:999 msg="Server listening on 127.0.0.1:34237"
time=2025-12-02T04:37:29.392Z level=INFO source=runner.go:893 msg=load request="{Operation:commit LoraPath:[] Parallel:3 BatchSize:512 FlashAttention:false KvSize:96000 KvCacheType: NumThreads:16 GPULayers:48[ID:0 Layers:48(0..47)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:true}"
time=2025-12-02T04:37:29.393Z level=INFO source=server.go:1294 msg="waiting for llama runner to start responding"
llama_model_load_from_file_impl: using device ROCm0 (AMD Radeon Graphics) (0000:c5:00.0) - 124397 MiB free
time=2025-12-02T04:37:29.393Z level=INFO source=server.go:1328 msg="waiting for server to become available" status="llm server loading model"
llama_model_loader: loaded meta data with 50 key-value pairs and 803 tensors from /root/.ollama/models/blobs/sha256-090a1569019c7b110eaaf505792bdbed02b7ea5b54783e18a38a1747e4de96eb (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.

OS

Ubuntu 24.04.3 LTS kernel 6.14.0-36-generic

GPU

AMD 8060S iGPU

CPU

AMD Ryzen AI Max+ 395

Ollama version

working version: 0.12.11
non-working versions: any version newer than 0.12.11

GiteaMirror added the bug label 2026-05-04 23:27:51 -05:00
Author
Owner

@BarachielFallen commented on GitHub (Dec 11, 2025):

Here is the relevant log comparison between version 0.12.11 and the latest version: the latest reports VRAM based on my BIOS setting rather than my kernel settings, while 0.12.11 shows full RAM utilization. docker logs --follow ollama
time=2025-12-11T04:01:28.580Z level=INFO source=routes.go:1544 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GGML_VK_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION:11.5.1 HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:32000 OLLAMA_DEBUG:INFO OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_KEEP_ALIVE:30m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:1 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:true OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false OLLAMA_VULKAN:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2025-12-11T04:01:28.583Z level=INFO source=images.go:522 msg="total blobs: 77"
time=2025-12-11T04:01:28.584Z level=INFO source=images.go:529 msg="total unused blobs removed: 0"
time=2025-12-11T04:01:28.585Z level=INFO source=routes.go:1597 msg="Listening on [::]:11434 (version 0.13.2)"
time=2025-12-11T04:01:28.585Z level=INFO source=runner.go:67 msg="discovering available GPUs..."
time=2025-12-11T04:01:28.585Z level=WARN source=runner.go:485 msg="user overrode visible devices" HSA_OVERRIDE_GFX_VERSION=11.5.1
time=2025-12-11T04:01:28.585Z level=WARN source=runner.go:489 msg="if GPUs are not correctly discovered, unset and try again"
time=2025-12-11T04:01:28.585Z level=INFO source=server.go:392 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 44721"
time=2025-12-11T04:01:29.291Z level=INFO source=server.go:392 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 39419"
time=2025-12-11T04:01:30.020Z level=INFO source=types.go:42 msg="inference compute" id=0 filter_id=0 library=ROCm compute=gfx1151 name=ROCm0 description="AMD Radeon Graphics" libdirs=ollama,rocm driver=60342.13 pci_id=0000:c5:00.0 type=iGPU total="512.0 MiB" available="254.6 MiB"
time=2025-12-11T04:01:30.020Z level=INFO source=routes.go:1638 msg="entering low vram mode" "total vram"="512.0 MiB" threshold="20.0 GiB"
root@llmmachine4:/home/llmadmin# docker logs --follow ollama
time=2025-12-11T04:03:46.240Z level=INFO source=routes.go:1544 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GGML_VK_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION:11.5.1 HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:32000 OLLAMA_DEBUG:INFO OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_KEEP_ALIVE:30m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:1 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:true OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false OLLAMA_VULKAN:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2025-12-11T04:03:46.242Z level=INFO source=images.go:522 msg="total blobs: 77"
time=2025-12-11T04:03:46.243Z level=INFO source=images.go:529 msg="total unused blobs removed: 0"
time=2025-12-11T04:03:46.243Z level=INFO source=routes.go:1597 msg="Listening on [::]:11434 (version 0.12.11)"
time=2025-12-11T04:03:46.244Z level=INFO source=runner.go:67 msg="discovering available GPUs..."
time=2025-12-11T04:03:46.244Z level=INFO source=server.go:392 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 44279"
time=2025-12-11T04:03:47.181Z level=INFO source=server.go:392 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 35327"
time=2025-12-11T04:03:47.842Z level=INFO source=types.go:42 msg="inference compute" id=0 filter_id=0 library=ROCm compute=gfx1151 name=ROCm0 description="AMD Radeon Graphics" libdirs=ollama,rocm driver=60342.13 pci_id=0000:c5:00.0 type=iGPU total="128.0 GiB" available="121.5 GiB"
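
A quick way to tell whether this is a discovery regression in Ollama rather than a kernel/driver change is to read what amdgpu itself exposes via sysfs (card0 is an assumption; adjust for your system):

```shell
# Values are in bytes: mem_info_vram_total is the BIOS carve-out,
# mem_info_gtt_total is the kernel's GTT limit.
cat /sys/class/drm/card0/device/mem_info_vram_total
cat /sys/class/drm/card0/device/mem_info_gtt_total
```

If gtt_total still reports on the order of 128 GiB while 0.13.2 only sees 512 MiB, the difference lies in how Ollama queries the device, not in the kernel settings.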

<!-- gh-comment-id:3639986709 --> @BarachielFallen commented on GitHub (Dec 11, 2025): Here is the revelant log comparison between version 12.11 and the latest version showing no vram based on my bios and not kernel settings vs full RAM utilization: docker logs --follow ollama time=2025-12-11T04:01:28.580Z level=INFO source=routes.go:1544 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GGML_VK_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION:11.5.1 HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:32000 OLLAMA_DEBUG:INFO OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_KEEP_ALIVE:30m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:1 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:true OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false OLLAMA_VULKAN:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]" time=2025-12-11T04:01:28.583Z level=INFO source=images.go:522 msg="total blobs: 77" time=2025-12-11T04:01:28.584Z level=INFO source=images.go:529 msg="total unused blobs removed: 0" time=2025-12-11T04:01:28.585Z level=INFO source=routes.go:1597 msg="Listening on [::]:11434 (version 0.13.2)" time=2025-12-11T04:01:28.585Z level=INFO source=runner.go:67 msg="discovering available GPUs..." time=2025-12-11T04:01:28.585Z level=WARN source=runner.go:485 msg="user overrode visible devices" HSA_OVERRIDE_GFX_VERSION=11.5.1 time=2025-12-11T04:01:28.585Z level=WARN source=runner.go:489 msg="if GPUs are not correctly discovered, unset and try again" time=2025-12-11T04:01:28.585Z level=INFO source=server.go:392 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 44721" time=2025-12-11T04:01:29.291Z level=INFO source=server.go:392 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 39419" time=2025-12-11T04:01:30.020Z level=INFO source=types.go:42 msg="inference compute" id=0 filter_id=0 library=ROCm compute=gfx1151 name=ROCm0 description="AMD Radeon Graphics" libdirs=ollama,rocm driver=60342.13 pci_id=0000:c5:00.0 type=iGPU total="512.0 MiB" available="254.6 MiB" time=2025-12-11T04:01:30.020Z level=INFO source=routes.go:1638 msg="entering low vram mode" "total vram"="512.0 MiB" threshold="20.0 GiB" ^Xroot@llmmachine4:/home/llmadmin# docker logs --follow ollama time=2025-12-11T04:03:46.240Z level=INFO source=routes.go:1544 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GGML_VK_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION:11.5.1 HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:32000 OLLAMA_DEBUG:INFO OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_KEEP_ALIVE:30m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:1 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:true OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* 
http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false OLLAMA_VULKAN:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]" time=2025-12-11T04:03:46.242Z level=INFO source=images.go:522 msg="total blobs: 77" time=2025-12-11T04:03:46.243Z level=INFO source=images.go:529 msg="total unused blobs removed: 0" time=2025-12-11T04:03:46.243Z level=INFO source=routes.go:1597 msg="Listening on [::]:11434 (version 0.12.11)" time=2025-12-11T04:03:46.244Z level=INFO source=runner.go:67 msg="discovering available GPUs..." time=2025-12-11T04:03:46.244Z level=INFO source=server.go:392 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 44279" time=2025-12-11T04:03:47.181Z level=INFO source=server.go:392 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 35327" time=2025-12-11T04:03:47.842Z level=INFO source=types.go:42 msg="inference compute" id=0 filter_id=0 library=ROCm compute=gfx1151 name=ROCm0 description="AMD Radeon Graphics" libdirs=ollama,rocm driver=60342.13 pci_id=0000:c5:00.0 type=iGPU total="128.0 GiB" available="121.5 GiB"
Author
Owner

@BarachielFallen commented on GitHub (Dec 11, 2025):

It is the difference between running ollama/ollama:0.12.11-rocm and ollama/ollama:rocm, in Docker Compose form.
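
As a stopgap, pinning the last-known-good tag instead of the rolling :rocm tag keeps the old behavior. A minimal docker run sketch (the device and volume flags are the usual ROCm container options, assumed here rather than copied from the stack above); the same image tag works in a Compose file:

```shell
# ROCm containers need the KFD and DRI device nodes passed through.
docker run -d --name ollama \
  --device /dev/kfd --device /dev/dri \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  ollama/ollama:0.12.11-rocm
```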

<!-- gh-comment-id:3639994818 --> @BarachielFallen commented on GitHub (Dec 11, 2025): It is the difference between docker run ollama/ollama:0.12.11-rocm and docker run ollama/ollama:rocm in docker compose form.
Author
Owner

@arktisk-varg commented on GitHub (Dec 17, 2025):

This is weird; I have the opposite issue. I am running ollama/ollama:0.13.4 in Docker with Vulkan on an AMD iGPU. It works fine, except that it always detects the GTT size rather than the allocated VRAM, which causes Ollama to use VRAM only up to the GTT size. In my case I have 16 GB VRAM and 16 GB system RAM, so it ends up using/detecting only 8 GB of the VRAM.

<!-- gh-comment-id:3666995671 --> @arktisk-varg commented on GitHub (Dec 17, 2025): This is weird, I have the opposite issue. I am running ollama/ollama:0.13.4 in docker with vulkan on an amd igpu. It works fine, except for that it always detects the gttsize ram not the allocated vram. This causes ollama to only use the vram up to the gttsize. In my case I have 16GB vram, 16GB system ram and so it ends up using/detecting only 8G of the vram.
Author
Owner

@rjmalagon commented on GitHub (Dec 19, 2025):

> This is weird; I have the opposite issue. I am running ollama/ollama:0.13.4 in Docker with Vulkan on an AMD iGPU. It works fine, except that it always detects the GTT size rather than the allocated VRAM, which causes Ollama to use VRAM only up to the GTT size. In my case I have 16 GB VRAM and 16 GB system RAM, so it ends up using/detecting only 8 GB of the VRAM.

This may help you: you can change the default GTT allocation. https://github.com/rjmalagon/ollama-linux-amd-apu/tree/apu-optimizer?tab=readme-ov-file#modify-the-amount-of-gtt-memory

Or at boot level you can specify the kernel arg amdgpu.gttsize=15360; the value is interpreted in MiB, so 15360 is 15 GiB.
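
A sketch of wiring that argument into GRUB on Ubuntu (editing /etc/default/grub by hand works just as well; the 15360 value is the one suggested above):

```shell
# Append the GTT size argument to the kernel command line, then reapply GRUB.
sudo sed -i 's/^GRUB_CMDLINE_LINUX_DEFAULT="/&amdgpu.gttsize=15360 /' /etc/default/grub
sudo update-grub
sudo reboot
```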

<!-- gh-comment-id:3675538348 --> @rjmalagon commented on GitHub (Dec 19, 2025): > This is weird, I have the opposite issue. I am running ollama/ollama:0.13.4 in docker with vulkan on an amd igpu. It works fine, except for that it always detects the gttsize ram not the allocated vram. This causes ollama to only use the vram up to the gttsize. In my case I have 16GB vram, 16GB system ram and so it ends up using/detecting only 8G of the vram. This may help you, you can change the default gtt allocation. https://github.com/rjmalagon/ollama-linux-amd-apu/tree/apu-optimizer?tab=readme-ov-file#modify-the-amount-of-gtt-memory Or at boot level you can specify this kernel arg `amdgpu.gttsize=15360`, values in KB.
Author
Owner

@arktisk-varg commented on GitHub (Dec 28, 2025):

Interesting idea. However, modifying /etc/modprobe.d/ttm.conf seems to change nothing; GTT remains the same, half of system RAM.
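
It may be worth checking whether the options were picked up at all; the module parameters are readable from sysfs after boot (a verification sketch, assuming the ttm and amdgpu modules are loaded):

```shell
# Driver defaults (e.g. -1 for gttsize) mean the option was not applied.
cat /sys/module/ttm/parameters/pages_limit
cat /sys/module/amdgpu/parameters/gttsize
# modprobe.d changes often need an initramfs rebuild before they take effect:
sudo update-initramfs -u && sudo reboot
```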

<!-- gh-comment-id:3694784803 --> @arktisk-varg commented on GitHub (Dec 28, 2025): Interesting idea. However modifying /etc/modprobe.d/ttm.conf seems to change nothing. GTT remains same, half of system ram.
Author
Owner

@rick-github commented on GitHub (Jan 1, 2026):

Perhaps resolved in the next release by https://github.com/ollama/ollama/pull/13196.

<!-- gh-comment-id:3703301030 --> @rick-github commented on GitHub (Jan 1, 2026): Perhaps resolved in the next release by https://github.com/ollama/ollama/pull/13196.