[GH-ISSUE #13677] AMD GPU incorrect VRAM-detection #71041

Open
opened 2026-05-04 23:50:19 -05:00 by GiteaMirror · 8 comments

Originally created by @arktisk-varg on GitHub (Jan 11, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/13677

What is the issue?

My system uses an AMD 780M iGPU with 16GB of VRAM allocated in the BIOS, running with Vulkan.

However, Ollama always sets the available VRAM to only about 8GB (which is the size of the GTT, i.e. half of system RAM). This should not be the case: with Vulkan the dedicated VRAM is used, so 16GB should be detected.

VRAM detection was supposedly fixed with #13196, but clearly it is not!
It is also not a hardware or container-configuration issue, because inside the container the expected VRAM/RAM configuration is visible:

(Screenshot: https://github.com/user-attachments/assets/02feb95a-df41-4400-885d-c1d6cae993b1)

Also note that when loading a model, Ollama does use the dedicated VRAM, but only about half of it (limited to what it falsely detects at startup).

The previous versions 0.13.5 and 0.13.4 produce the same issue as well.
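For reference, this is roughly how the kernel-reported VRAM/GTT split and the Vulkan heap sizes can be compared (a minimal sketch; card0, the amdgpu sysfs layout, and having vulkan-tools installed are assumptions that may differ per system):

```shell
# Kernel view: dedicated VRAM (BIOS/UMA allocation) vs. GTT (shared system RAM).
cat /sys/class/drm/card0/device/mem_info_vram_total   # expected ~16 GiB here
cat /sys/class/drm/card0/device/mem_info_gtt_total    # roughly half of system RAM

# Vulkan view: memory heap sizes as reported by the driver (RADV).
vulkaninfo | grep -A 3 memoryHeaps
```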

Relevant log output

```shell
time=2026-01-11T14:44:42.054Z level=INFO source=routes.go:1601 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GGML_VK_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:INFO OLLAMA_FLASH_ATTENTION:true OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:1 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false OLLAMA_VULKAN:true ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"

time=2026-01-11T14:44:42.055Z level=INFO source=images.go:499 msg="total blobs: 16"

time=2026-01-11T14:44:42.055Z level=INFO source=images.go:506 msg="total unused blobs removed: 0"

time=2026-01-11T14:44:42.055Z level=INFO source=routes.go:1654 msg="Listening on [::]:11434 (version 0.14.0-rc2)"

time=2026-01-11T14:44:42.056Z level=INFO source=runner.go:67 msg="discovering available GPUs..."

time=2026-01-11T14:44:42.057Z level=INFO source=server.go:429 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 42173"

time=2026-01-11T14:44:42.075Z level=INFO source=server.go:429 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 37005"

time=2026-01-11T14:44:42.129Z level=INFO source=server.go:429 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 42655"

time=2026-01-11T14:44:42.147Z level=INFO source=types.go:42 msg="inference compute" id=00000000-0100-0000-0000-000000000000 filter_id="" library=Vulkan compute=0.0 name=Vulkan0 description="AMD Radeon 780M Graphics (RADV PHOENIX)" libdirs=ollama,vulkan driver=0.0 pci_id=0000:01:00.0 type=iGPU total="8.1 GiB" available="8.0 GiB"

time=2026-01-11T14:44:42.147Z level=INFO source=routes.go:1695 msg="entering low vram mode" "total vram"="8.1 GiB" threshold="20.0 GiB"
```

OS

Debian/Docker 6.12.57+deb13-amd64

GPU

AMD Radeon 780M Graphics

CPU

AMD Ryzen 7 8745H

Ollama version

ollama/ollama:0.14.0-rc2

GiteaMirror added the bug label 2026-05-04 23:50:19 -05:00

@moontato commented on GitHub (Jan 13, 2026):

Happens on all versions after 0.12.11.


@arktisk-varg commented on GitHub (Jan 13, 2026):

Oh, OK, I have not tried that version. Is there a fix, or a roadmap for when this might get fixed?


@Sharkytail3r commented on GitHub (Jan 13, 2026):

great work?


@arktisk-varg commented on GitHub (Jan 14, 2026):

Just wanted to add:
The issue still persists in the final v0.14.0, despite the release note "More accurate VRAM measurements for AMD iGPUs".


@gianlucaT1989 commented on GitHub (Jan 16, 2026):

I'm joining this discussion because I'm experiencing the exact same behavior on an AMD Radeon 780M (Phoenix).

Despite having 8GB of VRAM allocated in the BIOS and correctly recognized by the kernel, Ollama (v0.14.0+) continues to allocate models exclusively in the GTT (System RAM). The VRAM column in amdgpu_top for the ollama process stays at 0M, while GTT fills up.
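For anyone reproducing this, the same VRAM-vs-GTT split can also be watched without amdgpu_top by polling the kernel counters while a model loads (a minimal sketch; card0 is an assumption and may differ on multi-GPU hosts):

```shell
# Poll how much of the model lands in dedicated VRAM vs. GTT during load.
watch -n 1 '
  echo "VRAM used: $(cat /sys/class/drm/card0/device/mem_info_vram_used) bytes"
  echo "GTT  used: $(cat /sys/class/drm/card0/device/mem_info_gtt_used) bytes"
'
```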

My environment:

  • OS: Proxmox/Debian (Kernel 6.x)
  • ROCm Version: 7.1.1
  • GPU: Radeon 780M (GFX1103)
  • BIOS UMA Buffer: 8GB (Confirmed by cat /sys/class/drm/card0/device/mem_info_vram_total reporting 8589934592)

My overwrite.conf attempt:

```ini
[Service]
Environment="LD_LIBRARY_PATH=/opt/rocm-7.1.1/lib"
Environment="OLLAMA_AMD_GPU=1"
Environment="HSA_OVERRIDE_GFX_VERSION=11.0.2"
Environment="HSA_AMD_SYSTEM_RESOURCES=0"
Environment="HIP_FORCE_DEVICE_ALLOC=1"
Environment="OLLAMA_VRAM_OVERRIDE=8589934592"
```

Results:
Even with HSA_AMD_SYSTEM_RESOURCES=0 (which should forbid system RAM allocation), the model is still pushed to GTT. Performance is actually good (~36 tk/s on Llama 3 8B), but the 8GB pre-allocated in the BIOS are completely wasted as they sit empty, while Ollama consumes additional system RAM.
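To rule out the drop-in simply not being applied, the same variables can be tested by running the server in the foreground (a minimal sketch using a subset of the values from the attempt above; OLLAMA_VRAM_OVERRIDE is left out since I am not sure it is honored):

```shell
# Run Ollama in the foreground with the same overrides, bypassing systemd,
# to check whether they change the VRAM/GTT allocation behavior at all.
LD_LIBRARY_PATH=/opt/rocm-7.1.1/lib \
HSA_OVERRIDE_GFX_VERSION=11.0.2 \
HSA_AMD_SYSTEM_RESOURCES=0 \
ollama serve
```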

My questions to the maintainers:

  1. Hardcoded preference? Is the Ollama/ROCm runner hardcoded to prefer "Host-Registered Memory" (GTT) on Phoenix/RDNA3 APUs? If so, is there a way to override this to use the UMA buffer?
  2. ROCm Target: Since ROCm 7.1.1 officially supports RDNA3 APUs, why is HSA_OVERRIDE_GFX_VERSION=11.0.2 still required? If I set it to the native 11.0.3, Ollama fails to find compatible code objects.
  3. VRAM Reporting: Why does Ollama report "accurate VRAM measurements" in the v0.14.0 changelog if it then ignores the available VRAM in favor of GTT on these chips?

It seems like the allocator doesn't "trust" the VRAM reporting on APUs and defaults to GTT for stability, but this leads to significant RAM waste on systems with large BIOS-allocated buffers.


@arktisk-varg commented on GitHub (Jan 20, 2026):

No, this is expected, I think: ROCm uses GTT and Vulkan uses VRAM. However, their respective VRAM/GTT size-detection mechanisms are faulty.


@joshuacraigt-web commented on GitHub (Mar 11, 2026):

I added the following to the Ollama systemd service file and it seems to have done the trick:

```ini
[Service]
Environment="HSA_OVERRIDE_GFX_VERSION=11.0.2"
Environment="ROCR_VISIBLE_DEVICES=0"
Environment="OLLAMA_GPU_OVERHEAD=0"
```

Once I restarted Ollama I had to re-pull the models, but they're running on 100% GPU according to ollama ps.
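For anyone else trying this on a systemd-managed install, the override is typically applied like below (a minimal sketch; the unit name ollama.service is an assumption and may differ per distro or package):

```shell
# Open (or create) a drop-in override for the Ollama service and paste the
# [Service] Environment= lines above into it.
sudo systemctl edit ollama.service

# Reload systemd and restart the service so the new environment takes effect.
sudo systemctl daemon-reload
sudo systemctl restart ollama.service

# Check that loaded models report 100% GPU.
ollama ps
```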


@arktisk-varg commented on GitHub (Mar 12, 2026):

> I added the following to the Ollama systemd service file and it seems to have done the trick:
>
> [Service]
> Environment="HSA_OVERRIDE_GFX_VERSION=11.0.2"
> Environment="ROCR_VISIBLE_DEVICES=0"
> Environment="OLLAMA_GPU_OVERHEAD=0"
>
> Once I restarted Ollama I had to re-pull the models, but they're running on 100% GPU according to ollama ps.

Are you running ROCm or Vulkan?

Reference: github-starred/ollama#71041