[GH-ISSUE #12062] AMD APU (gfx1151) – VRAM reported as “total memory”, large GTT ignored → GPU wrongly rejected / underutilized #54523

Closed
opened 2026-04-29 06:14:57 -05:00 by GiteaMirror · 7 comments

Originally created by @gras64 on GitHub (Aug 24, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/12062

What is the issue?

Summary:
On AMD APUs (e.g. gfx1151) with small fixed VRAM (512 MiB) but huge GTT
(≈ 108 GB), Ollama currently reads total memory as VRAM only. This
leads to the GPU being flagged as “too small” and skipped.

The relevant code is in discover/amd_linux.go, which reads
mem_info_vram_total but does not add mem_info_gtt_total.
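For reference, the amdgpu driver exposes both counters as plain byte counts in the device's DRM sysfs directory. A self-contained sketch of reading them (the card path and helper name are illustrative, not Ollama's actual code):

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// readMemInfo reads one amdgpu memory counter (a plain byte count),
// e.g. /sys/class/drm/card1/device/mem_info_vram_total.
func readMemInfo(drmDir, name string) (uint64, error) {
	raw, err := os.ReadFile(filepath.Join(drmDir, name))
	if err != nil {
		return 0, err
	}
	return strconv.ParseUint(strings.TrimSpace(string(raw)), 10, 64)
}

func main() {
	drmDir := "/sys/class/drm/card1/device" // adjust the card index for your system
	vram, _ := readMemInfo(drmDir, "mem_info_vram_total")
	gtt, _ := readMemInfo(drmDir, "mem_info_gtt_total")
	fmt.Printf("VRAM: %d bytes, GTT: %d bytes\n", vram, gtt)
}
```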


Current behavior:

  • Hardware: AMD Ryzen AI Strix Halo / gfx1151 APU
  • VRAM 512 MiB
  • GTT 108 GB (out of 128 GB total)

Expected behavior:
If VRAM is below a reasonable threshold, add GTT to
the total memory calculation. This matches how the AMDGPU driver
exposes memory for APUs: VRAM + GTT together form the usable pool.


Minimal invasive fix (idea):
Instead of introducing an isAPU detection, just fall back to the
combined memory when VRAM is too small, as sketched below:
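A minimal sketch of that fallback, assuming the VRAM and GTT byte counts have already been read from sysfs (the threshold value and function names are illustrative, not taken from the Ollama source):

```go
package main

import "fmt"

// Threshold below which VRAM is treated as a small APU carve-out rather
// than real discrete-GPU memory. 1 GiB is an illustrative value only.
const smallVRAMThreshold uint64 = 1 << 30

// effectiveTotal returns the memory pool to report: discrete GPUs
// (large VRAM) are unchanged, while APUs with a tiny carve-out fall
// back to VRAM + GTT.
func effectiveTotal(vram, gtt uint64) uint64 {
	if vram < smallVRAMThreshold {
		return vram + gtt
	}
	return vram
}

func main() {
	// Values from this report: 512 MiB VRAM, ~108 GiB GTT.
	vram, gtt := uint64(512<<20), uint64(108<<30)
	fmt.Printf("effective total: %d GiB\n", effectiveTotal(vram, gtt)>>30)
}
```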

This way:

  • Discrete GPUs (large VRAM) remain unchanged.
  • APUs with tiny VRAM but large GTT report a realistic “total memory”.
  • No new flags or hardware heuristics needed.

Alternative / more robust option:

  • Always log VRAM, GTT, and “effective total” separately (see the sketch below).
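
A sketch of what such a log line could look like using Go's standard log/slog (key names are illustrative; Ollama's actual logging calls may differ):

```go
package main

import (
	"log/slog"
	"os"
)

func main() {
	// Enable debug-level output so the example actually prints something.
	logger := slog.New(slog.NewTextHandler(os.Stderr,
		&slog.HandlerOptions{Level: slog.LevelDebug}))

	vram, gtt := uint64(512<<20), uint64(108<<30)
	// Report all three values so APU memory splits are easy to diagnose.
	logger.Debug("amdgpu memory",
		"vram_total_bytes", vram,
		"gtt_total_bytes", gtt,
		"effective_total_bytes", vram+gtt)
}
```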

Why this matters:

  • On gfx1151 APUs, Ollama otherwise ignores ~108 GB of available GPU
    memory and treats the device as unusable.
  • This small change allows APUs to participate in workloads without
    false rejection.
  • Similar issues have been reported before, but no solution was implemented.

Relevant log output


OS

Linux

GPU

AMD

CPU

AMD

Ollama version

0.11.6

GiteaMirror added the bug label 2026-04-29 06:14:57 -05:00

@jaredduggan commented on GitHub (Aug 26, 2025):

I have the same issue. The better fix would be for ollama to add full support for the chipset so we do not have to set vram to 512MB and use GTT.


@rick-github commented on GitHub (Aug 27, 2025):

> The better fix would be for ollama to add full support for the chipset

ollama does support gfx1151. I've configured 96G of VRAM on my evo-x2 and it works fine. Admittedly not as much as 108G GTT, so if you are after max usage, then the support falls short.


@jaredduggan commented on GitHub (Aug 27, 2025):

Setting your VRAM in BIOS to 96GB caps your VRAM at that and defeats the purpose of having 128GB of unified memory. Proper support would allow utilization of as much VRAM as needed, minus system RAM requirements. The current workaround on Linux is to set VRAM very low (512MB) and manually set GTT to 108-120ish GB. With these settings I have LMStudio recognizing 120GB of VRAM with ROCm and 86GB with Vulkan. It's still not running well, though, so I am also here trying to optimize ollama.


@MatthK commented on GitHub (Aug 30, 2025):

Any update on this?


@Jeepmb commented on GitHub (Aug 30, 2025):

Just encountered this issue myself, exact same behavior observed.


@philipp-paland commented on GitHub (Sep 30, 2025):

I can confirm that ollama looks to be using the BIOS-configured VRAM pretty much OK:

time=2025-09-30T11:00:33.985+02:00 level=DEBUG source=amd_linux.go:102 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/1/properties"
time=2025-09-30T11:00:33.985+02:00 level=DEBUG source=amd_linux.go:203 msg="mapping amdgpu to drm sysfs nodes" amdgpu=/sys/class/kfd/kfd/topology/nodes/1/properties vendor=4098 device=5510 unique_id=0
time=2025-09-30T11:00:33.985+02:00 level=DEBUG source=amd_linux.go:237 msg=matched amdgpu=/sys/class/kfd/kfd/topology/nodes/1/properties drm=/sys/class/drm/card1/device
time=2025-09-30T11:00:33.985+02:00 level=DEBUG source=amd_linux.go:343 msg="amdgpu memory" gpu=0 total="96.0 GiB"
time=2025-09-30T11:00:33.985+02:00 level=DEBUG source=amd_linux.go:344 msg="amdgpu memory" gpu=0 available="95.5 GiB"
time=2025-09-30T11:00:33.985+02:00 level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /usr/local/lib/ollama/rocm"
time=2025-09-30T11:00:33.985+02:00 level=DEBUG source=amd_common.go:44 msg="detected ROCM next to ollama executable /usr/local/lib/ollama/rocm"
time=2025-09-30T11:00:33.987+02:00 level=DEBUG source=amd_linux.go:375 msg="rocm supported GPUs" types="[gfx1010 gfx1012 gfx1030 gfx1100 gfx1101 gfx1102 gfx1151 gfx1200 gfx1201 gfx900 gfx906 gfx908 gfx90a gfx942]"
time=2025-09-30T11:00:33.988+02:00 level=INFO source=amd_linux.go:390 msg="amdgpu is supported" gpu=0 gpu_type=gfx1151
time=2025-09-30T11:00:33.988+02:00 level=INFO source=types.go:131 msg="inference compute" id=0 library=rocm variant="" compute=gfx1151 driver=0.0 name=1002:1586 total="96.0 GiB" available="95.5 GiB"

gpt-oss:20b was offloading all layers to the GPU:

time=2025-09-30T11:11:34.884+02:00 level=INFO source=ggml.go:487 msg="offloading 24 repeating layers to GPU"
time=2025-09-30T11:11:34.884+02:00 level=INFO source=ggml.go:493 msg="offloading output layer to GPU"
time=2025-09-30T11:11:34.884+02:00 level=INFO source=ggml.go:498 msg="offloaded 25/25 layers to GPU"

Meanwhile, gpt-oss:120b didn't run with this configuration because it didn't see enough system memory:

% ollama run gpt-oss:120b  "give me the longest lorem ipsum you can"
Error: 500 Internal Server Error: model requires more system memory (60.9 GiB) than is available (36.8 GiB)

It would be wonderful if ollama could use the GTT approach instead of relying on the BIOS VRAM setting.


@namecaps3k commented on GitHub (Dec 5, 2025):

Same issue here. My VRAM in BIOS is set to 1GB (the lowest possible on Minisforum) and the rest is GTT, but because of that Ollama does all inference on the CPU, which is painfully slow.
