[GH-ISSUE #15302] Vulkan runtime: allow more than 64 GB VRAM with split memory heaps #9790

Open
opened 2026-04-12 22:40:13 -05:00 by GiteaMirror · 0 comments

Originally created by @rjmalagon on GitHub (Apr 3, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/15302

On Linux with AMDGPU, when more than 64 GB of GTT memory is assigned, Vulkan splits the memory into multiple heaps. Ollama counts the whole assigned GTT memory pool, but gets stuck once usage goes above the larger of the split heaps.

My current setup is a GFX900-level AMD APU with 96 GB of main RAM, of which 82 GB is assigned as GTT (amdgpu.gttsize=82000).

These are the memory heaps reported by `vulkaninfo`:

```
VkPhysicalDeviceMemoryProperties:
=================================
memoryHeaps: count = 2
        memoryHeaps[0]:
                size   = 29376905216 (0x6d7000000) (27.36 GiB)
                budget = 29367087104 (0x6d66a3000) (27.35 GiB)
                usage  = 0 (0x00000000) (0.00 B)
                flags:
                        None
        memoryHeaps[1]:
                size   = 58753810432 (0xdae000000) (54.72 GiB)
                budget = 58734174208 (0xdacd46000) (54.70 GiB)
                usage  = 0 (0x00000000) (0.00 B)
                flags: count = 1
                        MEMORY_HEAP_DEVICE_LOCAL_BIT
```
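The same information is available programmatically. Below is a minimal C sketch (the function name and the `phys` handle are my own; `phys` is assumed to come from the usual vkEnumeratePhysicalDevices flow) that distinguishes the summed heap size from the largest single device-local heap, which is the distinction that seems to matter here:

```c
#include <vulkan/vulkan.h>
#include <stdio.h>

/* Sketch: print both the summed heap size (what Ollama appears to
 * advertise) and the largest device-local heap (the upper bound for
 * any one heap's allocations). `phys` is a hypothetical handle from
 * vkEnumeratePhysicalDevices. */
void report_heaps(VkPhysicalDevice phys) {
    VkPhysicalDeviceMemoryProperties props;
    vkGetPhysicalDeviceMemoryProperties(phys, &props);

    VkDeviceSize total = 0, largest_local = 0;
    for (uint32_t i = 0; i < props.memoryHeapCount; i++) {
        VkDeviceSize sz = props.memoryHeaps[i].size;
        total += sz;
        if ((props.memoryHeaps[i].flags & VK_MEMORY_HEAP_DEVICE_LOCAL_BIT) &&
            sz > largest_local)
            largest_local = sz;
    }
    printf("summed heaps:       %.2f GiB\n", total / 1073741824.0);
    printf("largest local heap: %.2f GiB\n", largest_local / 1073741824.0);
}
```

With the heaps above, this would print roughly 82.08 GiB summed versus 54.72 GiB for the largest device-local heap.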

This is what the Ollama Vulkan runtime detects. Note that the reported 82.1 GiB total is the sum of both heaps (27.36 GiB + 54.72 GiB = 82.08 GiB):

```
time=2026-04-03T19:23:01.730Z level=INFO source=types.go:42 msg="inference compute" id=00000000-0900-0000-0000-000000000000 filter_id="" library=Vulkan compute=0.0 name=Vulkan1 description="AMD Radeon Vega 8 Graphics (RADV RAVEN)" libdirs=ollama,vulkan driver=0.0 pci_id=0000:09:00.0 type=iGPU total="82.1 GiB" available="82.1 GiB"
```

When a load exceeds the larger heap (above 54.7 GiB in this example; Qwen3.5 27B bf16), Ollama does not crash, it just gets stuck indefinitely while loading the model.

```
time=2026-04-03T19:05:34.815Z level=INFO source=device.go:240 msg="model weights" device=Vulkan1 size="49.3 GiB"
time=2026-04-03T19:05:34.815Z level=INFO source=device.go:245 msg="model weights" device=CPU size="2.5 GiB"
time=2026-04-03T19:05:34.815Z level=INFO source=device.go:251 msg="kv cache" device=Vulkan1 size="4.7 GiB"
time=2026-04-03T19:05:34.832Z level=INFO source=device.go:262 msg="compute graph" device=Vulkan1 size="789.3 MiB"
time=2026-04-03T19:05:34.832Z level=INFO source=device.go:267 msg="compute graph" device=CPU size="39.6 MiB"
time=2026-04-03T19:05:34.833Z level=INFO source=device.go:272 msg="total memory" size="57.2 GiB"
```
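One possible direction for the feature request, sketched in C under heavy assumptions (this is not ggml-vulkan's actual allocator; `alloc_with_fallback` and its parameters are hypothetical): instead of letting a single pool grow past one heap, retry the allocation against every compatible memory type, preferring device-local ones, so buffers can land in the second heap once the first is full.

```c
#include <vulkan/vulkan.h>

/* Sketch of heap-aware allocation fallback. `type_bits` comes from
 * VkMemoryRequirements::memoryTypeBits for the buffer in question.
 * Two passes: device-local memory types first, then everything else,
 * rather than failing (or stalling) once one heap is exhausted. */
VkResult alloc_with_fallback(VkDevice dev,
                             const VkPhysicalDeviceMemoryProperties *props,
                             uint32_t type_bits, VkDeviceSize size,
                             VkDeviceMemory *out) {
    VkMemoryAllocateInfo info = {
        .sType = VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO,
        .allocationSize = size,
    };
    for (int pass = 0; pass < 2; pass++) {
        for (uint32_t i = 0; i < props->memoryTypeCount; i++) {
            if (!(type_bits & (1u << i)))
                continue;
            int local = props->memoryTypes[i].propertyFlags &
                        VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT;
            /* pass 0 wants device-local types, pass 1 the rest */
            if ((pass == 0) != (local != 0))
                continue;
            info.memoryTypeIndex = i;
            if (vkAllocateMemory(dev, &info, NULL, out) == VK_SUCCESS)
                return VK_SUCCESS;
        }
    }
    return VK_ERROR_OUT_OF_DEVICE_MEMORY;
}
```

A simpler interim mitigation might be to cap the advertised `available` memory at the largest device-local heap (54.7 GiB here) instead of the heap sum, so the scheduler spills the overflow to CPU up front rather than hanging mid-load.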