[GH-ISSUE #13173] New AMD memory detection routines ignore unified memory on AMD APU #34471

Closed
opened 2026-04-22 18:04:46 -05:00 by GiteaMirror · 17 comments

Originally created by @rjmalagon on GitHub (Nov 20, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/13173

What is the issue?

After https://github.com/ollama/ollama/pull/12871, Ollama counts only the strict VRAM carve-out on AMD APUs, even when unified RAM is usable by the ROCm and Vulkan runtimes. Tested on a Radeon 680M (gfx1030) with ROCm 6.4.4.
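
For anyone reproducing this: on Linux the amdgpu driver exposes the BIOS VRAM carve-out and the GTT pool (shared system RAM) as separate sysfs counters, so you can compare what the kernel reports against what Ollama logs. A minimal sketch, assuming an amdgpu device at card0 (the card index varies per system):

```shell
# Compare the BIOS VRAM carve-out against the GTT (shared RAM) pool.
# card0 may be card1 or higher on your system; adjust the path as needed.
for f in mem_info_vram_total mem_info_gtt_total; do
  bytes=$(cat /sys/class/drm/card0/device/$f)
  printf '%s: %d MiB\n' "$f" "$((bytes / 1024 / 1024))"
done
```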

Relevant log output

before patches
```
time=2025-11-20T15:56:06.961Z level=DEBUG source=runner.go:175 msg="adjusting filtering IDs" FilterID=0 new_ID=0
time=2025-11-20T15:56:06.961Z level=DEBUG source=runner.go:40 msg="GPU bootstrap discovery took" duration=6.603711351s
time=2025-11-20T15:56:06.961Z level=INFO source=types.go:42 msg="inference compute" id=0 filter_id=0 library=ROCm compute=gfx1030 name=ROCm0 description="AMD Radeon 680M" libdirs=ollama,rocm_v6 driver=60443.48 pci_id=0000:e7:00.0 type=iGPU total="96.0 GiB" available="90.6 GiB"
```
after patches
```
time=2025-11-20T15:45:28.686Z level=INFO source=types.go:42 msg="inference compute" id=0 filter_id=0 library=ROCm compute=gfx1030 name=ROCm0 description="AMD Radeon 680M" libdirs=ollama,rocm_v6 driver=60443.48 pci_id=0000:e7:00.0 type=iGPU total="512.0 MiB" available="496.1 MiB"
time=2025-11-20T15:45:28.686Z level=INFO source=routes.go:1638 msg="entering low vram mode" "total vram"="512.0 MiB" threshold="20.0 GiB"
```

OS

No response

GPU

No response

CPU

No response

Ollama version

No response

GiteaMirror added the bug, amd labels 2026-04-22 18:04:47 -05:00

@phueper commented on GitHub (Nov 20, 2025):

I confirm; this also happens on my AMD Ryzen AI MAX+ 395 w/ Radeon 8060S. Only the actual VRAM is detected, not the shared RAM. Forcing ollama to use Vulkan (`OLLAMA_LLM_LIBRARY=vulkan`) "heals" it insofar as the GPU is used again for models, though obviously not via ROCm... I bisected this to 2f36d769aa... I'll try to come up with a PR once I figure out how the new logic detects the VRAM.
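
For anyone else needing that stopgap, here is the workaround spelled out. A sketch only: the systemd path assumes the standard Linux install, and on builds where Vulkan is opt-in the `OLLAMA_VULKAN` flag (visible in the server-config logs later in this thread) may also need to be set:

```shell
# One-off: force the Vulkan backend instead of ROCm (workaround, not a fix).
OLLAMA_LLM_LIBRARY=vulkan ollama serve

# Or persist it for the systemd service, then restart:
#   sudo systemctl edit ollama.service
#   [Service]
#   Environment="OLLAMA_LLM_LIBRARY=vulkan"
sudo systemctl restart ollama
```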


@rick-github commented on GitHub (Nov 20, 2025):

It may not be ollama, on my 8060S I get 96GB. The difference is the driver, compared to rjmalagon's log line.

```
ollama  | time=2025-11-20T16:52:52.907Z level=INFO source=types.go:42 msg="inference compute"
 id=0 filter_id=0 library=ROCm compute=gfx1151 name=ROCm0
 description="AMD Radeon Graphics" libdirs=ollama,rocm driver=60342.13 pci_id=0000:c6:00.0 type=iGPU
 total="96.0 GiB" available="95.8 GiB"
```

If I load the Vulkan driver, I get 111.5GB.

```
ollama  | time=2025-11-20T16:54:38.440Z level=INFO source=types.go:42 msg="inference compute"
 id=00000000-c600-0000-0000-000000000000 filter_id="" library=Vulkan compute=0.0 name=Vulkan0
 description="AMD Radeon 8060S (RADV GFX1151)" libdirs=ollama,vulkan driver=0.0 pci_id=0000:c6:00.0 type=iGPU
 total="111.5 GiB" available="111.3 GiB"
```

@phueper commented on GitHub (Nov 20, 2025):

@dhiltgen this change https://github.com/phueper/ollama-linux-amd-apu/commit/b90cf90fb15796abeda61aab2f83379f4e0d26be fixes this issue for me:

- read and calculate GTT memory as well as VRAM memory as available memory

so it now shows:

```
time=2025-11-20T22:10:30.804Z level=INFO source=types.go:42 msg="inference compute" PID=1 id=0 filter_id=0 library=ROCm compute=gfx1151 name=ROCm0 description="Radeon 8060S Graphics" libdirs=ollama,rocm_v7 driver=70125.42 pci_id=0000:c2:00.0 type=iGPU total="61.9 GiB" available="61.8 GiB"
```

which matches the Vulkan output

I am not quite sure how to get this into the llama/patches files? Is that something that would be needed for a PR, or would those be generated separately somehow?
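
The arithmetic of that patch (free VRAM plus free GTT counted as available) can also be eyeballed straight from sysfs for cross-checking the logged numbers. A minimal sketch, not the patch code itself, assuming an amdgpu device at card0:

```shell
# Rough equivalent of what the patched detection reports: free VRAM + free GTT.
dev=/sys/class/drm/card0/device
vram_free=$(( $(cat "$dev/mem_info_vram_total") - $(cat "$dev/mem_info_vram_used") ))
gtt_free=$(( $(cat "$dev/mem_info_gtt_total") - $(cat "$dev/mem_info_gtt_used") ))
echo "available ~$(( (vram_free + gtt_free) / 1024 / 1024 )) MiB"
```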


@ndrewpj commented on GitHub (Nov 23, 2025):

Confirmed on Ubuntu 24.04.3, ROCm 7.1, Strix Halo 128GB.

```
time=2025-11-23T09:55:01.778Z level=INFO source=routes.go:1544 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GGML_VK_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:8192 OLLAMA_DEBUG:INFO OLLAMA_FLASH_ATTENTION:true OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE:q8_0 OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:2 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false OLLAMA_VULKAN:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2025-11-23T09:55:01.781Z level=INFO source=images.go:522 msg="total blobs: 27"
time=2025-11-23T09:55:01.782Z level=INFO source=images.go:529 msg="total unused blobs removed: 0"
time=2025-11-23T09:55:01.782Z level=INFO source=routes.go:1597 msg="Listening on [::]:11434 (version 0.13.0)"
time=2025-11-23T09:55:01.782Z level=INFO source=runner.go:67 msg="discovering available GPUs..."
time=2025-11-23T09:55:01.783Z level=INFO source=server.go:392 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 38173"
time=2025-11-23T09:55:02.994Z level=INFO source=server.go:392 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 41287"
time=2025-11-23T09:55:03.758Z level=INFO source=types.go:42 msg="inference compute" id=0 filter_id=0 library=ROCm compute=gfx1151 name=ROCm0 description="AMD Radeon Graphics" libdirs=ollama,rocm driver=60342.13 pci_id=0000:c5:00.0 type=iGPU total="512.0 MiB" available="34.1 MiB"
time=2025-11-23T09:55:03.758Z level=INFO source=routes.go:1638 msg="entering low vram mode" "total vram"="512.0 MiB" threshold="20.0 GiB"
[GIN] 2025/11/23 - 09:55:05 | 200 |      49.961µs |     172.19.0.36 | HEAD     "/"
[GIN] 2025/11/23 - 09:55:06 | 200 |  1.162342676s |     172.19.0.36 | POST     "/api/pull"
[GIN] 2025/11/23 - 09:55:06 | 200 |      29.307µs |     172.19.0.36 | HEAD     "/"
[GIN] 2025/11/23 - 09:55:06 | 200 |  572.265752ms |     172.19.0.36 | POST     "/api/pull"
```

@normen commented on GitHub (Nov 24, 2025):

Same here on CachyOS / Arch Linux. GTT memory is not used with the ROCm backend; Vulkan works.


@Bottlecap202 commented on GitHub (Nov 24, 2025):

Developed based on the newest Windows 11 dev build. I had WSL2/Ubuntu.



@moontato commented on GitHub (Nov 24, 2025):

I had this issue as well. Reverted back to 0.12.11 for the time being--a version that doesn't ignore unified memory.

Ubuntu Server 24 (forget which version exactly)
Ryzen 9 6900HX
AMD RX6600M (DISABLED)
AMD Radeon 680M
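
For anyone else pinning 0.12.11 until this is fixed: the Linux install script accepts a version override. A sketch for the script-based install only; package-manager and Docker installs pin versions differently:

```shell
# Reinstall the last release that still counted GTT on APUs.
curl -fsSL https://ollama.com/install.sh | OLLAMA_VERSION=0.12.11 sh
```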


@ndrewpj commented on GitHub (Nov 24, 2025):

> Reverted back to 0.12.11

That helped me too. Will wait for the latest ollama to be fixed.


@Ricky1975 commented on GitHub (Dec 2, 2025):

> Reverted back to 0.12.11

Same here: Ryzen HX370 with Radeon 890M, and Ryzen 8945HS with Radeon 780M.
ROCm 7.1.1 on Debian 12.


@namecaps3k commented on GitHub (Dec 5, 2025):

Same here, with both Vulkan and ROCm.


@namecaps3k commented on GitHub (Dec 5, 2025):

> It may not be ollama, on my 8060S I get 96GB. The difference is the driver, compared to rjmalagon's log line.
>
> ```
> ollama  | time=2025-11-20T16:52:52.907Z level=INFO source=types.go:42 msg="inference compute"
>  id=0 filter_id=0 library=ROCm compute=gfx1151 name=ROCm0
>  description="AMD Radeon Graphics" libdirs=ollama,rocm driver=60342.13 pci_id=0000:c6:00.0 type=iGPU
>  total="96.0 GiB" available="95.8 GiB"
> ```
>
> If I load the Vulkan driver, I get 111.5GB.
>
> ```
> ollama  | time=2025-11-20T16:54:38.440Z level=INFO source=types.go:42 msg="inference compute"
>  id=00000000-c600-0000-0000-000000000000 filter_id="" library=Vulkan compute=0.0 name=Vulkan0
>  description="AMD Radeon 8060S (RADV GFX1151)" libdirs=ollama,vulkan driver=0.0 pci_id=0000:c6:00.0 type=iGPU
>  total="111.5 GiB" available="111.3 GiB"
> ```

Hey, I can see your replies in many posts regarding this problem. This is different for us. You have 96GB set, I have 1GB, others have 512MB (lowest possible). With this setting we want GPU to use all GTT but it doesn't see it and ollama sends everything to CPU instead.


@t-paul commented on GitHub (Dec 5, 2025):

> I have 1GB, others have 512MB (lowest possible). With this setting we want GPU to use all GTT but it doesn't see it and ollama sends everything to CPU instead.

This says the dynamic allocation on Linux would need some kernel parameters to work: https://github.com/kyuz0/amd-strix-halo-llm-finetuning?tab=readme-ov-file#7-kernel-parameters-tested-on-fedora-42
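
For context, the parameters that guide describes amount to sizing the GTT pool at boot. A hedged sketch of what that looks like on a Debian/Ubuntu-style GRUB system; the values here are illustrative (a 96 GiB pool: amdgpu.gttsize is in MiB, ttm.pages_limit counts 4 KiB pages), so check the linked guide for the values it actually recommends:

```shell
# /etc/default/grub -- illustrative 96 GiB GTT sizing, not a recommendation:
#   GRUB_CMDLINE_LINUX_DEFAULT="... amdgpu.gttsize=98304 ttm.pages_limit=25165824"
sudo update-grub && sudo reboot

# After reboot, confirm the module picked the value up (-1 means driver default):
cat /sys/module/amdgpu/parameters/gttsize
```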


@namecaps3k commented on GitHub (Dec 5, 2025):

> > I have 1GB, others have 512MB (lowest possible). With this setting we want GPU to use all GTT but it doesn't see it and ollama sends everything to CPU instead.
>
> This says the dynamic allocation on Linux would need some kernel parameters to work: https://github.com/kyuz0/amd-strix-halo-llm-finetuning?tab=readme-ov-file#7-kernel-parameters-tested-on-fedora-42

I have it set properly. llama.cpp works fine. It's only ollama, and I already found several posts on the same problem.

The issue is that ollama only sees what is set in the BIOS as VRAM and ignores GTT completely. Meaning it sends everything to the CPU.

The post above is exactly about this issue :), another example: https://github.com/ollama/ollama/issues/12062
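
If it helps anyone triage, the VRAM/GTT split can also be read from the ROCm side to confirm both pools are visible to the runtime. A sketch; flag spelling can vary between rocm-smi versions:

```shell
# Show the BIOS VRAM carve-out vs the GTT pool as ROCm sees them.
rocm-smi --showmeminfo vram
rocm-smi --showmeminfo gtt
```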


@BarachielFallen commented on GitHub (Dec 13, 2025):

Gttsize and amdgttsize both don't work past 0.12.11 on the official ollama ROCm build; anything past that version detects only the BIOS VRAM. I opened my own bug report on this before finding this thread. Here is my report showing the same attempt to run gpt-oss:120b on 0.12.11 and on the latest version, with the logs for each:
https://github.com/ollama/ollama/issues/13419


@BarachielFallen commented on GitHub (Dec 29, 2025):

How was this bug closed? The issue is still happening in the latest builds of Ollama


@moontato commented on GitHub (Dec 29, 2025):

> How was this bug closed? The issue is still happening in the latest builds of Ollama

I'm wondering the same thing. Was the "fix"--and I add quotes because of the report above--buried in the full changelogs somewhere?


@t-paul commented on GitHub (Dec 29, 2025):

@BarachielFallen @moontato

That might be a hint:

> Latest release "2 weeks ago"
>
> Closed "last week"

(The issue was closed after the latest release shipped, so the fix presumably isn't in a published build yet.)