[GH-ISSUE #6362] Honor/use amdgpu.gttsize Kernel parameter to use all unified memory for AMD APU #29754

Closed
opened 2026-04-22 08:56:45 -05:00 by GiteaMirror · 0 comments

Originally created by @mymistake on GitHub (Aug 14, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/6362

Hi,

I have a feature request.

**System**

  • Ollama v0.3.6
  • Fedora 40 (Kernel 6.10.3)
  • AMD APU Ryzen 7 7840U / Radeon 780M
  • 64 GB RAM
  • ROCm 6.1.1

**Info**
Kernel 6.10.3 supports setting amdgpu.gttsize to values like 32768 (in MiB), which I think works around the issue many AMD APU owners have: the BIOS usually only offers a few fixed sizes for dedicated GPU memory, mostly no more than 8 GB. Since it is all the same physical memory pool anyway, it would be nice if Ollama offloaded compute to the GPU in such cases as well.
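For reference, here is a minimal sketch of how the parameter can be set on Fedora. This assumes grubby (Fedora's kernel-argument tool) and my 64 GB machine; pick a value that fits your RAM:

```shell
# Append amdgpu.gttsize (value in MiB) to the kernel command line.
# 32768 MiB = 32 GiB of GTT; adjust to your total RAM.
sudo grubby --update-kernel=ALL --args="amdgpu.gttsize=32768"
sudo reboot

# After rebooting, verify the driver picked it up (reported in bytes;
# card0 may be a different index on other systems):
cat /sys/class/drm/card0/device/mem_info_gtt_total
```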

**Example:**
BIOS GPU memory allocation set to auto
amdgpu.gttsize=32768

--> ollama run gemma2:27b "Tell me a joke"

This runs at 100% on the CPU, since the 16 GB model does not fit into the small dedicated VRAM carve-out.
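To confirm the model really stays off the GPU, the amdgpu sysfs counters can be checked while it is loaded (values in bytes; card0 may be a different index elsewhere):

```shell
# GTT currently in use; if the model runs purely on the CPU this stays near zero:
cat /sys/class/drm/card0/device/mem_info_gtt_used
# The small BIOS carve-out that Ollama apparently sizes against:
cat /sys/class/drm/card0/device/mem_info_vram_total
```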

On the other hand, running an even bigger model directly with llama.cpp (./llama-bench) on the GPU works (invocation sketched below the table):

| model | size | params | backend | ngl | test | t/s |
| --- | --- | --- | --- | --- | --- | --- |
| llama 8x7B Q4_K - Medium | 48.25 GiB | 91.80 B | ROCm | 99 | pp512 | 81.32 ± 0.36 |
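The run above came from an invocation along these lines (the GGUF filename is a placeholder for your local copy; HSA_OVERRIDE_GFX_VERSION=11.0.0 is the common workaround to make ROCm accept the 780M's gfx1103, and -ngl 99 offloads all layers):

```shell
# Placeholder model path; -m and -ngl are standard llama-bench flags.
HSA_OVERRIDE_GFX_VERSION=11.0.0 ./llama-bench \
    -m ./models/mixtral-8x7b.Q4_K_M.gguf -ngl 99
```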

**Conclusion**
Please support unified memory. I know the speed will often not be great, but it beats pure CPU compute in both performance and thermal output. I do this as a hobby, so tinkering is part of the game. If I recall correctly, using all memory plus the GPU used to work only with the llama.cpp Vulkan backend, but since they now seem to support it with the ROCm backend as well (at least that is my perception), it would be nice to have it in Ollama too. Please also correct me if I am wrong - maybe this is already possible, in which case please point me in the right direction.

GiteaMirror added the feature request label 2026-04-22 08:56:45 -05:00