[PR #6282] [CLOSED] AMD integrated graphic on linux kernel 6.9.9+, GTT memory, loading freeze fix #12071

Closed
opened 2026-04-12 23:48:36 -05:00 by GiteaMirror · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/6282
Author: @MaciejMogilany
Created: 8/9/2024
Status: Closed

Base: main ← Head: AMD_APU_GTT_memory


📝 Commits (4)

  • bdd7e2f rewrite for ollama 4.0
  • 9e82f04 Merge branch 'ollama:main' into AMD_APU_GTT_memory
  • 8170b03 Merge branch 'ollama:main' into AMD_APU_GTT_memory
  • 7ed0c4d Merge branch 'ollama:main' into AMD_APU_GTT_memory

📊 Changes

4 files changed (+115 additions, -10 deletions)

View changed files

📝 discover/amd_linux.go (+95 -6)
📝 discover/gpu_linux.go (+14 -0)
📝 discover/types.go (+4 -3)
📝 llm/server.go (+2 -1)

📄 Description

This commit reflects changes in Linux kernel 6.9.9+ on small APUs. LLMs load into GTT memory, which defaults to 1/2 of system RAM and can be adjusted. This allows AMD APUs to use bigger models without a VRAM carveout, and to load models larger than the maximum VRAM carveout of 16 GiB. No hacks like torch-apu-helper (https://github.com/pomoke/torch-apu-helper), force-host-alloction-APU (https://github.com/segurac/force-host-alloction-APU), Rusticl (https://docs.mesa3d.org/rusticl.html), or unlocking the VRAM allocation (https://winstonhyypia.medium.com/amd-apu-how-to-modify-the-dedicated-gpu-memory-e27b75905056) are needed.

APUs this applies to:
"gfx1103" //890m, 780m, 760m, 740m GPU RDNA3
"gfx1037" //610M GPU RDNA2
"gfx1035" //680m, 660m GPU RDNA2
"gfx1033" //Van Gogh RDNA2
"gfx1036" //RDNA2 APU
"gfx1151" //RDNA3+ APU
"gfx1152" //RDNA3+ APU
"gfx940" //MI300A CDNA3
"gfx90c" //Radeon Vega 7 Ryzen 5600G
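As a rough illustration, the check against this list (part of the `discover/amd_linux.go` changes) can be sketched as a simple membership test. The function name `isSupportedAPU` is hypothetical, not the actual identifier in the patch:

```go
package main

import "fmt"

// supportedAPUs mirrors the gfx-target list above; keys are
// ROCm gfx target IDs reported for the integrated GPU.
var supportedAPUs = map[string]bool{
	"gfx1103": true, // 890m/780m/760m/740m RDNA3
	"gfx1037": true, // 610M RDNA2
	"gfx1035": true, // 680m/660m RDNA2
	"gfx1033": true, // Van Gogh RDNA2
	"gfx1036": true, // RDNA2 APU
	"gfx1151": true, // RDNA3+ APU
	"gfx1152": true, // RDNA3+ APU
	"gfx940":  true, // MI300A CDNA3
	"gfx90c":  true, // Radeon Vega 7 (Ryzen 5600G)
}

// isSupportedAPU reports whether a gfx target is in the GTT allowlist.
func isSupportedAPU(gfx string) bool {
	return supportedAPUs[gfx]
}

func main() {
	fmt.Println(isSupportedAPU("gfx1103")) // true: an RDNA3 APU from the list
	fmt.Println(isSupportedAPU("gfx1100")) // false: a discrete RDNA3 card
}
```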

The commit also addresses problems with ollama and APUs on kernel 6.9.9+:

  • ollama server hangs due to memory management issues
  • CPU tensor buffer causing OOM in Linux (https://github.com/ollama/ollama/issues/2637#issuecomment-2306976825)

Changes are applied only if a single GPU is present, it is from the list above (no discrete graphics card installed), and the kernel is 6.9.9 or newer. Existing discrete-GPU functionality is unchanged.
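The kernel-version gate can be sketched as a numeric comparison on the uname release string. `parseKernelVersion` and `kernelAtLeast` below are illustrative assumptions, not the patch's actual code:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseKernelVersion extracts major.minor.patch from a release
// string like "6.9.9-arch1-1", ignoring any distro suffix.
func parseKernelVersion(release string) (maj, min, patch int) {
	core := strings.SplitN(release, "-", 2)[0]
	parts := strings.SplitN(core, ".", 3)
	var nums [3]int
	for i := 0; i < len(parts) && i < 3; i++ {
		n, _ := strconv.Atoi(parts[i])
		nums[i] = n
	}
	return nums[0], nums[1], nums[2]
}

// kernelAtLeast reports whether the release is >= maj.min.patch.
func kernelAtLeast(release string, maj, min, patch int) bool {
	a, b, c := parseKernelVersion(release)
	if a != maj {
		return a > maj
	}
	if b != min {
		return b > min
	}
	return c >= patch
}

func main() {
	fmt.Println(kernelAtLeast("6.9.9-arch1-1", 6, 9, 9))  // true
	fmt.Println(kernelAtLeast("6.8.12-generic", 6, 9, 9)) // false
}
```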

Note:
APUs are not officially supported. They can be enabled

for 680m by:
export HSA_OVERRIDE_GFX_VERSION=10.3.0
export OLLAMA_MAX_LOADED_MODELS=1
export OLLAMA_NUM_PARALLEL=1

for 780m by:
export HSA_OVERRIDE_GFX_VERSION=11.0.1
export OLLAMA_MAX_LOADED_MODELS=1
export OLLAMA_NUM_PARALLEL=1

To mitigate GPU hangs on unsupported ROCm GPUs, use OLLAMA_MAX_LOADED_MODELS=1 and OLLAMA_NUM_PARALLEL=1.

The memory available to the APU can be adjusted by editing /etc/modprobe.d/ttm.conf (values are in 4 KiB pages; for 48 GiB it would be):

options ttm pages_limit=12582912
options ttm page_pool_size=12582912
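The page counts follow from dividing the desired byte limit by the 4 KiB page size; a quick check of the 48 GiB figure (the helper name `ttmPages` is just for illustration):

```go
package main

import "fmt"

// ttmPages converts a limit in GiB to the 4 KiB page count
// expected by the ttm module parameters.
func ttmPages(gibLimit int) int {
	const pageSize = 4096 // TTM accounts memory in 4 KiB pages
	return gibLimit * (1 << 30) / pageSize
}

func main() {
	fmt.Println(ttmPages(48)) // 12582912, matching the ttm.conf lines above
}
```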

More info: https://github.com/ollama/ollama/issues/2637#issuecomment-2272913656

Fixes https://github.com/ollama/ollama/issues/6362#issue-2466206599 and https://github.com/ollama/ollama/issues/6572#issue-2498127452

Partially fixes (the ollama part of) https://github.com/ollama/ollama/issues/2637#issue-2146959786

To test this:

git clone https://github.com/Maciej-Mogilany/ollama.git
cd ollama
git checkout AMD_APU_GTT_memory
make -j 5
export HSA_OVERRIDE_GFX_VERSION=11.0.1  # for 780m
sudo systemctl stop ollama  # stop the original ollama for now
./ollama serve

In another terminal:

./ollama run <model name>

If all works, you may replace the original ollama binary with the one built from source and add HSA_OVERRIDE_GFX_VERSION=11.0.1 to the ollama service for convenience.

sudo systemctl start ollama  # start the original ollama
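One way to make the environment override permanent, assuming the standard systemd drop-in location and the stock ollama unit name (a sketch; 11.0.1 assumes a 780m):

```
# /etc/systemd/system/ollama.service.d/override.conf
[Service]
Environment="HSA_OVERRIDE_GFX_VERSION=11.0.1"
```

Then run `sudo systemctl daemon-reload` and `sudo systemctl restart ollama` to pick up the change.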

If you have any problem, please ask Sonnet 3.5 about it. This way, you will be able to solve 95% of problems.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-12 23:48:36 -05:00

Reference: github-starred/ollama#12071