[GH-ISSUE #11199] Request for Support of AMD Ryzen AI Platform NPU #53891

Open
opened 2026-04-29 04:55:13 -05:00 by GiteaMirror · 56 comments

Originally created by @netqer on GitHub (Jun 25, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/11199

Dear ollama team,

I'm writing to submit a feature request: please consider adding official support for the AMD Ryzen AI platform NPU.

Currently, AMD has provided an initial implementation in their open-source project:
🔗 https://github.com/amd/RyzenAI-SW/tree/1.2.0/example/transformers/ext/llama.cpp

However, based on community feedback, this implementation may still have room for improvement in terms of performance and usability. Integrating this functionality into the official llama.cpp repository could help improve cross-platform compatibility.

It's also worth noting that different platforms support varying quantization precisions:

* STX (e.g., HX370) supports both W8A16 and W4ABF16
* PHX (e.g., 7840HS) and HPT (e.g., 8845HS) support W4ABF16, but NOT W8A16

For more details, please refer to:
🔗 https://github.com/amd/RyzenAI-SW/blob/1.2.0/example/transformers/models/llm/docs/README.md
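
As a rough illustration, the support matrix above boils down to a small lookup table. The Go sketch below is purely illustrative: the family keys, example model names, and map layout are assumptions for the sake of the example, not ollama or AMD code; AMD's README remains the authoritative list.

```go
package main

import "fmt"

// Purely illustrative mapping of Ryzen AI NPU families to the
// quantization formats listed above (per the AMD README).
var npuQuantSupport = map[string][]string{
	"PHX": {"W4ABF16"},          // e.g. Ryzen 7 7840HS
	"HPT": {"W4ABF16"},          // e.g. Ryzen 7 8845HS
	"STX": {"W4ABF16", "W8A16"}, // e.g. Ryzen AI 9 HX 370
}

func main() {
	for family, formats := range npuQuantSupport {
		fmt.Printf("%s supports: %v\n", family, formats)
	}
}
```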

Thank you for your hard work and contributions! We look forward to better support for Ryzen AI NPUs in the future.

GiteaMirror added the feature request label 2026-04-29 04:55:13 -05:00

@jmorille commented on GitHub (Jun 25, 2025):

+1 Please add support

@rick-github commented on GitHub (Jun 25, 2025):

#5186

@netqer commented on GitHub (Jun 26, 2025):

> #5186

We noticed that there has been considerable discussion around NPU support (e.g., #5186), but it turns out that AMD has already provided a preliminary implementation based on llama.cpp. If the ollama team is open to integrating these existing resources, we believe this could significantly accelerate the development of native Ryzen AI platform NPU support.

Moreover, there are hardware capability differences across the different NPU platforms:

* PHX (e.g., 7840HS) and HPT (e.g., 8845HS) mainly support W4ABF16 quantization
* STX (e.g., HX370) additionally supports W8A16

These differences require careful handling during model loading and inference (a minimal sketch follows below). Fortunately, AMD's documentation and code provide the necessary guidance for such adaptations.
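
As a minimal sketch of what that load-time handling could look like, assuming a simple string-based capability list like the table earlier in this thread (the function name and the format strings as plain strings are hypothetical, not actual ollama code):

```go
package main

import "fmt"

// pickQuantFormat prefers W8A16 where the NPU reports support for it
// and falls back to W4ABF16 otherwise. Hypothetical helper; real code
// would query the platform rather than take a string slice.
func pickQuantFormat(supported []string) (string, error) {
	for _, want := range []string{"W8A16", "W4ABF16"} {
		for _, have := range supported {
			if have == want {
				return want, nil
			}
		}
	}
	return "", fmt.Errorf("no NPU-compatible quantization format found")
}

func main() {
	fmt.Println(pickQuantFormat([]string{"W4ABF16"}))          // PHX/HPT -> W4ABF16
	fmt.Println(pickQuantFormat([]string{"W4ABF16", "W8A16"})) // STX -> W8A16
}
```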

Notably, AMD has also exposed low-level operator implementations for the NPU in the following directory:
🔗 https://github.com/amd/RyzenAI-SW/tree/1.2.0/example/transformers/ops

This codebase can serve as a crucial reference for integrating the NPU backend into llama.cpp, helping developers build a performant and robust solution more efficiently.

We hope the ollama team can consider including Ryzen AI NPU support in the project's roadmap. We are also willing to assist with testing, or even contribute to the development, if needed.

@aibotza commented on GitHub (Jul 1, 2025):

+1 Please add support for NPU

@nedzad-winter commented on GitHub (Jul 2, 2025):

+1 Please

@jamesalster commented on GitHub (Jul 2, 2025):

Another +1, please!

@mlaihk commented on GitHub (Jul 4, 2025):

+1 Please!

@TypicalZedF commented on GitHub (Jul 4, 2025):

+1 to this as well. I'd like to utilize the hardware on my computer with a 100% local AI.

@Alex18gr commented on GitHub (Jul 7, 2025):

+1 pleaseee

@jmorille commented on GitHub (Jul 8, 2025):

The integration of an AMD Ryzen AI NPU proves to be significantly more affordable than a comparable Nvidia GPU configuration, while offering a unified memory architecture, shared with the CPU, of up to 128 GB.

In a Home Lab context this is particularly compelling, because it enables running 70B LLMs locally without resorting to an overpriced high-end Nvidia graphics card just to achieve equivalent VRAM capacity.

@leander19961 commented on GitHub (Jul 11, 2025):

+1 Please

@xiongyw commented on GitHub (Jul 12, 2025):

+1 please!

@emilianionascu commented on GitHub (Jul 14, 2025):

+1 😄

@rammanokar-plateron commented on GitHub (Jul 16, 2025):

+1

@webprofusion-chrisc commented on GitHub (Jul 18, 2025):

For background, there is an increasing proliferation of mini-PCs (and larger ones) which have the AMD NPU available, but sadly these sit at 0% resource utilization during even the toughest ollama tasks. It would be great to light these up with work to do!

@kazami308 commented on GitHub (Jul 20, 2025):

+1

@ignamartinoli commented on GitHub (Jul 21, 2025):

+1

@bartschneider commented on GitHub (Jul 24, 2025):

+1

@4nd1syntz commented on GitHub (Jul 26, 2025):

+1

@padthaitofuhot commented on GitHub (Jul 27, 2025):

+1

@Jynx commented on GitHub (Jul 28, 2025):

+1

@TesaTesa commented on GitHub (Jul 30, 2025):

+1

@pramodhrachuri commented on GitHub (Jul 30, 2025):

+1

@rafaelfaustini commented on GitHub (Jul 31, 2025):

+1

@dr3dr3 commented on GitHub (Aug 3, 2025):

+1

@geethangreets commented on GitHub (Aug 6, 2025):

+1

@KeithBronstrup commented on GitHub (Aug 6, 2025):

+1

@lizer2014 commented on GitHub (Aug 8, 2025):

+10086

@UN-9BOT commented on GitHub (Aug 8, 2025):

+1

@arhe1on commented on GitHub (Aug 12, 2025):

+1

@theCalcaholic commented on GitHub (Aug 14, 2025):

> +1

While I'm also very much hoping for the implementation of AMD NPU support in ollama, I'd ask you to please refrain from adding messages like this which do not contribute to a solution.

You can instead use emoji reactions (e.g. 👍) to show that you want this feature.

Thank you :)

@amxstudiohub commented on GitHub (Aug 14, 2025):

+1

@ignamartinoli commented on GitHub (Aug 14, 2025):

@amxstudiohub based

@akulmambetov commented on GitHub (Aug 26, 2025):

+1

@Edu171002 commented on GitHub (Sep 18, 2025):

+1

@rstein85 commented on GitHub (Sep 30, 2025):

+1

@Ragua1 commented on GitHub (Oct 2, 2025):

+1

@Tim-Gabrikowski commented on GitHub (Oct 20, 2025):

+1

@aethersis commented on GitHub (Oct 21, 2025):

+1

@NinjaCats commented on GitHub (Oct 23, 2025):

+1😄

@Joeeey0219 commented on GitHub (Oct 23, 2025):

+1

@mistrjirka commented on GitHub (Oct 23, 2025):

For anyone wondering why the support is not there, and probably won't be for some time: AMD is remarkably bad at supporting its NPUs (an amazing mentality of investing billions into hardware while showing little interest in software support). There is very little documentation, and no SDK that actually works on all the OSes llama.cpp supports. The AMD SDK currently works only on Windows. If you want to use the NPU on Windows, you can try Lemonade Server.

@ziouf commented on GitHub (Oct 30, 2025):

The NPU seems to be supported on Linux, as stated by Th3raid0r in a comment on the lemonade-sdk Discord channel:

* need kernel 6.14 for the driver to be built in
* need the shim in the driver repo https://github.com/amd/xdna-driver
* need the latest build of TheRock https://github.com/ROCm/TheRock
* known official toolkits are https://github.com/Xilinx/XRT and https://github.com/Xilinx/mlir-aie

The NPU is detected:

```
$ lspci | grep -i neural
c4:00.1 Signal processing controller: Advanced Micro Devices, Inc. [AMD] Strix/Krackan/Strix Halo Neural Processing Unit (rev 11)
```

The driver is loaded:

```
$ lsmod | grep xdna
amdxdna               176128  0
gpu_sched              69632  2 amdxdna,amdgpu
```

rocminfo sees it:

```
$ rocminfo | grep aie2 -A 4
  Name:                    aie2
  Uuid:                    AIE-XX
  Marketing Name:          AIE-ML
  Vendor Name:             AMD
  Feature:                 AGENT_DISPATCH
```

We can afford to be a little optimistic.

edit: https://www.phoronix.com/news/AMD-Ryzen-AI-XDNA-NPU3A
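
As an aside, the manual rocminfo check above could be automated along the following lines. This is a rough Go sketch, not ollama code; it assumes rocminfo is installed and on PATH, and that the NPU agent reports its name as "aie2" as in the output above:

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// hasXDNANPU mirrors the manual `rocminfo | grep aie2` check: it runs
// rocminfo and looks for an agent whose Name field contains "aie2".
func hasXDNANPU() (bool, error) {
	out, err := exec.Command("rocminfo").Output()
	if err != nil {
		return false, fmt.Errorf("running rocminfo: %w", err)
	}
	for _, line := range strings.Split(string(out), "\n") {
		trimmed := strings.TrimSpace(line)
		if strings.HasPrefix(trimmed, "Name:") && strings.Contains(trimmed, "aie2") {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	ok, err := hasXDNANPU()
	if err != nil {
		fmt.Println("detection failed:", err)
		return
	}
	fmt.Println("XDNA NPU detected:", ok)
}
```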

@kenneth-ge commented on GitHub (Jan 20, 2026):

+1

@mistrjirka commented on GitHub (Jan 20, 2026):

> The NPU seems to be supported on Linux, as stated by Th3raid0r in a comment on the lemonade-sdk Discord channel:
>
> * need kernel 6.14 for the driver to be built in
> * need the shim in the driver repo https://github.com/amd/xdna-driver
> * need the latest build of TheRock https://github.com/ROCm/TheRock
> * known official toolkits are https://github.com/Xilinx/XRT and https://github.com/Xilinx/mlir-aie
>
> The NPU is detected:
>
> ```
> $ lspci | grep -i neural
> c4:00.1 Signal processing controller: Advanced Micro Devices, Inc. [AMD] Strix/Krackan/Strix Halo Neural Processing Unit (rev 11)
> ```
>
> The driver is loaded:
>
> ```
> $ lsmod | grep xdna
> amdxdna               176128  0
> gpu_sched              69632  2 amdxdna,amdgpu
> ```
>
> rocminfo sees it:
>
> ```
> $ rocminfo | grep aie2 -A 4
>   Name:                    aie2
>   Uuid:                    AIE-XX
>   Marketing Name:          AIE-ML
>   Vendor Name:             AMD
>   Feature:                 AGENT_DISPATCH
> ```
>
> We can afford to be a little optimistic.
>
> edit: https://www.phoronix.com/news/AMD-Ryzen-AI-XDNA-NPU3A

I tried running some demos on the NPU. Sadly, the current XRT that actually supports XDNA2 is in closed beta, and to get access to it you need to "sign" an NDA and be part of some larger company. It might be possible for someone from here to get access; I tried personally but could not.

@ivpoov commented on GitHub (Feb 7, 2026):

+1

@rladinger commented on GitHub (Feb 14, 2026):

+1

@ClementGayet commented on GitHub (Feb 16, 2026):

+1

@TSJasonH commented on GitHub (Mar 20, 2026):

+1

@PCAssistSoftware commented on GitHub (Mar 24, 2026):

+1

@d4rkr0n1n commented on GitHub (Apr 3, 2026):

+1

@xiongsongsong commented on GitHub (Apr 4, 2026):

+1

@Faisal-Alessa commented on GitHub (Apr 4, 2026):

+1

@Laotu77 commented on GitHub (Apr 14, 2026):

+1

@pmssantos commented on GitHub (Apr 17, 2026):

+1

@Syphdias commented on GitHub (Apr 18, 2026):

Please lock this issue or limit it to maintainers, to stop people from commenting "+1" instead of reacting to the issue, because each comment triggers notifications to ~50 watchers. Thank you :)


Reference: github-starred/ollama#53891