[GH-ISSUE #5186] AMD Ryzen NPU support #3262

Open
opened 2026-04-12 13:47:51 -05:00 by GiteaMirror · 59 comments

Originally created by @ivanbrash on GitHub (Jun 20, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/5186

Originally assigned to: @dhiltgen on GitHub.

Hello! I want to buy a Lenovo Xiaoxin 14 AI laptop with an AMD Ryzen 7 8845H for my birthday, and I will install Artix Linux on it. Will you add AMD Ryzen NPU support to Ollama on Linux and Windows? The AMD Ryzen NPU driver for Linux is already available on GitHub:
https://github.com/amd/xdna-driver.git
Sorry for my bad English!

GiteaMirror added the feature request and amd labels 2026-04-12 13:47:51 -05:00

@billtown commented on GitHub (Jun 20, 2024):

I have an AMD Ryzen 7 7840U w/ Radeon 780M Graphics and recently got inference working on the iGPU.
On Linux, the ROCm support works for me, but I have to set HSA_OVERRIDE_GFX_VERSION; people seem to have varying luck depending on the version they pick. Not sure if this helps at all.

podman run -d --name ollama --replace --pull=always --restart=always \
  -p 0.0.0.0:11434:11434 \
  -v ollama:/root/.ollama \
  --stop-signal=SIGKILL \
  --device /dev/dri --device /dev/kfd \
  -e HSA_OVERRIDE_GFX_VERSION=11.0.2 \
  -e HSA_ENABLE_SDMA=0 \
  docker.io/ollama/ollama:rocm
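
For a native install (no container), the same overrides can be set on the systemd unit instead; a minimal sketch, assuming the stock ollama.service created by the Linux install script:

# sudo systemctl edit ollama, then add:
[Service]
Environment="HSA_OVERRIDE_GFX_VERSION=11.0.2"
Environment="HSA_ENABLE_SDMA=0"
# then: sudo systemctl daemon-reload && sudo systemctl restart ollama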

@jasalt commented on GitHub (Jun 23, 2024):

There were some recent patches to llamafile and llama.cpp, linked here, that add the ability to use more RAM than what is dedicated to the iGPU (HIP_UMA): https://github.com/ROCm/ROCm/discussions/2631#discussioncomment-9849190. Looks promising.
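
For anyone who wants to try this directly in llama.cpp, there is a CMake switch for it; a rough sketch, with the caveat that the flag has been renamed across versions (LLAMA_HIP_UMA in older trees, GGML_HIP_UMA after the GGML rename):

# build llama.cpp with ROCm plus unified-memory allocation (UMA)
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build -DGGML_HIPBLAS=ON -DGGML_HIP_UMA=ON
cmake --build build --config Release -j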

@coreybutler commented on GitHub (Aug 14, 2024):

Running an AMD Ryzen 9 8945HS here. Would love to see support for this.

@2018wzh commented on GitHub (Aug 16, 2024):

Running an AMD AI 9 370HX here, same as above. Hoping to see support.

@grigio commented on GitHub (Aug 23, 2024):

Here is some news, but Linux support seems lacking:
https://community.amd.com/t5/ai/get-a-powerful-ai-assistant-with-document-chat-accelerated-by/ba-p/704092
https://lmstudio.ai/ryzenai

@henry2man commented on GitHub (Sep 5, 2024):

> I have an AMD Ryzen 7 7840U w/ Radeon 780M Graphics and recently got inference working on the iGPU.

@billtown What's the performance of your setup? I've recently purchased a Ryzen 9 8945HS + 64 GB RAM mini PC for some Docker + VM and (hopefully) some lightweight LLM workloads with Ollama.

PS: I'm not an expert on Ollama internals, but I have enough experience to help with testing on my own hardware to make this request a reality.

@grigio commented on GitHub (Sep 5, 2024):

> Running an AMD AI 9 370HX here, same as above. Hoping to see support.

Can you share how many tokens/s you get with llama3.1 Q4_K_M or similar?

@fan123450 commented on GitHub (Sep 11, 2024):

Running an AMD 8845HS here, same as above. Hoping to see support for both the GPU and the NPU.

@billtown commented on GitHub (Sep 11, 2024):

> > I have an AMD Ryzen 7 7840U w/ Radeon 780M Graphics and recently got inference working on the iGPU.
>
> @billtown What's the performance of your setup? I've recently purchased a Ryzen 9 8945HS + 64 GB RAM mini PC for some Docker + VM and (hopefully) some lightweight LLM workloads with Ollama.
>
> PS: I'm not an expert on Ollama internals, but I have enough experience to help with testing on my own hardware to make this request a reality.

total duration: 22.204829879s
load duration: 16.99589ms
prompt eval count: 1411 token(s)
prompt eval duration: 625.952ms
prompt eval rate: 2254.17 tokens/s
eval count: 269 token(s)
eval duration: 20.76486s
eval rate: 12.95 tokens/s < after building some context.

llama3:8b 365c0bd3c000 6.7 GB 100% GPU

radeontop at least shows VRAM, shaders, and pipes hitting 100% when running. I have 16 GB allocated in the BIOS:

0.80G / 0.80G Memory Clock 100.00%
2.13G / 2.70G Shader Clock 78.81%
Graphics pipe 99.17%
Shader Interpolator 92.50%
Clip Rectangle 100.00%
These are what come alive in radeontop. And then a single thread on the CPU hits 100% (ollama).
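
For anyone wanting to reproduce these numbers with the container setup above, everything comes from stock commands; a sketch (radeontop runs on the host, and the package name varies by distro):

podman exec -it ollama ollama run llama3:8b --verbose   # prints the duration/eval-rate summary above
podman exec -it ollama ollama ps                        # prints the "100% GPU" placement line
radeontop                                               # live iGPU utilization view on the host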

@fan123450 commented on GitHub (Sep 12, 2024):

> > > I have an AMD Ryzen 7 7840U w/ Radeon 780M Graphics and recently got inference working on the iGPU.
> >
> > @billtown What's the performance of your setup? I've recently purchased a Ryzen 9 8945HS + 64 GB RAM mini PC for some Docker + VM and (hopefully) some lightweight LLM workloads with Ollama.
> > PS: I'm not an expert on Ollama internals, but I have enough experience to help with testing on my own hardware to make this request a reality.
>
> total duration: 22.204829879s load duration: 16.99589ms prompt eval count: 1411 token(s) prompt eval duration: 625.952ms prompt eval rate: 2254.17 tokens/s eval count: 269 token(s) eval duration: 20.76486s eval rate: 12.95 tokens/s (after building some context)
>
> llama3:8b 365c0bd3c000 6.7 GB 100% GPU
>
> radeontop at least shows VRAM, shaders, and pipes hitting 100% when running. I have 16 GB allocated in the BIOS:
>
> 0.80G / 0.80G Memory Clock 100.00% 2.13G / 2.70G Shader Clock 78.81% Graphics pipe 99.17% Shader Interpolator 92.50% Clip Rectangle 100.00% These are what come alive in radeontop. And then a single thread on the CPU hits 100% (ollama).

Great! Is there a detailed reference for the implementation steps? If available, I will be very grateful!

@evansrrr commented on GitHub (Sep 24, 2024):

> Hello! I want to buy a Lenovo Xiaoxin 14 AI laptop with an AMD Ryzen 7 8845H for my birthday, and I will install Artix Linux on it. Will you add AMD Ryzen NPU support to Ollama on Linux and Windows? The AMD Ryzen NPU driver for Linux is already available on GitHub: https://github.com/amd/xdna-driver.git Sorry for my bad English!

Running a Lenovo Xiaoxin Pro 16 with an R7-8845H as the processor; same as above. Hope to see AMD NPU support enabled soon!

@robfuscator commented on GitHub (Oct 22, 2024):

We'll have to wait at least until February before this is even possible on Linux using a mainline kernel:

https://www.phoronix.com/news/AMD-XDNA-Linux-Driver-v4

@ivanbrash commented on GitHub (Nov 29, 2024):

I bought a Honor MagicBook X14 Pro with a Ryzen 7 7840HS and installed Gentoo with KDE on it. So far I have not tried to install Ollama on it, since there is no NPU support yet. But when it appears, I will definitely install it.

@ToeiRei commented on GitHub (Nov 29, 2024):

> I bought a Honor MagicBook X14 Pro with a Ryzen 7 7840HS and installed Gentoo with KDE on it. So far I have not tried to install Ollama on it, since there is no NPU support yet. But when it appears, I will definitely install it.

I did play around with AI accelerators a bit, and my Framework has the same CPU as your MagicBook. The TOPS value was disappointing, to put it mildly. Don't get your hopes up: 25 TOPS max across different applications. It's a blast for image recognition, OCR, and the like, but falls flat on LLM tasks.

@JiapengLi commented on GitHub (Dec 20, 2024):

Here is my test result with:

  • AMD Ryzen AI 9 HX 370 w/ Radeon 890M
  • LPDDR5 16GB
  • Ubuntu 24.04
  • Kernel 6.8.0

The performance is not as good as expected:


dev@VM100:~$ ollama run llama3.2:latest 'Develop a python function that solves the following problem, sudoku game' --verbose

...

total duration:       27.298931858s
load duration:        2.052439105s
prompt eval count:    37 token(s)
prompt eval duration: 405ms
prompt eval rate:     91.36 tokens/s
eval count:           675 token(s)
eval duration:        24.839s
eval rate:            27.18 tokens/s
dev@VM100:~$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=24.04
DISTRIB_CODENAME=noble
DISTRIB_DESCRIPTION="Ubuntu 24.04.1 LTS"


dev@VM100:~$ uname -a
Linux VM100 6.8.0-50-generic #51-Ubuntu SMP PREEMPT_DYNAMIC Sat Nov  9 17:58:29 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux


dev@VM100:~$ lscpu
Architecture:             x86_64
  CPU op-mode(s):         32-bit, 64-bit
  Address sizes:          48 bits physical, 48 bits virtual
  Byte Order:             Little Endian
CPU(s):                   24
  On-line CPU(s) list:    0-23
Vendor ID:                AuthenticAMD
  Model name:             AMD Ryzen AI 9 HX 370 w/ Radeon 890M
    CPU family:           26
    Model:                36
    Thread(s) per core:   1
    Core(s) per socket:   24
    Socket(s):            1
    Stepping:             0
    BogoMIPS:             3992.46
    Flags:                fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm rep_good nopl cpuid extd_apicid tsc_known_freq pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 x2apic mo
                          vbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy svm cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw perfctr_core ssbd ibrs ibpb stibp ibrs_enhanced vmmcall fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms in
                          vpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves avx_vnni avx512_bf16 clzero xsaveerptr wbnoinvd arat npt lbrv nrip_save tsc_scale vmcb_clean flushbyasid pausefilt
                          er pfthreshold v_vmsave_vmload vgif vnmi avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid bus_lock_detect movdiri movdir64b fsrm avx512_vp2intersect flush_l1d arch_capabilities
Virtualization features:
  Virtualization:         AMD-V
  Hypervisor vendor:      KVM
  Virtualization type:    full
Caches (sum of all):
  L1d:                    1.5 MiB (24 instances)
  L1i:                    1.5 MiB (24 instances)
  L2:                     12 MiB (24 instances)
  L3:                     384 MiB (24 instances)
NUMA:
  NUMA node(s):           1
  NUMA node0 CPU(s):      0-23
Vulnerabilities:
  Gather data sampling:   Not affected
  Itlb multihit:          Not affected
  L1tf:                   Not affected
  Mds:                    Not affected
  Meltdown:               Not affected
  Mmio stale data:        Not affected
  Reg file data sampling: Not affected
  Retbleed:               Not affected
  Spec rstack overflow:   Not affected
  Spec store bypass:      Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1:             Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:             Mitigation; Enhanced / Automatic IBRS; IBPB conditional; STIBP disabled; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
  Srbds:                  Not affected
  Tsx async abort:        Not affected


dev@VM100:~$ ollama --version
ollama version is 0.5.4
dev@VM100:~$ ollama list
NAME               ID              SIZE      MODIFIED
llama3.1:8b        46e0c10c039e    4.9 GB    3 hours ago
llama3.2:latest    a80c4f17acd5    2.0 GB    3 hours ago
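
One caveat worth checking here: the lscpu output shows a KVM guest ("Hypervisor vendor: KVM"), and unless the iGPU is passed through to the VM, ollama falls back to pure CPU inference, which could explain the disappointing numbers. A quick check:

ollama ps   # the PROCESSOR column shows whether the loaded model sits on "100% GPU" or "100% CPU"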

@JiapengLi commented on GitHub (Dec 20, 2024):

Related topics:
#3004

@Pekkari commented on GitHub (Dec 20, 2024):

@JiapengLi I don't think that is using your NPU in any way. The amd-xdna driver will most likely land in Linux 6.14; then you need the userspace libraries from AMD to talk to it (like ROCm when talking to AMD GPUs, or CUDA for NVIDIA), and then Ollama needs code that calls those libraries, which is the reason this issue exists. I'm no Ollama maintainer though; they may know more details than I do.
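
The kernel-driver layer, at least, is easy to check; a sketch, assuming a kernel built with the amdxdna driver:

lsmod | grep amdxdna           # is the XDNA kernel module loaded?
ls /dev/accel/                 # the NPU shows up as a DRM accel node (e.g. accel0) once bound
sudo dmesg | grep -i amdxdna   # driver probe/initialization messages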

@grigio commented on GitHub (Dec 21, 2024):

@JiapengLi I think Linux 6.14 should improve the situation; keep us updated:
https://www.phoronix.com/news/Ryzen-AI-NPU6-Linux-6.14

@sinchichou commented on GitHub (Dec 26, 2024):

So I'm trying to get an LLM running on the AMD NPU.
But it looks like it needs Visual Studio 2022 Community, CMake, and Anaconda or Miniconda,
and the libraries all need the Ryzen AI SW stack or similar.
The ONNX Runtime supported-platforms list doesn't show the AMD NPU.
Maybe DirectML or ROCm is worth a try.
I'll try that later.

@GreyXor commented on GitHub (Mar 18, 2025):

> @JiapengLi I think Linux 6.14 should improve the situation; keep us updated: https://www.phoronix.com/news/Ryzen-AI-NPU6-Linux-6.14

Yes, I can confirm that the amdxdna driver runs on my 6.14.

@grigio commented on GitHub (Mar 18, 2025):

@GreyXor do you see improvements in tokens/sec over CPU or Vulkan?

@GreyXor commented on GitHub (Mar 18, 2025):

I mean, amdxdna is loaded and working, but I don't have an app that can actually run inference on it. Want me to try something? I would be happy to run some benchmarks.

amdxdna has been some kind of vaporware since mid-2023. At least now the driver is working, but nothing uses it.
I asked AMD for some docs here https://github.com/AMD-AIG-AIMA/Instella/issues/1 and we are left waiting for support here: https://github.com/ggml-org/llama.cpp/issues/1499
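
If the userspace half from https://github.com/amd/xdna-driver (the XRT/XDNA plugin) is installed, something like this should at least enumerate the NPU; a sketch, and tool names/paths may differ by release:

source /opt/xilinx/xrt/setup.sh   # default XRT install prefix
xrt-smi examine                   # lists detected XDNA devices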

@wishx commented on GitHub (Mar 24, 2025):

The 6.14 kernel has been released and is widely available now. Just letting folks know.
https://www.phoronix.com/news/Linux-6.14

@evansrrr commented on GitHub (Apr 10, 2025):

Yeah, and I saw that the Linux platform did take a leap in AI tech, e.g. getting the most out of tools like sd-webui-aki.

@DocMAX commented on GitHub (Apr 14, 2025):

I'm going to buy a laptop for local LLM use. Can you recommend a good CPU for this? I prefer Lenovo's ThinkPad line. Any experiences? Should I wait for the next generation?

@XenoAmess commented on GitHub (Apr 14, 2025):

> I'm going to buy a laptop for local LLM use. Can you recommend a good CPU for this? I prefer Lenovo's ThinkPad line. Any experiences? Should I wait for the next generation?

Well, I hate Apple, but... just buy a MacBook (IMO).

@DocMAX commented on GitHub (Apr 14, 2025):

I hate Apple too, so that's not an option...

@Bush-cat commented on GitHub (Apr 14, 2025):

> I'm going to buy a laptop for local LLM use. Can you recommend a good CPU for this? I prefer Lenovo's ThinkPad line. Any experiences? Should I wait for the next generation?

You'd want a Ryzen AI Max CPU.

@DocMAX commented on GitHub (Apr 14, 2025):

How does it perform with ROCm and Ollama (tokens/s)? I can't find any benchmark comparison list anywhere.

@XenoAmess commented on GitHub (Apr 14, 2025):

I have an AMD CPU with an NPU. I have an Ubuntu kernel at Linux 6.14. I have no way to run any LLM backend with them.
So thanks for your "hell" for AI, AMD.
Oh, maybe I should say the word "help", but I don't think they deserve it.

![Image](https://github.com/user-attachments/assets/649ee16c-3ca5-4ed2-b1c1-537ce0610670)

@Bush-cat commented on GitHub (Apr 14, 2025):

> How does it perform with ROCm and Ollama (tokens/s)? I can't find any benchmark comparison list anywhere.

It's the fastest iGPU, as it has very fast 4-channel memory. You can only get faster with a big discrete GPU, but those have less VRAM, so you're limited in the size of the models you can run.

@Bush-cat commented on GitHub (Apr 14, 2025):

> I have an AMD CPU with an NPU. I have an Ubuntu kernel at Linux 6.14. I have no way to run any LLM backend with them. So thanks for your "hell" for AI, AMD. Oh, maybe I should say the word "help", but I don't think they deserve it.

Welp, it was always more marketing for Windows users and features, I guess; the first ads for the NPUs were for applications like MS Teams...

Also, don't expect much from the 15 TOPS of your Ryzen 8000 NPU (or the 10 TOPS of my Ryzen 7000); Copilot+ PCs require at least a 50 TOPS NPU to do anything.
I saw a person benchmarking the Ryzen 8000 NPU, and it took several minutes to finish an output with the tiniest Llama model.

@XenoAmess commented on GitHub (Apr 14, 2025):

> > I have an AMD CPU with an NPU. I have an Ubuntu kernel at Linux 6.14. I have no way to run any LLM backend with them. So thanks for your "hell" for AI, AMD. Oh, maybe I should say the word "help", but I don't think they deserve it.
>
> Welp, it was always more marketing for Windows users and features, I guess; the first ads for the NPUs were for applications like MS Teams...
>
> Also, don't expect much from the 15 TOPS of your Ryzen 8000 NPU (or the 10 TOPS of my Ryzen 7000); Copilot+ PCs require at least a 50 TOPS NPU to do anything. I saw a person benchmarking the Ryzen 8000 NPU, and it took several minutes to finish an output with the tiniest Llama model.

The tiniest Llama model we've seen is 0.5B. I can't quite believe it would take minutes to handle requests with 0.5B, but...
Well, let's wait for AMD engineers to make it usable in another two years. Maybe there is still hope they can achieeeeeeve it by then?

@Pekkari commented on GitHub (Apr 14, 2025):

> I have an AMD CPU with an NPU. I have an Ubuntu kernel at Linux 6.14. I have no way to run any LLM backend with them. So thanks for your "hell" for AI, AMD. Oh, maybe I should say the word "help", but I don't think they deserve it.
>
> ![Image](https://github.com/user-attachments/assets/649ee16c-3ca5-4ed2-b1c1-537ce0610670)

The marketing info around suggests it may run using either LM Studio or vLLM; needless to say, on the Linux side one should always expect to tinker a bit to get these things working.

@bonswouar commented on GitHub (Apr 14, 2025):

Please, guys, try it out instead of speculating: https://github.com/ollama/ollama/pull/6282
Not sure if it correctly uses the NPU, but it's working!

> I saw a person benchmarking the Ryzen 8000 NPU, and it took several minutes to finish an output with the tiniest Llama model.

Probably fake news. I have an 8845HS, and the few models I've tried (8B to 15B) run pretty well (though of course it depends what you compare it to). Not "several minutes" for the "tiniest Llama model", for sure.

@DocMAX commented on GitHub (Apr 14, 2025):

AMD 5800U APU with ROCm: Llama3.1 8B. Question: "Who is Bill Gates":

total duration: 1m7.077394148s
load duration: 22.877264ms
prompt eval count: 51 token(s)
prompt eval duration: 8.593335ms
prompt eval rate: 5934.83 tokens/s
eval count: 435 token(s)
eval duration: 1m7.044265099s
eval rate: 6.49 tokens/s

@Bush-cat commented on GitHub (Apr 14, 2025):

> > I have an AMD CPU with an NPU. I have an Ubuntu kernel at Linux 6.14. I have no way to run any LLM backend with them. So thanks for your "hell" for AI, AMD. Oh, maybe I should say the word "help", but I don't think they deserve it.
>
> Welp, it was always more marketing for Windows users and features, I guess; the first ads for the NPUs were for applications like MS Teams... Also, don't expect much from the 15 TOPS of your Ryzen 8000 NPU (or the 10 TOPS of my Ryzen 7000); Copilot+ PCs require at least a 50 TOPS NPU to do anything. I saw a person benchmarking the Ryzen 8000 NPU, and it took several minutes to finish an output with the tiniest Llama model.
>
> The tiniest Llama model we've seen is 0.5B. I can't quite believe it would take minutes to handle requests with 0.5B, but... Well, let's wait for AMD engineers to make it usable in another two years. Maybe there is still hope they can achieeeeeeve it by then?

I saw Ryzen 7000 users get 2-10 tokens per second with an optimized Llama 3.1 8B model:
https://www.reddit.com/r/LocalLLaMA/comments/1d9m0z3/running_llama_3_on_the_npu_of_a_firstgeneration/

And with models below 3B, the quality of the output is really bad.

@Bush-cat commented on GitHub (Apr 14, 2025):

> Please, guys, try it out instead of speculating: #6282 Not sure if it correctly uses the NPU, but it's working!
>
> > I saw a person benchmarking the Ryzen 8000 NPU, and it took several minutes to finish an output with the tiniest Llama model.
>
> Probably fake news. I have an 8845HS, and the few models I've tried (8B to 15B) run pretty well (though of course it depends what you compare it to). Not "several minutes" for the "tiniest Llama model", for sure.

You probably used the iGPU and not the NPU. I was only talking about the NPU, which is much slower than using the full iGPU.

@DocMAX commented on GitHub (Apr 14, 2025):

Can anyone run a benchmark on an AMD HX 375 for me, please? I really wonder how fast it is. I expect around 20 tok/s with Llama 3.1 8B, given all the hype around the AMD AI processors.
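
For the numbers to be comparable, everyone would need to run the same thing; a sketch using only stock ollama commands, reusing the prompt from the 5800U run above:

ollama run llama3.1:8b "Who is Bill Gates" --verbose
# the closing "eval rate: N tokens/s" line is the decode speed being compared in this thread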

@androidacy-user commented on GitHub (Jun 17, 2025):

People in this issue thread seem to be getting the GPU and NPU mixed up. Ollama can be forced to run on the iGPU, but seems to completely lack support for the (much more efficient) NPU on these chipsets.

@reneleonhardt commented on GitHub (Jun 19, 2025):

> Can anyone run a benchmark on an AMD HX 375 for me, please? I really wonder how fast it is. I expect around 20 tok/s with Llama 3.1 8B, given all the hype around the AMD AI processors.

The NPU seems comparable with 50 TOPS, but a lot of unified RAM always helps of course 😅

https://www.techpowerup.com/334223/amds-ryzen-ai-max-395-delivers-up-to-12x-ai-llm-performance-compared-to-intels-lunar-lake
https://en.wikipedia.org/wiki/List_of_AMD_Ryzen_processors#Ryzen_AI_300_series

It looks like NPU support in Ollama would be amazing to run LLMs even on notebooks ❤

@regulad commented on GitHub (Jun 22, 2025):

I have a notebook with the Ryzen AI Max+ 395 at my disposal. I was able to get iGPU inference to work in rootless podman with the following command, but still no NPU inference in sight.

podman run --rm \
  --name ollama \
  --user root \
  --pull=newer \
  --device /dev/kfd \
  --device /dev/dri \
  --group-add keep-groups \
  --privileged \
  -e HSA_OVERRIDE_GFX_VERSION=11.5.1 \
  -e HCC_AMDGPU_TARGET=gfx1151 \
  -v $HOME/.ollama:/root/.ollama:Z \
  -p 11434:11434 \
  docker.io/ollama/ollama:rocm

@DocMAX If you're interested in my speed, here is a prompt from the 27B-parameter Gemma 3:

![Image](https://github.com/user-attachments/assets/03df4822-edba-4790-ad59-f72b258c6fee)

@padthaitofuhot commented on GitHub (Jul 27, 2025):

This please.

I have AMD Ryzen AI 7 PRO 360 w/ Radeon 880M in this Thinkpad. It's not a very powerful NPU, but it would be super keen to get a tiny model on it for quick enhanced local autocomplete or embedding vectors for RAG.

kernel: amdxdna 0000:c4:00.1: enabling device (0000 -> 0002)
kernel: [drm] Initialized amdxdna_accel_driver 0.0.0 for 0000:c4:00.1 on minor 0
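
For the embedding-vectors part, nothing NPU-specific exists yet: embeddings go through the standard ollama HTTP API and run on the CPU/iGPU today, and NPU offload is exactly what this issue asks for. A sketch of the call, assuming an embedding model such as nomic-embed-text has been pulled:

curl http://localhost:11434/api/embed -d '{"model": "nomic-embed-text", "input": "test sentence for RAG"}'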

@androidacy-user commented on GitHub (Jul 27, 2025):

> This please.
>
> I have AMD Ryzen AI 7 PRO 360 w/ Radeon 880M in this Thinkpad. It's not a very powerful NPU, but it would be super keen to get a tiny model on it for quick enhanced local autocomplete or embedding vectors for RAG.
>
> kernel: amdxdna 0000:c4:00.1: enabling device (0000 -> 0002)
> kernel: [drm] Initialized amdxdna_accel_driver 0.0.0 for 0000:c4:00.1 on minor 0

50 TOPS is plenty for a smaller or quantized model, or a larger one if you're willing to deal with slower inference times.

@muety commented on GitHub (Jul 27, 2025):

> I have AMD Ryzen AI 7 PRO 360 w/ Radeon 880M

Would love to see how it performs on reasonably large models (like ~21B or so)!

@jcubic commented on GitHub (Jul 27, 2025):

AMD NPU is supported by the mainline Linux kernel from 6.14, released in March 2025 (https://kernelnewbies.org/Linux_6.14).

  • AMD Previews Mysterious Linux Runtime Stack For Ryzen AI NPUs: https://www.phoronix.com/news/AMD-Linux-RT-Preview-Ryzen-AI

I wanted to buy a laptop with this NPU, and it would be great to be able to use bigger models with Ollama.

@gururise commented on GitHub (Sep 1, 2025):

> AMD NPU is supported by the mainline Linux kernel from 6.14, released in March 2025 (https://kernelnewbies.org/Linux_6.14).
>
>   • AMD Previews Mysterious Linux Runtime Stack For Ryzen AI NPUs: https://www.phoronix.com/news/AMD-Linux-RT-Preview-Ryzen-AI
>
> I wanted to buy a laptop with this NPU, and it would be great to be able to use bigger models with Ollama.

NPU support can speed things up significantly. There are two other projects that support inference on AMD NPUs and show significant performance improvements over iGPU-only or CPU-only:

  1. AMD GAIA (https://github.com/amd/gaia) - supports hybrid NPU + iGPU or NPU-only modes
  2. FastFlowLLM (https://github.com/FastFlowLM/FastFlowLM) - supports the NPU

@ha-pf-tickerer commented on GitHub (Sep 1, 2025):

> > I wanted to buy a laptop with this NPU, and it would be great to be able to use bigger models with Ollama.
>
> NPU support can speed things up significantly. There are two other projects that support inference on AMD NPUs and show significant performance improvements over iGPU-only or CPU-only:
>
> 1. AMD GAIA (https://github.com/amd/gaia) - supports hybrid NPU + iGPU or NPU-only modes
> 2. FastFlowLLM (https://github.com/FastFlowLM/FastFlowLM) - supports the NPU

GAIA and FastFlowLM are great projects that support the AMD Ryzen AI processors, but support in Ollama would be really, really great.

Our use case is a dedicated local AI mini PC, to be used by the kids as a better Google/Alexa search, and probably most of the time to give Home Assistant a local conversation agent using https://www.home-assistant.io/integrations/ollama/.

This would allow an "it's too hot in here!" voice prompt to Home Assistant, letting Home Assistant correctly understand that the user "bob" sitting in the living room is not happy with the temperature, and that the Home Assistant server should cool the room using the AC or lower the thermostats, based on the controls Home Assistant already has.

I promised this functionality to my SO in order to justify hanging the house full of Zigbee sensors and putting seriously expensive AMD Ryzen AI mini boxes in the house :-)

@z0xca commented on GitHub (Dec 23, 2025):

Running an AMD 8845HS here too, same as above. Hoping to see NPU support.

@alerque commented on GitHub (Dec 23, 2025):

How is this affected by the merge of #13196?

@Pekkari commented on GitHub (Dec 24, 2025):

> How is this affected by the merge of #13196?

Not affected at all. That merge is about iGPU support, not the NPU, and from what I know, the NPU in the 8845HS is not worth supporting, since the extra capacity it would provide in a hybrid (GPU + NPU) setup is not really a deal maker.

@bonswouar commented on GitHub (Dec 24, 2025):

> the NPU in the 8845HS is not worth supporting, since the extra capacity it would provide in a hybrid (GPU + NPU) setup is not really a deal maker.

Isn't the NPU supposed to be more energy efficient than the GPU, though? The 8845HS being a laptop CPU, I'd say that could be a huge deal maker, if it helps to run models on battery.

But if it's not more energy efficient, and not noticeably improving performance in a hybrid setup, then I really don't see the point, yeah.

@Pekkari commented on GitHub (Dec 24, 2025):

> Isn't the NPU supposed to be more energy efficient than the GPU, though? The 8845HS being a laptop CPU, I'd say that could be a huge deal maker, if it helps to run models on battery.
>
> But if it's not more energy efficient, and not noticeably improving performance in a hybrid setup, then I really don't see the point, yeah.

Don't kill the messenger; I'm just voicing what I heard from AMD. I'd love to see the support come anyway, since I bought the hardware for the NPU and suddenly ended up in the same situation :|

@alerque commented on GitHub (Dec 24, 2025):

Fair enough. I'm still figuring out what is what here.

Partly out of personal curiosity and partly because I'm an Arch Linux packager looking over the ROCm-related packages, wondering if there is anything we are missing out on that I could help fix... My personal hardware is an integrated AMD Ryzen AI 9 HX 370 w/ Radeon 890M, which I assume does have an NPU and would benefit from this requested support, correct? And also an AMD Ryzen 5 3600 6-core processor with a discrete Radeon RX 5500 graphics card, for which I assume there is no NPU, correct?

Is there somewhere that has commands to actually ferret this out, or a good table somewhere showing which AMD parts have NPUs at all and what they are/are not supported by?

@Pekkari commented on GitHub (Dec 24, 2025):

> Fair enough. I'm still figuring out what is what here.
>
> Partly out of personal curiosity and partly because I'm an Arch Linux packager looking over the ROCm-related packages, wondering if there is anything we are missing out on that I could help fix... My personal hardware is an integrated AMD Ryzen AI 9 HX 370 w/ Radeon 890M, which I assume does have an NPU and would benefit from this requested support, correct? And also an AMD Ryzen 5 3600 6-core processor with a discrete Radeon RX 5500 graphics card, for which I assume there is no NPU, correct? Is there somewhere that has commands to actually ferret this out, or a good table somewhere showing which AMD parts have NPUs at all and what they are/are not supported by?

I fail to remember exactly, but I read something about Strix Point support coming, which I think is your hardware. However, I think it was GPU support in ROCm, so chances are you may still be in the safe zone. The 8845HS is prior to Strix Point, and after it comes Strix Halo, which was the first intended to be supported, but a community push made ROCm support for Strix Point happen as well.

@z0xca commented on GitHub (Dec 24, 2025):

> Fair enough. I'm still figuring out what is what here.
>
> Partly out of personal curiosity and partly because I'm an Arch Linux packager looking over the ROCm-related packages, wondering if there is anything we are missing out on that I could help fix... My personal hardware is an integrated AMD Ryzen AI 9 HX 370 w/ Radeon 890M, which I assume does have an NPU and would benefit from this requested support, correct? And also an AMD Ryzen 5 3600 6-core processor with a discrete Radeon RX 5500 graphics card, for which I assume there is no NPU, correct?
>
> Is there somewhere that has commands to actually ferret this out, or a good table somewhere showing which AMD parts have NPUs at all and what they are/are not supported by?

The Wikipedia List of AMD Ryzen processors (https://en.wikipedia.org/wiki/List_of_AMD_Ryzen_processors) shows which CPUs have an NPU and which don't.
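
For a local check rather than a table lookup, two quick hints work on Linux; a sketch, and the exact lspci description string varies by generation:

lspci | grep -i 'signal processing'   # Ryzen NPUs enumerate as a PCI signal-processing function
lsmod | grep amdxdna                  # on kernels >= 6.14, the amdxdna driver binds to it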

@GreyXor commented on GitHub (Feb 25, 2026):

If anyone is interested, I wrote a little guide on using the NPU with FastFlowLM: https://community.frame.work/t/guide-use-npu-xdna2-with-arch-linux-and-fastflowlm/80879

@poplk commented on GitHub (Mar 22, 2026):

Hi, I am thinking of buying an AMD Ryzen AI Max+ 395. Does anybody own one? I have some questions.

@alerque commented on GitHub (Mar 22, 2026):

@poplk This is an issue report on a piece of software and it is followed by people who want to be notified about updates to the software issue. This is not an open-topic forum or hardware buyers guide. Please don't spam the issue tracker.

Reference: github-starred/ollama#3262