[GH-ISSUE #5186] AMD Ryzen NPU support #3262

Open
opened 2026-04-12 13:47:51 -05:00 by GiteaMirror · 59 comments

Originally created by @ivanbrash on GitHub (Jun 20, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/5186

Originally assigned to: @dhiltgen on GitHub.

Hello! I want to buy a Lenovo Xiaoxin 14 AI laptop with an AMD Ryzen 7 8845H for my birthday, and I will install Artix Linux on it. Will you add AMD Ryzen NPU support to Ollama on Linux and Windows? The AMD Ryzen NPU driver for Linux is already available on GitHub:
https://github.com/amd/xdna-driver.git
Sorry for my bad English!

GiteaMirror added the feature request and amd labels 2026-04-12 13:47:51 -05:00

@billtown commented on GitHub (Jun 20, 2024):

I have an AMD Ryzen 7 7840U w/ Radeon 780M Graphics and recently got inference working on the iGPU.
On Linux, the ROCm support works for me, but I have to set HSA_OVERRIDE_GFX_VERSION; people seem to have varying luck depending on the version they pick. Not sure if this helps at all.

podman run -d --name ollama --replace --pull=always --restart=always \
  -p 0.0.0.0:11434:11434 \
  -v ollama:/root/.ollama \
  --stop-signal=SIGKILL \
  --device /dev/dri --device /dev/kfd \
  -e HSA_OVERRIDE_GFX_VERSION=11.0.2 \
  -e HSA_ENABLE_SDMA=0 \
  docker.io/ollama/ollama:rocm
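
For a native install (no container), the same overrides can be set on the systemd unit instead; a minimal sketch, assuming the stock ollama.service created by the Linux install script:

# sudo systemctl edit ollama, then add:
[Service]
Environment="HSA_OVERRIDE_GFX_VERSION=11.0.2"
Environment="HSA_ENABLE_SDMA=0"
# then: sudo systemctl daemon-reload && sudo systemctl restart ollama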

@jasalt commented on GitHub (Jun 23, 2024):

There were some recent patches to llamafile and llama.cpp, linked here, that add the ability to use more RAM than what is dedicated to the iGPU (HIP_UMA): https://github.com/ROCm/ROCm/discussions/2631#discussioncomment-9849190. Looks promising.
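
For anyone who wants to try this directly in llama.cpp, there is a CMake switch for it; a rough sketch, with the caveat that the flag has been renamed across versions (LLAMA_HIP_UMA in older trees, GGML_HIP_UMA after the GGML rename):

# build llama.cpp with ROCm plus unified-memory allocation (UMA)
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build -DGGML_HIPBLAS=ON -DGGML_HIP_UMA=ON
cmake --build build --config Release -j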

@coreybutler commented on GitHub (Aug 14, 2024):

Running an AMD Ryzen 9 8945HS here. Would love to see support for this.

@2018wzh commented on GitHub (Aug 16, 2024):

Running an AMD AI 9 370HX here, same as above. Hoping to see support.

@grigio commented on GitHub (Aug 23, 2024):

Here is some news, but Linux support seems lacking:
https://community.amd.com/t5/ai/get-a-powerful-ai-assistant-with-document-chat-accelerated-by/ba-p/704092
https://lmstudio.ai/ryzenai

@henry2man commented on GitHub (Sep 5, 2024):

> I have an AMD Ryzen 7 7840U w/ Radeon 780M Graphics and recently got inference working on the iGPU.

@billtown What's the performance of your setup? I've recently purchased a Ryzen 9 8945HS + 64 GB RAM mini PC for some Docker + VM and (hopefully) some lightweight LLM workloads with Ollama.

PS: I'm not an expert on Ollama internals, but I have enough experience to help with testing on my own hardware to make this request a reality.

@grigio commented on GitHub (Sep 5, 2024):

> Running an AMD AI 9 370HX here, same as above. Hoping to see support.

Can you share how many tokens/s you get with llama3.1 Q4_K_M or similar?

@fan123450 commented on GitHub (Sep 11, 2024):

Running an AMD 8845HS here, same as above. Hoping to see support for both the GPU and the NPU.

@billtown commented on GitHub (Sep 11, 2024):

> > I have an AMD Ryzen 7 7840U w/ Radeon 780M Graphics and recently got inference working on the iGPU.
>
> @billtown What's the performance of your setup? I've recently purchased a Ryzen 9 8945HS + 64 GB RAM mini PC for some Docker + VM and (hopefully) some lightweight LLM workloads with Ollama.
>
> PS: I'm not an expert on Ollama internals, but I have enough experience to help with testing on my own hardware to make this request a reality.

total duration: 22.204829879s
load duration: 16.99589ms
prompt eval count: 1411 token(s)
prompt eval duration: 625.952ms
prompt eval rate: 2254.17 tokens/s
eval count: 269 token(s)
eval duration: 20.76486s
eval rate: 12.95 tokens/s < after building some context.

llama3:8b 365c0bd3c000 6.7 GB 100% GPU

radeontop at least shows VRAM, shaders, and pipes hitting 100% when running. I have 16 GB allocated in the BIOS:

0.80G / 0.80G Memory Clock 100.00%
2.13G / 2.70G Shader Clock 78.81%
Graphics pipe 99.17%
Shader Interpolator 92.50%
Clip Rectangle 100.00%
These are what come alive in radeontop. And then a single thread on the CPU hits 100% (ollama).
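
For anyone wanting to reproduce these numbers with the container setup above, everything comes from stock commands; a sketch (radeontop runs on the host, and the package name varies by distro):

podman exec -it ollama ollama run llama3:8b --verbose   # prints the duration/eval-rate summary above
podman exec -it ollama ollama ps                        # prints the "100% GPU" placement line
radeontop                                               # live iGPU utilization view on the host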

@fan123450 commented on GitHub (Sep 12, 2024):

> > > I have an AMD Ryzen 7 7840U w/ Radeon 780M Graphics and recently got inference working on the iGPU.
> >
> > @billtown What's the performance of your setup? I've recently purchased a Ryzen 9 8945HS + 64 GB RAM mini PC for some Docker + VM and (hopefully) some lightweight LLM workloads with Ollama.
> > PS: I'm not an expert on Ollama internals, but I have enough experience to help with testing on my own hardware to make this request a reality.
>
> total duration: 22.204829879s load duration: 16.99589ms prompt eval count: 1411 token(s) prompt eval duration: 625.952ms prompt eval rate: 2254.17 tokens/s eval count: 269 token(s) eval duration: 20.76486s eval rate: 12.95 tokens/s (after building some context)
>
> llama3:8b 365c0bd3c000 6.7 GB 100% GPU
>
> radeontop at least shows VRAM, shaders, and pipes hitting 100% when running. I have 16 GB allocated in the BIOS:
>
> 0.80G / 0.80G Memory Clock 100.00% 2.13G / 2.70G Shader Clock 78.81% Graphics pipe 99.17% Shader Interpolator 92.50% Clip Rectangle 100.00% These are what come alive in radeontop. And then a single thread on the CPU hits 100% (ollama).

Great! Is there a detailed reference for the implementation steps? If available, I will be very grateful!

@evansrrr commented on GitHub (Sep 24, 2024):

> Hello! I want to buy a Lenovo Xiaoxin 14 AI laptop with an AMD Ryzen 7 8845H for my birthday, and I will install Artix Linux on it. Will you add AMD Ryzen NPU support to Ollama on Linux and Windows? The AMD Ryzen NPU driver for Linux is already available on GitHub: https://github.com/amd/xdna-driver.git Sorry for my bad English!

Running a Lenovo Xiaoxin Pro 16 with an R7-8845H as the processor; same as above. Hope to see AMD NPU support enabled soon!

@robfuscator commented on GitHub (Oct 22, 2024):

We'll have to wait at least until February before this is even possible on Linux using a mainline kernel:

https://www.phoronix.com/news/AMD-XDNA-Linux-Driver-v4

@ivanbrash commented on GitHub (Nov 29, 2024):

I bought a Honor MagicBook X14 Pro with a Ryzen 7 7840HS and installed Gentoo with KDE on it. So far I have not tried to install Ollama on it, since there is no NPU support yet. But when it appears, I will definitely install it.

@ToeiRei commented on GitHub (Nov 29, 2024):

> I bought a Honor MagicBook X14 Pro with a Ryzen 7 7840HS and installed Gentoo with KDE on it. So far I have not tried to install Ollama on it, since there is no NPU support yet. But when it appears, I will definitely install it.

I did play around with AI accelerators a bit, and my Framework has the same CPU as your MagicBook. The TOPS value was disappointing, to put it mildly. Don't get your hopes up: 25 TOPS max across different applications. It's a blast for image recognition, OCR, and the like, but falls flat on LLM tasks.

@JiapengLi commented on GitHub (Dec 20, 2024):

Here is my test result with:

  • AMD Ryzen AI 9 HX 370 w/ Radeon 890M
  • LPDDR5 16GB
  • Ubuntu 24.04
  • Kernel 6.8.0

The performance is not as good as expected:


dev@VM100:~$ ollama run llama3.2:latest 'Develop a python function that solves the following problem, sudoku game' --verbose

...

total duration:       27.298931858s
load duration:        2.052439105s
prompt eval count:    37 token(s)
prompt eval duration: 405ms
prompt eval rate:     91.36 tokens/s
eval count:           675 token(s)
eval duration:        24.839s
eval rate:            27.18 tokens/s
dev@VM100:~$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=24.04
DISTRIB_CODENAME=noble
DISTRIB_DESCRIPTION="Ubuntu 24.04.1 LTS"


dev@VM100:~$ uname -a
Linux VM100 6.8.0-50-generic #51-Ubuntu SMP PREEMPT_DYNAMIC Sat Nov  9 17:58:29 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux


dev@VM100:~$ lscpu
Architecture:             x86_64
  CPU op-mode(s):         32-bit, 64-bit
  Address sizes:          48 bits physical, 48 bits virtual
  Byte Order:             Little Endian
CPU(s):                   24
  On-line CPU(s) list:    0-23
Vendor ID:                AuthenticAMD
  Model name:             AMD Ryzen AI 9 HX 370 w/ Radeon 890M
    CPU family:           26
    Model:                36
    Thread(s) per core:   1
    Core(s) per socket:   24
    Socket(s):            1
    Stepping:             0
    BogoMIPS:             3992.46
    Flags:                fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm rep_good nopl cpuid extd_apicid tsc_known_freq pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 x2apic mo
                          vbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy svm cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw perfctr_core ssbd ibrs ibpb stibp ibrs_enhanced vmmcall fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms in
                          vpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves avx_vnni avx512_bf16 clzero xsaveerptr wbnoinvd arat npt lbrv nrip_save tsc_scale vmcb_clean flushbyasid pausefilt
                          er pfthreshold v_vmsave_vmload vgif vnmi avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid bus_lock_detect movdiri movdir64b fsrm avx512_vp2intersect flush_l1d arch_capabilities
Virtualization features:
  Virtualization:         AMD-V
  Hypervisor vendor:      KVM
  Virtualization type:    full
Caches (sum of all):
  L1d:                    1.5 MiB (24 instances)
  L1i:                    1.5 MiB (24 instances)
  L2:                     12 MiB (24 instances)
  L3:                     384 MiB (24 instances)
NUMA:
  NUMA node(s):           1
  NUMA node0 CPU(s):      0-23
Vulnerabilities:
  Gather data sampling:   Not affected
  Itlb multihit:          Not affected
  L1tf:                   Not affected
  Mds:                    Not affected
  Meltdown:               Not affected
  Mmio stale data:        Not affected
  Reg file data sampling: Not affected
  Retbleed:               Not affected
  Spec rstack overflow:   Not affected
  Spec store bypass:      Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1:             Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:             Mitigation; Enhanced / Automatic IBRS; IBPB conditional; STIBP disabled; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
  Srbds:                  Not affected
  Tsx async abort:        Not affected


dev@VM100:~$ ollama --version
ollama version is 0.5.4
dev@VM100:~$ ollama list
NAME               ID              SIZE      MODIFIED
llama3.1:8b        46e0c10c039e    4.9 GB    3 hours ago
llama3.2:latest    a80c4f17acd5    2.0 GB    3 hours ago
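
One caveat worth checking here: the lscpu output shows a KVM guest ("Hypervisor vendor: KVM"), and unless the iGPU is passed through to the VM, ollama falls back to pure CPU inference, which could explain the disappointing numbers. A quick check:

ollama ps   # the PROCESSOR column shows whether the loaded model sits on "100% GPU" or "100% CPU"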

@JiapengLi commented on GitHub (Dec 20, 2024):

Related topics:
#3004

@Pekkari commented on GitHub (Dec 20, 2024):

@JiapengLi I don't think that is using your NPU in any way. The amd-xdna driver will most likely land in Linux 6.14; then you need the userspace libraries from AMD to talk to it (like ROCm when talking to AMD GPUs, or CUDA for NVIDIA), and then Ollama needs code that calls those libraries, which is the reason this issue exists. I'm no Ollama maintainer though; they may know more details than I do.
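
The kernel-driver layer, at least, is easy to check; a sketch, assuming a kernel built with the amdxdna driver:

lsmod | grep amdxdna           # is the XDNA kernel module loaded?
ls /dev/accel/                 # the NPU shows up as a DRM accel node (e.g. accel0) once bound
sudo dmesg | grep -i amdxdna   # driver probe/initialization messages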

@grigio commented on GitHub (Dec 21, 2024):

@JiapengLi I think Linux 6.14 should improve the situation; keep us updated:
https://www.phoronix.com/news/Ryzen-AI-NPU6-Linux-6.14

@sinchichou commented on GitHub (Dec 26, 2024):

So I'm trying to get an LLM running on the AMD NPU.
But it looks like it needs Visual Studio 2022 Community, CMake, and Anaconda or Miniconda,
and the libraries all need the Ryzen AI SW stack or similar.
The ONNX Runtime supported-platforms list doesn't show the AMD NPU.
Maybe DirectML or ROCm is worth a try.
I'll try that later.

@GreyXor commented on GitHub (Mar 18, 2025):

> @JiapengLi I think Linux 6.14 should improve the situation; keep us updated: https://www.phoronix.com/news/Ryzen-AI-NPU6-Linux-6.14

Yes, I can confirm that the amdxdna driver runs on my 6.14.

@grigio commented on GitHub (Mar 18, 2025):

@GreyXor do you see improvements in tokens/sec over CPU or Vulkan?

@GreyXor commented on GitHub (Mar 18, 2025):

I mean, amdxdna is loaded and working, but I don't have an app that can actually run inference on it. Want me to try something? I would be happy to run some benchmarks.

amdxdna has been some kind of vaporware since mid-2023. At least now the driver is working, but nothing uses it.
I asked AMD for some docs here https://github.com/AMD-AIG-AIMA/Instella/issues/1 and we are left waiting for support here: https://github.com/ggml-org/llama.cpp/issues/1499
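
If the userspace half from https://github.com/amd/xdna-driver (the XRT/XDNA plugin) is installed, something like this should at least enumerate the NPU; a sketch, and tool names/paths may differ by release:

source /opt/xilinx/xrt/setup.sh   # default XRT install prefix
xrt-smi examine                   # lists detected XDNA devices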

@wishx commented on GitHub (Mar 24, 2025):

The 6.14 kernel has been released and is widely available now. Just letting folks know.
https://www.phoronix.com/news/Linux-6.14

@evansrrr commented on GitHub (Apr 10, 2025):

Yeah, and I saw that the Linux platform did take a leap in AI tech, e.g. getting the most out of tools like sd-webui-aki.

@DocMAX commented on GitHub (Apr 14, 2025):

I'm going to buy a laptop for local LLM use. Can you recommend a good CPU for this? I prefer Lenovo's ThinkPad line. Any experiences? Should I wait for the next generation?

@XenoAmess commented on GitHub (Apr 14, 2025):

> I'm going to buy a laptop for local LLM use. Can you recommend a good CPU for this? I prefer Lenovo's ThinkPad line. Any experiences? Should I wait for the next generation?

Well, I hate Apple, but... just buy a MacBook (IMO).

@DocMAX commented on GitHub (Apr 14, 2025):

I hate Apple too, so that's not an option...

@Bush-cat commented on GitHub (Apr 14, 2025):

> I'm going to buy a laptop for local LLM use. Can you recommend a good CPU for this? I prefer Lenovo's ThinkPad line. Any experiences? Should I wait for the next generation?

You'd want a Ryzen AI Max CPU.

@DocMAX commented on GitHub (Apr 14, 2025):

How does it perform with ROCm and Ollama (tokens/s)? I can't find any benchmark comparison list anywhere.

@XenoAmess commented on GitHub (Apr 14, 2025):

I have an AMD CPU with an NPU. I have an Ubuntu kernel at Linux 6.14. I have no way to run any LLM backend with them.
So thanks for your "hell" for AI, AMD.
Oh, maybe I should say the word "help", but I don't think they deserve it.

![Image](https://github.com/user-attachments/assets/649ee16c-3ca5-4ed2-b1c1-537ce0610670)

@Bush-cat commented on GitHub (Apr 14, 2025):

> How does it perform with ROCm and Ollama (tokens/s)? I can't find any benchmark comparison list anywhere.

It's the fastest iGPU, as it has very fast 4-channel memory. You can only get faster with a big discrete GPU, but those have less VRAM, so you're limited in the size of the models you can run.

@Bush-cat commented on GitHub (Apr 14, 2025):

> I have an AMD CPU with an NPU. I have an Ubuntu kernel at Linux 6.14. I have no way to run any LLM backend with them. So thanks for your "hell" for AI, AMD. Oh, maybe I should say the word "help", but I don't think they deserve it.

Welp, it was always more marketing for Windows users and features, I guess; the first ads for the NPUs were for applications like MS Teams...

Also, don't expect much from the 15 TOPS of your Ryzen 8000 NPU (or the 10 TOPS of my Ryzen 7000); Copilot+ PCs require at least a 50 TOPS NPU to do anything.
I saw a person benchmarking the Ryzen 8000 NPU, and it took several minutes to finish an output with the tiniest Llama model.

@XenoAmess commented on GitHub (Apr 14, 2025):

> > I have an AMD CPU with an NPU. I have an Ubuntu kernel at Linux 6.14. I have no way to run any LLM backend with them. So thanks for your "hell" for AI, AMD. Oh, maybe I should say the word "help", but I don't think they deserve it.
>
> Welp, it was always more marketing for Windows users and features, I guess; the first ads for the NPUs were for applications like MS Teams...
>
> Also, don't expect much from the 15 TOPS of your Ryzen 8000 NPU (or the 10 TOPS of my Ryzen 7000); Copilot+ PCs require at least a 50 TOPS NPU to do anything. I saw a person benchmarking the Ryzen 8000 NPU, and it took several minutes to finish an output with the tiniest Llama model.

The tiniest Llama model we've seen is 0.5B. I can't quite believe it would take minutes to handle requests with 0.5B, but...
Well, let's wait for AMD engineers to make it usable in another two years. Maybe there is still hope they can achieeeeeeve it by then?

@Pekkari commented on GitHub (Apr 14, 2025):

> I have an AMD CPU with an NPU. I have an Ubuntu kernel at Linux 6.14. I have no way to run any LLM backend with them. So thanks for your "hell" for AI, AMD. Oh, maybe I should say the word "help", but I don't think they deserve it.
>
> ![Image](https://github.com/user-attachments/assets/649ee16c-3ca5-4ed2-b1c1-537ce0610670)

The marketing info around suggests it may run using either LM Studio or vLLM; needless to say, on the Linux side one should always expect to tinker a bit to get these things working.

@bonswouar commented on GitHub (Apr 14, 2025):

Please, guys, try it out instead of speculating: https://github.com/ollama/ollama/pull/6282
Not sure if it correctly uses the NPU, but it's working!

> I saw a person benchmarking the Ryzen 8000 NPU, and it took several minutes to finish an output with the tiniest Llama model.

Probably fake news. I have an 8845HS, and the few models I've tried (8B to 15B) run pretty well (though of course it depends what you compare it to). Not "several minutes" for the "tiniest Llama model", for sure.

@DocMAX commented on GitHub (Apr 14, 2025):

AMD 5800U APU with ROCm: Llama3.1 8B. Question: "Who is Bill Gates":

total duration: 1m7.077394148s
load duration: 22.877264ms
prompt eval count: 51 token(s)
prompt eval duration: 8.593335ms
prompt eval rate: 5934.83 tokens/s
eval count: 435 token(s)
eval duration: 1m7.044265099s
eval rate: 6.49 tokens/s

@Bush-cat commented on GitHub (Apr 14, 2025):

> > I have an AMD CPU with an NPU. I have an Ubuntu kernel at Linux 6.14. I have no way to run any LLM backend with them. So thanks for your "hell" for AI, AMD. Oh, maybe I should say the word "help", but I don't think they deserve it.
>
> Welp, it was always more marketing for Windows users and features, I guess; the first ads for the NPUs were for applications like MS Teams... Also, don't expect much from the 15 TOPS of your Ryzen 8000 NPU (or the 10 TOPS of my Ryzen 7000); Copilot+ PCs require at least a 50 TOPS NPU to do anything. I saw a person benchmarking the Ryzen 8000 NPU, and it took several minutes to finish an output with the tiniest Llama model.
>
> The tiniest Llama model we've seen is 0.5B. I can't quite believe it would take minutes to handle requests with 0.5B, but... Well, let's wait for AMD engineers to make it usable in another two years. Maybe there is still hope they can achieeeeeeve it by then?

I saw Ryzen 7000 users get 2-10 tokens per second with an optimized Llama 3.1 8B model:
https://www.reddit.com/r/LocalLLaMA/comments/1d9m0z3/running_llama_3_on_the_npu_of_a_firstgeneration/

And with models below 3B, the quality of the output is really bad.

@Bush-cat commented on GitHub (Apr 14, 2025):

> Please, guys, try it out instead of speculating: #6282 Not sure if it correctly uses the NPU, but it's working!
>
> > I saw a person benchmarking the Ryzen 8000 NPU, and it took several minutes to finish an output with the tiniest Llama model.
>
> Probably fake news. I have an 8845HS, and the few models I've tried (8B to 15B) run pretty well (though of course it depends what you compare it to). Not "several minutes" for the "tiniest Llama model", for sure.

You probably used the iGPU and not the NPU. I was only talking about the NPU, which is much slower than using the full iGPU.

@DocMAX commented on GitHub (Apr 14, 2025):

Can anyone run a benchmark on an AMD HX 375 for me, please? I really wonder how fast it is. I expect around 20 tok/s with Llama 3.1 8B, given all the hype around the AMD AI processors.
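
For the numbers to be comparable, everyone would need to run the same thing; a sketch using only stock ollama commands, reusing the prompt from the 5800U run above:

ollama run llama3.1:8b "Who is Bill Gates" --verbose
# the closing "eval rate: N tokens/s" line is the decode speed being compared in this thread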

@androidacy-user commented on GitHub (Jun 17, 2025):

People in this issue thread seem to be getting the GPU and NPU mixed up. Ollama can be forced to run on the iGPU, but seems to completely lack support for the (much more efficient) NPU on these chipsets.

@reneleonhardt commented on GitHub (Jun 19, 2025):

> Can anyone run a benchmark on an AMD HX 375 for me, please? I really wonder how fast it is. I expect around 20 tok/s with Llama 3.1 8B, given all the hype around the AMD AI processors.

The NPU seems comparable with 50 TOPS, but a lot of unified RAM always helps of course 😅

https://www.techpowerup.com/334223/amds-ryzen-ai-max-395-delivers-up-to-12x-ai-llm-performance-compared-to-intels-lunar-lake
https://en.wikipedia.org/wiki/List_of_AMD_Ryzen_processors#Ryzen_AI_300_series

It looks like NPU support in Ollama would be amazing to run LLMs even on notebooks ❤

@regulad commented on GitHub (Jun 22, 2025):

I have a notebook with the Ryzen AI Max+ 395 at my disposal. I was able to get iGPU inference to work in rootless podman with the following command, but still no NPU inference in sight.

podman run --rm \
  --name ollama \
  --user root \
  --pull=newer \
  --device /dev/kfd \
  --device /dev/dri \
  --group-add keep-groups \
  --privileged \
  -e HSA_OVERRIDE_GFX_VERSION=11.5.1 \
  -e HCC_AMDGPU_TARGET=gfx1151 \
  -v $HOME/.ollama:/root/.ollama:Z \
  -p 11434:11434 \
  docker.io/ollama/ollama:rocm

@DocMAX If you're interested in my speed, here is a prompt from the 27B-parameter Gemma 3:

![Image](https://github.com/user-attachments/assets/03df4822-edba-4790-ad59-f72b258c6fee)

@padthaitofuhot commented on GitHub (Jul 27, 2025):

This please.

I have AMD Ryzen AI 7 PRO 360 w/ Radeon 880M in this Thinkpad. It's not a very powerful NPU, but it would be super keen to get a tiny model on it for quick enhanced local autocomplete or embedding vectors for RAG.

kernel: amdxdna 0000:c4:00.1: enabling device (0000 -> 0002)
kernel: [drm] Initialized amdxdna_accel_driver 0.0.0 for 0000:c4:00.1 on minor 0
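
For the embedding-vectors part, nothing NPU-specific exists yet: embeddings go through the standard ollama HTTP API and run on the CPU/iGPU today, and NPU offload is exactly what this issue asks for. A sketch of the call, assuming an embedding model such as nomic-embed-text has been pulled:

curl http://localhost:11434/api/embed -d '{"model": "nomic-embed-text", "input": "test sentence for RAG"}'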

@androidacy-user commented on GitHub (Jul 27, 2025):

> This please.
>
> I have AMD Ryzen AI 7 PRO 360 w/ Radeon 880M in this Thinkpad. It's not a very powerful NPU, but it would be super keen to get a tiny model on it for quick enhanced local autocomplete or embedding vectors for RAG.
>
> kernel: amdxdna 0000:c4:00.1: enabling device (0000 -> 0002)
> kernel: [drm] Initialized amdxdna_accel_driver 0.0.0 for 0000:c4:00.1 on minor 0

50 TOPS is plenty for a smaller or quantized model, or a larger one if you're willing to deal with slower inference times.

@muety commented on GitHub (Jul 27, 2025):

> I have AMD Ryzen AI 7 PRO 360 w/ Radeon 880M

Would love to see how it performs on reasonably large models (like ~21B or so)!

@jcubic commented on GitHub (Jul 27, 2025):

AMD NPU is supported by the mainline Linux kernel from 6.14, released in March 2025 (https://kernelnewbies.org/Linux_6.14).

  • AMD Previews Mysterious Linux Runtime Stack For Ryzen AI NPUs: https://www.phoronix.com/news/AMD-Linux-RT-Preview-Ryzen-AI

I wanted to buy a laptop with this NPU, and it would be great to be able to use bigger models with Ollama.

@gururise commented on GitHub (Sep 1, 2025):

> AMD NPU is supported by the mainline Linux kernel from 6.14, released in March 2025 (https://kernelnewbies.org/Linux_6.14).
>
>   • AMD Previews Mysterious Linux Runtime Stack For Ryzen AI NPUs: https://www.phoronix.com/news/AMD-Linux-RT-Preview-Ryzen-AI
>
> I wanted to buy a laptop with this NPU, and it would be great to be able to use bigger models with Ollama.

NPU support can speed things up significantly. There are two other projects that support inference on AMD NPUs and show significant performance improvements over iGPU-only or CPU-only:

  1. AMD GAIA (https://github.com/amd/gaia) - supports hybrid NPU + iGPU or NPU-only modes
  2. FastFlowLLM (https://github.com/FastFlowLM/FastFlowLM) - supports the NPU

@ha-pf-tickerer commented on GitHub (Sep 1, 2025):

> > I wanted to buy a laptop with this NPU, and it would be great to be able to use bigger models with Ollama.
>
> NPU support can speed things up significantly. There are two other projects that support inference on AMD NPUs and show significant performance improvements over iGPU-only or CPU-only:
>
> 1. AMD GAIA (https://github.com/amd/gaia) - supports hybrid NPU + iGPU or NPU-only modes
> 2. FastFlowLLM (https://github.com/FastFlowLM/FastFlowLM) - supports the NPU

GAIA and FastFlowLM are great projects that support the AMD Ryzen AI processors, but support in Ollama would be really, really great.

Our use case is a dedicated local AI mini PC, to be used by the kids as a better Google/Alexa search, and probably most of the time to give Home Assistant a local conversation agent using https://www.home-assistant.io/integrations/ollama/.

This would allow an "it's too hot in here!" voice prompt to Home Assistant, letting Home Assistant correctly understand that the user "bob" sitting in the living room is not happy with the temperature, and that the Home Assistant server should cool the room using the AC or lower the thermostats, based on the controls Home Assistant already has.

I promised this functionality to my SO in order to justify hanging the house full of Zigbee sensors and putting seriously expensive AMD Ryzen AI mini boxes in the house :-)

@z0xca commented on GitHub (Dec 23, 2025):

Running an AMD 8845HS here too, same as above. Hoping to see NPU support.

@alerque commented on GitHub (Dec 23, 2025):

How is this affected by the merge of #13196?

@Pekkari commented on GitHub (Dec 24, 2025):

> How is this affected by the merge of #13196?

Not affected at all. That merge is about iGPU support, not the NPU, and from what I know, the NPU in the 8845HS is not worth supporting, since the extra capacity it would provide in a hybrid (GPU + NPU) setup is not really a deal maker.

@bonswouar commented on GitHub (Dec 24, 2025):

> the NPU in the 8845HS is not worth supporting, since the extra capacity it would provide in a hybrid (GPU + NPU) setup is not really a deal maker.

Isn't the NPU supposed to be more energy efficient than the GPU, though? The 8845HS being a laptop CPU, I'd say that could be a huge deal maker, if it helps to run models on battery.

But if it's not more energy efficient, and not noticeably improving performance in a hybrid setup, then I really don't see the point, yeah.

@Pekkari commented on GitHub (Dec 24, 2025):

> Isn't the NPU supposed to be more energy efficient than the GPU, though? The 8845HS being a laptop CPU, I'd say that could be a huge deal maker, if it helps to run models on battery.
>
> But if it's not more energy efficient, and not noticeably improving performance in a hybrid setup, then I really don't see the point, yeah.

Don't kill the messenger; I'm just voicing what I heard from AMD. I'd love to see the support come anyway, since I bought the hardware for the NPU and suddenly ended up in the same situation :|

@alerque commented on GitHub (Dec 24, 2025):

Fair enough. I'm still figuring out what is what here.

Partly out of personal curiosity and partly because I'm an Arch Linux packager looking over the ROCm-related packages, wondering if there is anything we are missing out on that I could help fix... My personal hardware is an integrated AMD Ryzen AI 9 HX 370 w/ Radeon 890M, which I assume does have an NPU and would benefit from this requested support, correct? And also an AMD Ryzen 5 3600 6-core processor with a discrete Radeon RX 5500 graphics card, for which I assume there is no NPU, correct?

Is there somewhere that has commands to actually ferret this out, or a good table somewhere showing which AMD parts have NPUs at all and what they are/are not supported by?

@Pekkari commented on GitHub (Dec 24, 2025):

> Fair enough. I'm still figuring out what is what here.
>
> Partly out of personal curiosity and partly because I'm an Arch Linux packager looking over the ROCm-related packages, wondering if there is anything we are missing out on that I could help fix... My personal hardware is an integrated AMD Ryzen AI 9 HX 370 w/ Radeon 890M, which I assume does have an NPU and would benefit from this requested support, correct? And also an AMD Ryzen 5 3600 6-core processor with a discrete Radeon RX 5500 graphics card, for which I assume there is no NPU, correct? Is there somewhere that has commands to actually ferret this out, or a good table somewhere showing which AMD parts have NPUs at all and what they are/are not supported by?

I fail to remember exactly, but I read something about Strix Point support coming, which I think is your hardware. However, I think it was GPU support in ROCm, so chances are you may still be in the safe zone. The 8845HS is prior to Strix Point, and after it comes Strix Halo, which was the first intended to be supported, but a community push made ROCm support for Strix Point happen as well.

@z0xca commented on GitHub (Dec 24, 2025):

> Fair enough. I'm still figuring out what is what here.
>
> Partly out of personal curiosity and partly because I'm an Arch Linux packager looking over the ROCm-related packages, wondering if there is anything we are missing out on that I could help fix... My personal hardware is an integrated AMD Ryzen AI 9 HX 370 w/ Radeon 890M, which I assume does have an NPU and would benefit from this requested support, correct? And also an AMD Ryzen 5 3600 6-core processor with a discrete Radeon RX 5500 graphics card, for which I assume there is no NPU, correct?
>
> Is there somewhere that has commands to actually ferret this out, or a good table somewhere showing which AMD parts have NPUs at all and what they are/are not supported by?

The Wikipedia List of AMD Ryzen processors (https://en.wikipedia.org/wiki/List_of_AMD_Ryzen_processors) shows which CPUs have an NPU and which don't.
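
For a local check rather than a table lookup, two quick hints work on Linux; a sketch, and the exact lspci description string varies by generation:

lspci | grep -i 'signal processing'   # Ryzen NPUs enumerate as a PCI signal-processing function
lsmod | grep amdxdna                  # on kernels >= 6.14, the amdxdna driver binds to it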

@GreyXor commented on GitHub (Feb 25, 2026):

If anyone is interested, I wrote a little guide on using the NPU with FastFlowLM: https://community.frame.work/t/guide-use-npu-xdna2-with-arch-linux-and-fastflowlm/80879

@poplk commented on GitHub (Mar 22, 2026):

Hi, I am thinking of buying an AMD Ryzen AI Max+ 395. Does anybody own one? I have some questions.

@alerque commented on GitHub (Mar 22, 2026):

@poplk This is an issue report on a piece of software and it is followed by people who want to be notified about updates to the software issue. This is not an open-topic forum or hardware buyers guide. Please don't spam the issue tracker.

Reference: github-starred/ollama#3262