[GH-ISSUE #8538] Add support for the AI HAT+ #5508

Open
opened 2026-04-12 16:45:19 -05:00 by GiteaMirror · 3 comments
Owner

Originally created by @sealad886 on GitHub (Jan 22, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/8538

Add support for the new AI HAT+ add-on for the Raspberry Pi 5 ([info here](https://www.raspberrypi.com/products/ai-hat/)) to enable speedups.

GiteaMirror added the feature request label 2026-04-12 16:45:19 -05:00

@smakonin commented on GitHub (Aug 17, 2025):

I am very interested in working on this. It is a research project. If anyone can point me in the right direction, I can start working on it in my spare time.


@znmeb commented on GitHub (Apr 3, 2026):

There's now an AI HAT+ 2 with 8 GB of its own RAM and 40 INT4 TOPS. I wouldn't mess with the old one; it's useful for vision at 26 INT8 TOPS, but the language models really need the faster one.


@znmeb commented on GitHub (Apr 4, 2026):

I forgot to mention yesterday: I have a Pi 5 with the AI HAT+ (the one with 26 INT8 TOPS but without the 8 GB of dedicated RAM). My main applications are going to be audio, and I took a look at the Hailo documentation today. It looks like speech-to-text and text-to-speech _**will**_ run on the AI HAT+, so I'm going to at least experiment with those.

One thing to watch out for: do _**not**_ enable Vulkan when building `llama.cpp` (or code derived from `llama.cpp`) for a Raspberry Pi. Vulkan is present, but `llama.cpp` will crash trying to load models because the Pi's Vulkan GPU doesn't have enough memory for matrix multiplies. You can use OpenBLAS, though; you need the `-dev` packages for both the 32-bit and 64-bit versions.

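Following that advice, a minimal build sketch might look like the following. This is an assumption-laden example, not a verified recipe: it assumes Raspberry Pi OS (Debian-based) package names and the current `llama.cpp` CMake option names (`GGML_BLAS`, `GGML_BLAS_VENDOR`, `GGML_VULKAN`); check the llama.cpp build docs for your checkout.

```shell
# Hedged sketch: build llama.cpp on a Raspberry Pi 5 with OpenBLAS
# enabled and Vulkan explicitly disabled, per the comment above.
# Package names assume Raspberry Pi OS; adjust for other distros.
sudo apt-get install -y build-essential cmake libopenblas-dev

git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp

# GGML_VULKAN=OFF avoids the Pi Vulkan GPU crash described above;
# GGML_BLAS/GGML_BLAS_VENDOR route matrix multiplies through OpenBLAS.
cmake -B build \
  -DGGML_BLAS=ON \
  -DGGML_BLAS_VENDOR=OpenBLAS \
  -DGGML_VULKAN=OFF

cmake --build build --config Release -j"$(nproc)"
```

Note that this only builds CPU inference with a BLAS backend; it does not offload anything to the Hailo accelerator on the AI HAT+, which would need Hailo's own runtime and model toolchain.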