[GH-ISSUE #10667] Axelera Metis card support #69071

Open
opened 2026-05-04 17:05:28 -05:00 by GiteaMirror · 18 comments

Originally created by @Smiril on GitHub (May 11, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/10667

Is it possible to plan future support for Axelera AI cards?

See https://store.axelera.ai
GiteaMirror added the feature request label 2026-05-04 17:05:28 -05:00
@jclsn commented on GitHub (May 12, 2025):

I have also looked into this. The card sounds promising, but it seems to be mainly focused on object detection rather than LLMs. The current PCIe card comes with only 4 GB of DRAM, so it could only hold very small models anyway.

There is supposedly an upcoming card called Axelera Europa that is going to support LLMs. See the Retailer FAQ [here](https://buyzero.de/products/axelera-ai-quad-core-metis-pcie-accelerator-card-214-tops?_pos=1&_sid=9aa6c92dd&_ss=r).
@Smiril commented on GitHub (May 13, 2025):

Shouldn't it work? Even if it runs on the CPU, it will use system RAM rather than the card's DRAM. Can you export only the learning to the card [here](https://buyzero.de/products/axelera-ai-1gb-quad-core-metis-m-2-m-key-accelerator-card-214-tops?_pos=1&_fid=1dfc8149d&_ss=c)?
@jclsn commented on GitHub (May 13, 2025):

I am not an expert, but I would assume its advertised speed is due to the AIPU being directly connected to its own RAM. If this is the case, then the 1 GB version would definitely not be able to hold any LLM. Maybe someone else here knows more than me!
@Smiril commented on GitHub (May 13, 2025):

The LLM should be held in system RAM, and the learning thread should run on this M.2 module. Forget about loading the complete model into one RAM bank. It may be more efficient to offload only the learning processing to an AIPU or graphics card.
@jclsn commented on GitHub (May 22, 2025):

I dug around some more on their website. "Axelera Europa" seems to be the name of the fund by the European Union. The upcoming processor for generative AI is called Axelera Titania. You can find more information [here](https://axelera.ai/news/axelera-ai-secures-up-to-61-million-grant-to-develop-scalable-ai-chiplet-for-high-performance-computing).

Furthermore, what makes the Axelera architecture stand out is the D-IMC (direct in-memory calculation) technology, which I assume does the matrix calculation directly in the card's DRAM. In any case, this means that the models need to be fully loaded into the chip's dedicated memory and absolutely cannot make use of the system memory as you suggested. The target release year is 2028 for this chip, so we still have to wait for some time :)
@Smiril commented on GitHub (May 29, 2025):

What about the Gemma model that fits under 1 GB? Is there an option to load it like:

`ollama run gemma3:1b:aipu`

and not:

`ollama run gemma3:1b:cpu`
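As a rough sanity check on the 1 GB question (my own back-of-the-envelope arithmetic, not a figure from the thread): a model's weight footprint is approximately parameter count times bytes per parameter, so a ~1B-parameter model such as `gemma3:1b` only fits in 1 GB at roughly 4-bit quantization or below, and that is before the KV cache and activations are accounted for.

```python
# Back-of-the-envelope estimate of LLM weight footprint at different
# quantization levels. Illustrative numbers only; real files carry
# extra overhead (tokenizer, metadata, mixed-precision layers).
def weight_footprint_gb(params: float, bits_per_param: float) -> float:
    """Approximate size of the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_param / 8 / 1e9

params = 1e9  # a ~1B-parameter model, e.g. gemma3:1b

for name, bits in [("FP16", 16), ("Q8", 8), ("Q4", 4)]:
    size = weight_footprint_gb(params, bits)
    verdict = "fits" if size <= 1.0 else "does not fit"
    print(f"{name}: {size:.2f} GB -> {verdict} in 1 GB (weights only)")
```

Under these assumptions only the 4-bit variant leaves any headroom on a 1 GB card, which is consistent with jclsn's skepticism above.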
@alsutton commented on GitHub (Jul 25, 2025):

They're now offering a 16 GB PCIe card, which is more useful and [seems attractively priced](https://store.axelera.ai/products/metis-pcie-card-unmatched-performance-for-edge-ai-applications), plus pre-orders of [evaluation units of a self-contained board with 16 GB of RAM, an ARM CPU, and their own chip](https://store.axelera.ai/products/metis-compute-board-with-arm-based-rk3588).
@Smiril commented on GitHub (Jul 25, 2025):

Well, I have long been awaiting these cards, but the price in the shop is without tax.
@Cyrille37 commented on GitHub (Nov 14, 2025):

Hi,

A demonstration from Axelera: [Llama 3.2B chatbot running on our Metis® platform](https://community.axelera.ai/llms-nlp-neural-networks-53/llama-3-2b-chatbot-demo-on-metis-fully-offline-209) (2025-05-29).

With some details: [SLM Inference on Axelera AI Platform](https://github.com/axelera-ai-hub/voyager-sdk/blob/release/v1.3/docs/tutorials/llm.md)

And a [16 GB PCIe card](https://store.axelera.ai/products/metis-pcie-card-unmatched-performance-for-edge-ai-applications?variant=51109413486933) is for sale...
@jclsn commented on GitHub (Nov 14, 2025):

@Cyrille37 That news is not so new, and I think no one is going to make the effort to add support for a card that only delivers ~7 tokens/s with a 3B model.

This upcoming chip looks more promising: https://axelera.ai/ai-accelerators/aipu/europa, although it's questionable whether it can compete with GPUs either. I am also excited about this card, but from what I heard on Discord, models would need to be recompiled for this architecture, which developers would only do if the performance is worth it, I guess.
@ergohaxor commented on GitHub (Nov 16, 2025):

> @Cyrille37 That news is not so new, and I think no one is going to make the effort to add support for a card that only delivers ~7 tokens/s with a 3B model.

That was a single-core result; Metis is a quad-core chip.
@jclsn commented on GitHub (Nov 16, 2025):

@ergohaxor Good point! I guess it doesn't scale up linearly, though. It's a pity that they don't provide LLM benchmarks on their website. It would probably attract more LLM enthusiasts and get them to implement something.
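For a sense of what the quad-core argument implies, here is a small sketch of the arithmetic (the efficiency values are my own assumptions for illustration, not measurements of Metis): perfect scaling would turn the ~7 tokens/s single-core figure into 4x that, and any realistic parallel efficiency below 100% lands somewhere in between.

```python
# Hypothetical scaling estimate for a single-core tokens/s figure
# spread over multiple cores with a flat efficiency factor.
# The efficiency values below are assumptions, not benchmark data.
def scaled_throughput(single_core_tps: float, cores: int, efficiency: float) -> float:
    """Estimated multi-core throughput = base rate * cores * efficiency."""
    return single_core_tps * cores * efficiency

base = 7.0   # ~7 tokens/s single-core, per the thread
cores = 4    # Metis is a quad-core chip

for eff in (1.0, 0.8, 0.6):
    tps = scaled_throughput(base, cores, eff)
    print(f"parallel efficiency {eff:.0%}: ~{tps:.1f} tokens/s")
```

Even the optimistic end of that range would need real published benchmarks to be credible, which is exactly the gap jclsn points out.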
@ergohaxor commented on GitHub (Nov 16, 2025):

@jclsn Maybe we get an SDK update and some benchmarks once their M.2 Max card starts shipping.
@jclsn commented on GitHub (Dec 6, 2025):

@ergohaxor Maybe! Currently, running custom models seems experimental and quite complicated: https://github.com/axelera-ai-hub/voyager-sdk/blob/release/v1.5/docs/tutorials/custom_model.md
@Smiril commented on GitHub (Dec 21, 2025):

Is there any chance to download a working model?
@jclsn, is there an ONNX model in the Ollama repo? [Because of the recompile requirement](https://github.com/axelera-ai-hub/voyager-sdk/blob/release/v1.5/docs/reference/compiler_cli.md).

Any clue how to?
@jclsn commented on GitHub (Dec 21, 2025):

@Smiril Only the ones in the [model zoo](https://axelera.ai/ai-software/model-zoo). All others would need to be recompiled.
@Smiril commented on GitHub (Dec 21, 2025):

> All others would need to be recompiled.

And this is my question: has anyone successfully pre- or re-compiled models? Or is this for now a secret of the Axelera crew?

You may laugh, but I'm looking for a way "for dummies"...

I did not read the README closely, but there is a lot of information I did not understand for the moment.

I found [this](https://github.com/axelera-ai-hub/voyager-sdk/blob/release/v1.5/docs/reference/deploy.md).
@jclsn commented on GitHub (Dec 21, 2025):

I don't own a card, so I can't help you. You should ask in the Axelera forums!
Reference: github-starred/ollama#69071