[GH-ISSUE #5315] Support for Ascend NPU hardware #65366

Open
opened 2026-05-03 20:55:28 -05:00 by GiteaMirror · 12 comments

Originally created by @JingWoo on GitHub (Jun 27, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/5315

The Huawei Ascend AI processor is an AI chip based on Huawei's self-developed Da Vinci architecture. It performs well on large-scale data processing and complex computing tasks. The llama.cpp project is currently being adapted to the Ascend series of AI processors, and I'm also adapting Ollama to support the Ascend series to expand the hardware ecosystem Ollama supports.

GiteaMirror added the feature request label 2026-05-03 20:55:28 -05:00

@mchiang0610 commented on GitHub (Jun 27, 2024):

Thanks @JingWoo for reaching out! Do you happen to know the Huawei team? We test on devices before adding support. For most of the integrations, we spend the time directly with hardware vendors to test out their drivers prior to a release.


@JingWoo commented on GitHub (Jun 27, 2024):

@mchiang0610 Yes, I'm from Huawei, and I'm now working on adding Ascend support to Ollama.
I also have some Ascend AI servers on hand, so I'm adding support and testing on them.
If you need any more fixes or features on Ascend, please feel free to contact me.


@mchiang0610 commented on GitHub (Jun 27, 2024):

@JingWoo Would it be possible to see if we can add an Ascend AI server for testing? Where could we request hardware for testing?

Every single Ollama release goes through testing before release. We've been working with hardware vendors to build out a test matrix for stability and reliability checks.


@cosren commented on GitHub (Jul 26, 2024):

@JingWoo Can you set up Ollama on the Ascend NPU device? Do you have any guide doc for it?


@myoss commented on GitHub (Jul 26, 2024):

I hope it can support running on Huawei Ascend servers.


@JingWoo commented on GitHub (Jul 29, 2024):

> @JingWoo Can you set up Ollama on the Ascend NPU device? Do you have any guide doc for it?

Yes, we have adapted Ollama to support Ascend devices; please refer to https://github.com/ollama/ollama/pull/5872


@zhongTao99 commented on GitHub (Jul 29, 2024):

@myoss @cosren We have completed the adaptation of Ascend and verified that ollama uses Ascend NPU devices.
![origin](https://github.com/user-attachments/assets/04031a77-dfe8-47b8-9b58-e9d262c74216)


@JerryYao80 commented on GitHub (Oct 4, 2024):

@zhongTao99 A quick question: does the CardID of a 310P change between the host and a container?
Outside the container:
+--------------------------------------------------------------------------------------------------------+
| npu-smi 24.1.rc2 Version: 24.1.rc2 |
+-------------------------------+-----------------+------------------------------------------------------+
| NPU Name | Health | Power(W) Temp(C) Hugepages-Usage(page) |
| Chip Device | Bus-Id | AICore(%) Memory-Usage(MB) |
+===============================+=================+======================================================+
| 1 310P3 | OK | NA 59 0 / 0 |
| 0 0 | 0000:01:00.0 | 0 1397 / 44280 |
+-------------------------------+-----------------+------------------------------------------------------+
| 1 310P3 | OK | NA 58 0 / 0 |
| 1 1 | 0000:01:00.0 | 0 1520 / 43693 |
+===============================+=================+======================================================+
| 2 310P3 | OK | NA 58 0 / 0 |
| 0 2 | 0000:02:00.0 | 0 1599 / 44280 |
+-------------------------------+-----------------+------------------------------------------------------+
| 2 310P3 | OK | NA 56 0 / 0 |
| 1 3 | 0000:02:00.0 | 0 1264 / 43693 |
+===============================+=================+======================================================+
| 4 310P3 | OK | NA 54 0 / 0 |
| 0 4 | 0000:81:00.0 | 0 1254 / 44280 |
+-------------------------------+-----------------+------------------------------------------------------+
| 4 310P3 | OK | NA 55 0 / 0 |
| 1 5 | 0000:81:00.0 | 0 1612 / 43693 |
+===============================+=================+======================================================+
| 5 310P3 | OK | NA 53 0 / 0 |
| 0 6 | 0000:82:00.0 | 0 1355 / 44280 |
+-------------------------------+-----------------+------------------------------------------------------+
| 5 310P3 | OK | NA 51 0 / 0 |
| 1 7 | 0000:82:00.0 | 0 1508 / 43693 |
+===============================+=================+======================================================+

Running the docker command:
docker run -it -e ASCEND_VISIBLE_DEVICES=1,2,4,5 --privileged -u root --ipc=host --network host --device=/dev/davinci0 --device=/dev/davinci1 --device=/dev/davinci_manager --device=/dev/devmm_svm --device=/dev/hisi_hdc -v /etc/localtime:/etc/localtime -v /usr/local/Ascend/driver:/usr/local/Ascend/driver -v /var/log/npu/:/usr/slog -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi -v /usr/local/Ascend/driver/lib64/common:/usr/local/Ascend/driver/lib64/common -v /var/log/npu/conf/slog/slog.conf:/var/log/npu/conf/slog/slog.conf -v /var/log/npu:/var/log/npu/dump --name llm_app -v /mnt/store/models:/mnt/store/models swr.cn-south-1.myhuaweicloud.com/ascendhub/mindie:1.0.RC2-300I-Duo-arm64 /bin/bash

Inside the container it becomes:
+-------------------------------+-----------------+------------------------------------------------------+
| NPU Name | Health | Power(W) Temp(C) Hugepages-Usage(page) |
| Chip Device | Bus-Id | AICore(%) Memory-Usage(MB) |
+===============================+=================+======================================================+
| 0 310P3 | OK | NA 59 0 / 0 |
| 0 0 | 0000:01:00.0 | 0 1397 / 44280 |
+-------------------------------+-----------------+------------------------------------------------------+
| 0 310P3 | OK | NA 58 0 / 0 |
| 1 1 | 0000:01:00.0 | 0 1520 / 43693 |
+===============================+=================+======================================================+
| 32 310P3 | OK | NA 58 0 / 0 |
| 0 2 | 0000:02:00.0 | 0 1600 / 44280 |
+-------------------------------+-----------------+------------------------------------------------------+
| 32 310P3 | OK | NA 56 0 / 0 |
| 1 3 | 0000:02:00.0 | 0 1264 / 43693 |
+===============================+=================+======================================================+
| 32768 310P3 | OK | NA 54 0 / 0 |
| 0 4 | 0000:81:00.0 | 0 1255 / 44280 |
+-------------------------------+-----------------+------------------------------------------------------+
| 32768 310P3 | OK | NA 55 0 / 0 |
| 1 5 | 0000:81:00.0 | 0 1611 / 43693 |
+===============================+=================+======================================================+
| 32800 310P3 | OK | NA 52 0 / 0 |
| 0 6 | 0000:82:00.0 | 0 1355 / 44280 |
+-------------------------------+-----------------+------------------------------------------------------+
| 32800 310P3 | OK | NA 51 0 / 0 |
| 1 7 | 0000:82:00.0 | 0 1507 / 43693 |
+===============================+=================+======================================================+


@geweixuan commented on GitHub (Oct 28, 2024):

> @myoss @cosren We have completed the adaptation of Ascend and verified that ollama uses Ascend NPU devices.

How did you do that? I'm very interested too.


@HartleyLau commented on GitHub (Nov 22, 2024):

I look forward to PR [5872](https://github.com/ollama/ollama/pull/5872) being merged. I'm currently using Huawei Ascend NPU hardware.


@James4Ever0 commented on GitHub (Feb 13, 2025):

> @JingWoo Would it be possible to see if we can add an Ascend AI server for testing? Where could we request hardware for testing?
>
> Every single Ollama release goes through testing before release. We've been working with hardware vendors to build out a test matrix for stability and reliability checks.

@mchiang0610 @JingWoo

Hello from China! We would like to provide the test environment for Ollama, so here we want to ask a few questions.

What are your specific hardware requirements for the test environment? For example, is a remote or cloud environment acceptable, or local only? How many NPU cards are needed? What model of NPU would be required for testing?


@wangcong19940121 commented on GitHub (Aug 21, 2025):

@JingWoo
Hello, I have a question. On my side, the leopony/ollama-cann-300i-duo image runs successfully, the ollama service starts, and it detects the deployed Ascend 310 (not a 310P).

Inside the container, `ollama list` successfully finds the local model deepseek-r1.

The problem:
When running the model, I get the following error: `Error: llama runner process has terminated: error load model: vector::_M_range_check: _n (which is 6) >= this->size() (which is 6)`.

I don't quite understand this. Is it because the Ascend 310 is not a 300I Duo, a problem with the image, or a problem with the adaptation environment?


Reference: github-starred/ollama#65366