[PR #7554] feat: Support Moore Threads GPU #38331

Open
opened 2026-04-22 22:59:54 -05:00 by GiteaMirror · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/7554
Author: @yeahdongcn
Created: 11/7/2024
Status: 🔄 Open

Base: main ← Head: v0.4.0_musa


📝 Commits (9)

  • f935f79 gpu_info_cudart: get gpu_name from props
  • 013e8fb Fix typo in llama/runner/README.md
  • ada35d4 musa: add ggml-musa
  • 89981b3 musa: support discover Moore Threads GPU through libmusa/libmusart/libmtml
  • 48b22f6 musa: support building musa_v3
  • 54f5c14 musa: support docker build
  • 6c82771 musa: add build test for musa
  • 2cef5a7 musa: support new arch mp_31
  • a559b2f musa: upgrade to rc4.0.1 (requires ubuntu:22.04)

📊 Changes

25 files changed (+1464 additions, -25 deletions)

View changed files

📝 .github/workflows/test.yaml (+4 -0)
📝 CMakeLists.txt (+30 -0)
📝 CMakePresets.json (+21 -0)
📝 Dockerfile (+30 -1)
📝 discover/gpu.go (+344 -6)
📝 discover/gpu_info.h (+4 -1)
📝 discover/gpu_info_cudart.c (+2 -0)
📝 discover/gpu_info_cudart.h (+1 -1)
➕ discover/gpu_info_mtml.c (+141 -0)
➕ discover/gpu_info_mtml.h (+49 -0)
➕ discover/gpu_info_mtmusa.c (+250 -0)
➕ discover/gpu_info_mtmusa.h (+69 -0)
➕ discover/gpu_info_musart.c (+185 -0)
➕ discover/gpu_info_musart.h (+140 -0)
📝 discover/gpu_info_nvcuda.c (+4 -5)
📝 discover/gpu_info_nvml.c (+1 -1)
📝 discover/gpu_linux.go (+23 -0)
📝 discover/gpu_test.go (+1 -1)
📝 discover/gpu_windows.go (+11 -0)
➕ discover/musa_common.go (+26 -0)

...and 5 more files

📄 Description

This PR introduces support for Moore Threads GPUs, leveraging MUSA (Moore Threads Unified System Architecture) to accelerate LLM inference. Due to significant upstream changes in version 0.4.x, this PR is a fresh submission (refer to https://github.com/ollama/ollama/pull/5556 for additional context) with the following key updates:

Key Updates:

  1. Moore Threads GPU Detection: Detects Moore Threads GPUs using libmusa, libmusart, and libmtml, similar to the existing CUDA implementation.
  2. MUSA 4 flavor: Adds support for building the musa_v4 flavor for MTT GPUs.
  3. Docker Image Build: Provides support for building Docker images alongside CUDA/ROCm integration.
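The GPU detection in item 1 follows the same pattern as the existing CUDA path: probe well-known filesystem locations for the vendor runtime libraries, then load whichever is found through the C shims (`gpu_info_mtml.c`, `gpu_info_musart.c`, etc.). The Go sketch below illustrates only the pattern-probing step; the glob patterns are hypothetical examples, not the exact paths used in the PR.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// findMusaLib probes a set of glob patterns for MUSA runtime
// libraries, mirroring the pattern-based lookup the existing CUDA
// discovery code performs before handing candidates to the C shims.
func findMusaLib(patterns []string) []string {
	var found []string
	for _, p := range patterns {
		matches, err := filepath.Glob(p)
		if err != nil {
			// A malformed pattern is skipped, not fatal.
			continue
		}
		found = append(found, matches...)
	}
	return found
}

func main() {
	// Hypothetical search locations for libmusart / libmtml.
	patterns := []string{
		"/usr/local/musa/lib/libmusart.so*",
		"/usr/lib/x86_64-linux-gnu/libmtml.so*",
	}
	fmt.Println(findMusaLib(patterns))
}
```

In the PR itself the matched library is then dlopen'd and queried for device count, name, and memory, analogous to the nvcuda/nvml flow.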

Testing Done:

  1. Local Build on Linux/amd64 Host with MUSA SDK rc4.0.1 installed
    • Successful build with cmake
    • Verified with ldd that /home/xiaodongye/ws/yeahdongcn/ollama/build/lib/ollama/libggml-musa.so links correctly against MUSA libraries.
    • Ran the qwen2.5 model using Ollama on the host: interactive inference performed as expected, with the model loaded and utilized on the MTT GPU.
    • Ran the llama3.2-vision:11b model using Ollama on the host: interactive inference performed as expected, with the model loaded and utilized on the MTT GPU.
  2. runtime-musa Docker Image Build
    • Executed PLATFORM=linux/amd64 DOCKER_ORG=mthreads PUSH=1 ./scripts/build_docker.sh successfully.
    • Verified container functionality with docker run --env OLLAMA_DEBUG=1 -v ollama:/root/.ollama -it mthreads/ollama:0.5.12-14-g07153c2-musa: the Ollama server runs and MTT S80 GPU is discovered as expected.
    • Inside the container, tested qwen2.5, deepseek-r1, and gemma3 model execution: interactive inference performed as expected, with the model loaded and utilized on the MTT GPU.
  3. Tested the structured outputs feature in version 0.5.x using curl. It works as expected.
  4. Tested newly supported model falcon3:1b/3b. It works as expected.
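The structured outputs test in item 3 sends a request whose `format` field carries a JSON schema constraining the model's reply (the Ollama 0.5.x structured outputs API). A minimal sketch of such a request body, with an example model name and schema not taken from the PR's test logs:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// structuredRequest builds an illustrative /api/generate request body
// exercising structured outputs. Model name and schema are examples.
func structuredRequest() []byte {
	req := map[string]any{
		"model":  "qwen2.5",
		"prompt": "List two GPU models as JSON.",
		"stream": false,
		// "format" holds a JSON schema the server enforces on the reply.
		"format": map[string]any{
			"type": "object",
			"properties": map[string]any{
				"gpus": map[string]any{
					"type":  "array",
					"items": map[string]any{"type": "string"},
				},
			},
			"required": []string{"gpus"},
		},
	}
	body, err := json.Marshal(req)
	if err != nil {
		panic(err)
	}
	return body
}

func main() {
	// POST this body to http://localhost:11434/api/generate, e.g. via curl.
	fmt.Println(string(structuredRequest()))
}
```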

Please refer to the full logs here:

gpu_discover.log
model_load.log

Run in container

$ docker run --env OLLAMA_DEBUG=1 -d -v ollama:/root/.ollama -p 11434:11434 --name ollama-musa \
  	mthreads/ollama:0.9.0-9-ga559b2f-musa-rc4.0.1

Edit Logs

  • 2024/12/08 - Rebased upstream/main and added a build test for the musa_v1 runner: test/runners-linux-musa (rc3.1.0)
  • 2025/01/10 - Rebased upstream/main, all tests passed.
  • 2025/01/15 - Rebased upstream/main, all tests passed.
  • 2025/01/21 - Tested deepseek-r1.
  • 2025/01/26 - Rebased upstream/main, all tests passed.
  • 2025/02/14 - Upgraded MUSA SDK to rc3.1.1, all tests passed.
  • 2025/02/27 - Rebased upstream/main, all tests passed.
  • 2025/03/06 - Rebased upstream/main, all tests passed.
  • 2025/03/06 - Rebased upstream/main and added new arch mp_31, all tests passed.
  • 2025/03/14 - Rebased upstream/main, all tests passed (gemma3 not tested yet).
  • 2025/03/31 - Rebased upstream/main, all tests passed (gemma3:4b tested).
  • 2025/04/10 - Rebased upstream/main, all tests passed.
  • 2025/04/21 - Rebased upstream/main, all tests passed.
  • 2025/05/06 - Rebased upstream/main, all tests passed.
  • 2025/05/17 - Rebased upstream/main, all tests passed.
  • 2025/05/26 - Rebased tag/v0.7.1 and upgraded MUSA SDK to rc4.0.1, all tests passed.
  • 2025/05/26 - Rebased tag/v0.9.0, all tests passed.

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-22 22:59:54 -05:00

Reference: github-starred/ollama#38331