[PR #6760] [MERGED] IBM granite/granitemoe architecture support #22758

Closed
opened 2026-04-19 16:32:55 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/6760
Author: @gabe-l-hart
Created: 9/11/2024
Status: Merged
Merged: 10/17/2024
Merged by: @jessegross

Base: main ← Head: IBMGraniteArchitectureSupport


📝 Commits (10+)

  • aebcdfd fix(ext_server): Port llama.cpp sampling refactors to ext_server
  • 0400048 fix(server.cpp): Refactor server.cpp logging for llama.cpp overhaul
  • 61c9667 feat: Bump llama.cpp to the latest master with granite support
  • 66a4081 fix(patches): Update all patches (except solar-pro) to work with bumped llama.cpp
  • 1063e6e fix(solar): Update solar patch for llama.cpp bump
  • 35ed3f9 feat(llama.cpp): Bump llama.cpp for granitemoe support
  • ce4c665 feat(llama.cpp): Bump llama.cpp for granitemoe support
  • 44d25a1 fix(solar): Update the solar-pro patch for latest llama.cpp bump
  • 97a6273 feat(llama.cpp): Bump to the latest master of llama.cpp
  • 519176f fix(patches): Update all patches for latest bump

📊 Changes

263 files changed (+14226 additions, -10838 deletions)


📝 llama/build-info.cpp (+1 -1)
📝 llama/clip.cpp (+76 -79)
📝 llama/clip.h (+1 -1)
📝 llama/common.cpp (+359 -1955)
📝 llama/common.h (+123 -56)
📝 llama/ggml-aarch64.c (+2128 -1099)
📝 llama/ggml-aarch64.h (+1 -1)
📝 llama/ggml-alloc.c (+7 -1)
📝 llama/ggml-alloc.h (+1 -1)
📝 llama/ggml-backend-impl.h (+11 -10)
📝 llama/ggml-backend.c (+39 -2)
📝 llama/ggml-backend.h (+3 -2)
📝 llama/ggml-blas.cpp (+2 -1)
📝 llama/ggml-blas.h (+1 -1)
📝 llama/ggml-common.h (+21 -1)
➕ llama/ggml-cpu-impl.h (+640 -0)
📝 llama/ggml-cuda.cu (+122 -21)
📝 llama/ggml-cuda.h (+1 -1)
📝 llama/ggml-cuda/acc.cu (+1 -1)
📝 llama/ggml-cuda/acc.cuh (+1 -1)

...and 80 more files

📄 Description

Special Note

Since this PR bumps llama.cpp past the tip of master (6026da52 as of writing this), it includes the recent changes to overhaul sampling and logging. I updated server.cpp so that it compiles and can run the models successfully. I also updated all of the patches to apply to the updated llama.cpp codebase.

Dependencies

UPDATE (superseded by UPDATE 2): This PR no longer has dependencies. The first llama.cpp PR has been merged to support granite, and given our hope to release soon, we'd like to get this merged without granitemoe support and add that in a follow-up PR.

UPDATE 2: Both granite and granitemoe are now supported in llama.cpp. I've rebased the PR to include them (and to pick up support for chameleon).

(Superseded) This PR is dependent on two PRs in llama.cpp, both of which have since been merged:

  • Support for granite: https://github.com/ggerganov/llama.cpp/pull/9412
  • Support for granitemoe: https://github.com/ggerganov/llama.cpp/pull/9438

(Superseded) Currently, the branch will not build since the submodule points to a commit on my fork and I have not changed the remote URL. Once the llama.cpp PRs are merged, I will update the submodule pointer to the mainline.

Description

This PR adds support for IBM's granite and granitemoe architectures. See the llama.cpp PRs for full details on the added architectures.

Testing

In order to test this while it's in draft, I did the following:

# Download the IBM research experimental models (requires huggingface-cli, installed via Python)
huggingface-cli download ibm/PowerLM-3b --local-dir $HOME/models/powerlm-3b
huggingface-cli download ibm/PowerMoE-3b --local-dir $HOME/models/powermoe-3b

# Convert to GGUF using the latest version of llama.cpp (I'm doing it here in the submodule).
# The converter script is convert_hf_to_gguf.py. --outfile pins the output name so the
# quantize step has a known input file (the f16 filename is an assumption added for clarity).
cd llm/llama.cpp
pip install -r requirements/requirements-convert_hf_to_gguf.txt
python convert_hf_to_gguf.py $HOME/models/powerlm-3b --outtype f16 --outfile $HOME/models/powerlm-3b/ggml-model-f16.gguf
python convert_hf_to_gguf.py $HOME/models/powermoe-3b --outtype f16 --outfile $HOME/models/powermoe-3b/ggml-model-f16.gguf
cd -

# Build the llama-quantize binary in the submodule (build path shown is for macOS on arm64)
cd llm/build/darwin/arm64_static/
make llama-quantize -j
cd -

# Quantize with the locally built llama-quantize. It takes a GGUF file, not a directory.
# With no explicit output path it writes ggml-model-Q4_K_M.gguf next to the input file.
./llm/build/darwin/arm64_static/bin/llama-quantize $HOME/models/powerlm-3b/ggml-model-f16.gguf Q4_K_M
./llm/build/darwin/arm64_static/bin/llama-quantize $HOME/models/powermoe-3b/ggml-model-f16.gguf Q4_K_M

# Import to ollama (finally!)
echo "FROM $HOME/models/powerlm-3b/ggml-model-Q4_K_M.gguf" > Modelfile.powerlm-3b
./ollama create -f Modelfile.powerlm-3b powerlm:3b
echo "FROM $HOME/models/powermoe-3b/ggml-model-Q4_K_M.gguf" > Modelfile.powermoe-3b
./ollama create -f Modelfile.powermoe-3b powermoe:3b
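
The import step above repeats the same two commands per model, so it could be wrapped in a small helper. The following is only a sketch under stated assumptions: `write_modelfile` is a hypothetical function name (not part of Ollama or this PR), and it prints the `ollama create` command rather than running it, so the Modelfile can be inspected first.

```sh
# Hypothetical helper (an illustration, not part of the PR): write a one-line
# Modelfile pointing at a quantized GGUF file, then print the matching
# `ollama create` command instead of executing it.
write_modelfile() {
  gguf_path="$1"   # e.g. $HOME/models/powerlm-3b/ggml-model-Q4_K_M.gguf
  modelfile="$2"   # e.g. Modelfile.powerlm-3b
  tag="$3"         # e.g. powerlm:3b

  # Fail early if quantization has not produced the expected file.
  if [ ! -f "$gguf_path" ]; then
    echo "missing GGUF file: $gguf_path" >&2
    return 1
  fi

  # Same content as the `echo "FROM ..." > Modelfile` lines in the steps above.
  printf 'FROM %s\n' "$gguf_path" > "$modelfile"
  echo "./ollama create -f $modelfile $tag"
}
```

For example, `write_modelfile "$HOME/models/powerlm-3b/ggml-model-Q4_K_M.gguf" Modelfile.powerlm-3b powerlm:3b` should write the Modelfile and print the create command for the dense model; the same call with the powermoe paths covers the MoE model.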

Old instructions for building from my fork

build ollama

# Add my personal fork as a remote in the submodule
cd llm/llama.cpp
git remote add gabe https://github.com/gabe-l-hart/llama.cpp.git
git fetch gabe
cd -

# Generate and build like normal
go generate ./...
go build .

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-19 16:32:55 -05:00
Reference: github-starred/ollama#22758