[GH-ISSUE #10557] Add support for IBM Granite-4.0-Tiny-Preview #53460

Open
opened 2026-04-29 03:16:53 -05:00 by GiteaMirror · 5 comments

Originally created by @EwoutH on GitHub (May 4, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/10557

[Granite-4.0-Tiny-Preview](https://huggingface.co/ibm-granite/granite-4.0-tiny-preview) is a very interesting new model with a hybrid Mamba-2/Transformer architecture. Adding support for this preview model will help pave the way for full Granite 4.0 support when the complete Granite 4.0 series launches this summer.

See: https://www.ibm.com/new/announcements/ibm-granite-4-0-tiny-preview-sneak-peek

Potentially relevant issues/PRs from other repos:

- https://github.com/huggingface/transformers/pull/35894
- https://github.com/vllm-project/vllm/pull/17497
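
For anyone who wants to poke at the checkpoint before Ollama support lands, here is a minimal sketch of loading it through Hugging Face transformers. This is not related to Ollama's own engine, and it assumes a transformers build that already includes the GraniteMoeHybrid support from the PRs linked above:

```python
# Minimal sketch: load the preview checkpoint with transformers and generate a
# short completion. Assumes a transformers version with GraniteMoeHybrid support.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ibm-granite/granite-4.0-tiny-preview"  # repo linked above

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Tell me about hybrid Mamba-2/Transformer models.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
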
GiteaMirror added the model label 2026-04-29 03:16:53 -05:00

@alex-jw-brooks commented on GitHub (May 5, 2025):

Thanks for opening this! Linking this PR too, which is the analogous one to the vLLM PR above: https://github.com/huggingface/transformers/pull/37658 (GraniteMoeHybrid builds on top of GraniteMoeShared + SSMs)

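One way to see how that composition (GraniteMoeShared + SSM layers) is declared is to inspect the checkpoint's `config.json`. A hedged sketch follows; the specific field names such as `layer_types` and `shared_intermediate_size` are assumptions about the GraniteMoeHybrid config, not confirmed here:

```python
# Hedged sketch: download and print the hybrid model's config fields.
# Field names other than "model_type" and "num_hidden_layers" are guesses.
import json
from huggingface_hub import hf_hub_download

config_path = hf_hub_download(
    repo_id="ibm-granite/granite-4.0-tiny-preview",
    filename="config.json",
)
with open(config_path) as f:
    config = json.load(f)

print(config.get("model_type"))  # expected to name the hybrid architecture
for key in ("num_hidden_layers", "layer_types", "shared_intermediate_size"):
    if key in config:
        print(key, "->", config[key])
```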

@gabe-l-hart commented on GitHub (May 5, 2025):

Hi @EwoutH! This is actively on the TODO list. Here's the corresponding issue for enablement in `llama.cpp`: https://github.com/ggml-org/llama.cpp/issues/13275

For Ollama integration, we're planning to bring the architecture in via the [new engine](https://github.com/ollama/ollama/tree/main/model). There will be a number of steps to get this all working given all of the architectural pieces that the model uses. The upside is that it should bring a number of other architectures along for the ride (`mamba`, `mamba2`, `jamba`, and `bamba`). Here's a list of tasks we'll be working through:

- [ ] Port `granite` architecture to the new engine
  - Open PR: https://github.com/ollama/ollama/pull/9966
- [ ] Port `granitemoe` architecture to the new engine
  - Open PR: https://github.com/ollama/ollama/pull/9966
- [ ] Add inline conversion for `granitemoe` architecture
  - Open PR: https://github.com/ollama/ollama/pull/10362
- [ ] Support `granitemoeshared` architecture in the new engine
  - WIP Branch: https://github.com/gabe-l-hart/ollama/tree/GraniteMoEShared
  - `llama.cpp` PR: https://github.com/ggml-org/llama.cpp/pull/13269
- [ ] Support `mamba` / `mamba2` mixer models
  - Related open issues: https://github.com/ollama/ollama/issues?q=is%3Aissue%20state%3Aopen%20mamba
  - `llama.cpp` PR:
- [ ] Support hybrid attention / recurrent models
- [ ] Support use of `NoPE` instead of `RoPE` (may be a [no-op](https://github.com/ggml-org/llama.cpp/issues/13275#issuecomment-2848684563))
- [ ] Glue it all together for `granitemoehybrid`
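
On the NoPE vs RoPE item in the task list above, here is a toy NumPy sketch of what the difference means for query/key vectors. It is illustrative only, not Ollama's or `llama.cpp`'s implementation, and the split-half pairing convention is just one common RoPE variant:

```python
# Illustrative only: RoPE rotates each (x1, x2) pair of query/key dimensions
# by a position-dependent angle; NoPE applies no positional rotation at all.
import numpy as np

def apply_rope(x, base=10000.0):
    """x: (seq_len, head_dim) query or key vectors, head_dim even."""
    seq_len, head_dim = x.shape
    half = head_dim // 2
    inv_freq = base ** (-np.arange(half) / half)               # theta_i = base^(-2i/d)
    angles = np.arange(seq_len)[:, None] * inv_freq[None, :]   # position * theta_i
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    return np.concatenate([x1 * cos - x2 * sin,
                           x1 * sin + x2 * cos], axis=-1)

def apply_nope(x):
    """NoPE: vectors are used as-is; position information comes only from
    causal masking (and, in hybrid models, from the recurrent layers)."""
    return x

q = np.random.randn(8, 64)
print(np.allclose(apply_nope(q), q))  # True: NoPE leaves q untouched
print(np.allclose(apply_rope(q), q))  # False: RoPE rotates positions > 0
```
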

@EwoutH commented on GitHub (Jul 11, 2025):

Thanks for this awesome work! I think we're now a lot further with Granite 4 support, right?


@gabe-l-hart commented on GitHub (Jul 11, 2025):

@EwoutH Yes! We are much closer now. After discussions with the core Ollama team, we decided to initially bring Granite Four in via `llama.cpp` since we were going to be doing that work anyway to support other platforms. I'm happy to say that, as of last night, [Granite Four](https://github.com/ggml-org/llama.cpp/pull/13550) has been merged to `master` in `llama.cpp`!

This means that the next step for Ollama support is bumping `llama.cpp` to the latest tip of `master`. I've drafted that work already (https://github.com/ollama/ollama/pull/11195), but it needs to be redone now that it's officially merged. As with any `llama.cpp` bump, there are a _lot_ of changes besides the core model support, so getting this tested and merged will take some time. In the meantime, anyone interested is welcome to play with my fork and poke holes at the version with the bumped `llama.cpp`. The more eyes, the better!

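Since the model support is now merged in `llama.cpp` itself, one way to smoke-test it independently of Ollama is through llama-cpp-python (a swapped-in route, not what the comment above describes). This sketch assumes a llama-cpp-python build recent enough to include the Granite Four merge and a local GGUF conversion of the model; the file path is hypothetical:

```python
# Hedged smoke test: load a local GGUF conversion of the preview model and
# generate a short completion via llama-cpp-python.
from llama_cpp import Llama

llm = Llama(model_path="./granite-4.0-tiny-preview.Q8_0.gguf", n_ctx=4096)
out = llm("Q: What is a hybrid Mamba-2/Transformer model?\nA:", max_tokens=128)
print(out["choices"][0]["text"])
```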

@gabe-l-hart commented on GitHub (Jul 14, 2025):

`llama.cpp` bump PR is now updated to point at the mainline `master` with Granite 4 support. All CI is green, so now we need to thoroughly test everything. Any assistance the community can provide in testing would be much appreciated!

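For anyone helping with that testing: once a build with the bumped `llama.cpp` is running locally and the model has been created or pulled, a minimal check is to hit Ollama's generate endpoint and confirm a response comes back. The model tag below is hypothetical; substitute whatever tag you created the model under:

```python
# Hedged sketch: smoke-test a local Ollama build via its HTTP API.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "granite4-tiny-preview",  # hypothetical tag
        "prompt": "Summarize the Mamba-2 architecture in one sentence.",
        "stream": False,
    },
    timeout=300,
)
resp.raise_for_status()
print(resp.json()["response"])
```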
Reference: github-starred/ollama#53460