[PR #14356] [MERGED] models: add nemotron architecture support #45886

Closed
opened 2026-04-25 01:29:52 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/14356
Author: @jmorganca
Created: 2/22/2026
Status: Merged
Merged: 2/22/2026
Merged by: @jmorganca

Base: main ← Head: ollama-nemotron


📊 Changes

22 files changed (+3196 additions, -4 deletions)

📝 convert/convert.go (+5 -2)
➕ convert/convert_nemotron_h.go (+385 -0)
➕ convert/convert_nemotron_h_test.go (+230 -0)
➕ convert/json_compat.go (+97 -0)
➕ convert/json_compat_test.go (+46 -0)
📝 fs/ggml/ggml.go (+23 -0)
➕ kvcache/recurrent.go (+752 -0)
➕ kvcache/recurrent_checkpoints.go (+561 -0)
➕ kvcache/recurrent_checkpoints_test.go (+288 -0)
➕ llama/patches/0034-ggml-metal-guard-mul_mat_id-map0-and-add-ne20-22-spe.patch (+37 -0)
📝 ml/backend.go (+1 -0)
📝 ml/backend/ggml/ggml.go (+7 -0)
📝 ml/backend/ggml/ggml/src/ggml-metal/ggml-metal-embed.metal (+1 -0)
📝 ml/backend/ggml/ggml/src/ggml-metal/ggml-metal-ops.cpp (+2 -1)
📝 ml/backend/ggml/ggml/src/ggml-metal/ggml-metal.metal (+1 -0)
📝 model/model_test.go (+1 -0)
📝 model/models/models.go (+1 -0)
➕ model/models/nemotronh/attention.go (+88 -0)
➕ model/models/nemotronh/cache.go (+55 -0)
➕ model/models/nemotronh/mamba2.go (+197 -0)

...and 2 more files

📄 Description

This PR takes a first step toward a unified recurrent cache that can be shared with the Qwen3.5 "Qwen Next" and LFM architectures.
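
For readers unfamiliar with the idea, here is a minimal sketch of why a recurrent cache tends to need checkpoints. Unlike an attention KV cache, recurrent state is cumulative and cannot be truncated to an earlier position, so rewinding means restoring a saved snapshot. Every name below (`recurrentState`, `checkpointCache`, `Advance`, `Checkpoint`, `Restore`) is hypothetical, and the update rule is a placeholder. This is not the API added in `kvcache/recurrent.go` or `kvcache/recurrent_checkpoints.go`.

```go
package main

import "fmt"

// recurrentState is a hypothetical stand-in for per-sequence recurrent state
// (for example, the convolution and SSM state of a Mamba-2 layer).
type recurrentState struct {
	pos  int       // number of tokens folded into the state so far
	data []float32 // flattened state values
}

// checkpointCache is an illustrative cache, not the PR's implementation.
// It keeps one live state per sequence plus snapshots taken at known positions.
type checkpointCache struct {
	live        map[int]*recurrentState
	checkpoints map[int][]recurrentState
}

func newCheckpointCache() *checkpointCache {
	return &checkpointCache{
		live:        make(map[int]*recurrentState),
		checkpoints: make(map[int][]recurrentState),
	}
}

// Advance folds one input value into the state of sequence seq.
// The decay-and-add rule is a placeholder for a real recurrent update.
func (c *checkpointCache) Advance(seq int, x float32) {
	s, ok := c.live[seq]
	if !ok {
		s = &recurrentState{data: make([]float32, 1)}
		c.live[seq] = s
	}
	s.data[0] = 0.5*s.data[0] + x
	s.pos++
}

// Checkpoint stores a deep copy of the current state of sequence seq.
func (c *checkpointCache) Checkpoint(seq int) {
	s, ok := c.live[seq]
	if !ok {
		return
	}
	snap := recurrentState{pos: s.pos, data: append([]float32(nil), s.data...)}
	c.checkpoints[seq] = append(c.checkpoints[seq], snap)
}

// Restore rewinds sequence seq to the latest checkpoint at or before pos.
// It returns the restored position and whether a usable checkpoint existed.
func (c *checkpointCache) Restore(seq, pos int) (int, bool) {
	cps := c.checkpoints[seq]
	for i := len(cps) - 1; i >= 0; i-- {
		if cps[i].pos <= pos {
			c.live[seq] = &recurrentState{
				pos:  cps[i].pos,
				data: append([]float32(nil), cps[i].data...),
			}
			c.checkpoints[seq] = cps[:i+1] // drop snapshots past the rewind point
			return cps[i].pos, true
		}
	}
	return 0, false
}

func main() {
	c := newCheckpointCache()
	c.Advance(0, 1)
	c.Advance(0, 1)
	c.Checkpoint(0)
	c.Advance(0, 1)
	pos, ok := c.Restore(0, 2)
	fmt.Println(pos, ok, c.live[0].data[0])
}
```

In this sketch a caller that rewinds to a position with no checkpoint at or before it gets `false` back and would have to recompute the state from the start of the sequence.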


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

📝 Commits (6)

1820b91 model: add nemotronh support (https://github.com/ollama/ollama/commit/1820b914829d8de3782692f071d1549ea9cbb7f2)
2f07a8e fix repeat messages causing crash (https://github.com/ollama/ollama/commit/2f07a8e170fdf17686f0d814c129788b39c1e09b)
d429254 simplify recurrent kv cache code (https://github.com/ollama/ollama/commit/d4292542da664197f44ac3531853cf60665b1100)
8a43ae0 lint (https://github.com/ollama/ollama/commit/8a43ae02e2b36bcdc2ee5ac60e4476a9f6f1a737)
50c6227 simplify metal patch (https://github.com/ollama/ollama/commit/50c6227bab903339606fc32aac0809993abbf5cf)
b1665c8 handle kv shift (https://github.com/ollama/ollama/commit/b1665c82dc2528d9af361ed84bef5facee7199d6)

GiteaMirror added the pull-request label 2026-04-25 01:29:52 -05:00
Reference: github-starred/ollama#45886