[PR #13691] [CLOSED] modernbert: add ModernBERT architecture support #24882

Closed
opened 2026-04-19 17:51:56 -05:00 by GiteaMirror · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/13691
Author: @hansolosan
Created: 1/12/2026
Status: Closed

Base: main ← Head: add_modernbert


📝 Commits (1)

  • 28ae23b modernbert: add ModernBERT architecture support

📊 Changes

19 files changed (+1605 additions, -35 deletions)


GRANITE_MODERNBERT_PLAN.md (+417 -0)
IMPLEMENTATION_SUMMARY.md (+149 -0)
📝 convert/convert.go (+4 -0)
convert/convert_modernbert.go (+470 -0)
📝 fs/ggml/ggml.go (+2 -2)
📝 llama/llama.cpp/src/llama-arch.cpp (+23 -2)
📝 llama/llama.cpp/src/llama-arch.h (+6 -0)
📝 llama/llama.cpp/src/llama-batch.cpp (+0 -2)
📝 llama/llama.cpp/src/llama-graph.cpp (+1 -1)
📝 llama/llama.cpp/src/llama-hparams.h (+7 -0)
📝 llama/llama.cpp/src/llama-model.cpp (+65 -6)
📝 llama/llama.cpp/src/llama-model.h (+1 -0)
📝 llama/llama.cpp/src/models/bert.cpp (+124 -19)
📝 llama/llama.cpp/src/models/models.h (+4 -0)
llama/llama.cpp/src/models/modernbert.cpp (+290 -0)
📝 llama/llama.go (+1 -1)
📝 llm/server.go (+28 -1)
📝 ml/backend/ggml/ggml/src/ggml-cuda/CMakeLists.txt (+1 -1)
📝 server/routes.go (+12 -0)

📄 Description

  • Add ModernBERT architecture support for embedding models
  • Implement alternating local/global attention patterns with sliding window attention
  • Support dual RoPE theta values (local: 10000, global: 80000)
  • Add GeGLU (gated FFN) support with proper tensor splitting
  • Configure CLS pooling and GPT-2 tokenizer for sentence-transformers compatibility
  • Handle 8192 context length with proper truncation support
  • Fix tensor loading for output_norm and layer_out_norm tensors
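The alternating local/global attention described above can be sketched as follows. This is an illustrative model of the mechanism, not the PR's actual code: the every-3rd-layer cadence, the window width, and the function names are assumptions; only the RoPE theta values (local: 10000, global: 80000) come from the PR description.

```go
package main

import "fmt"

// isGlobalLayer reports whether a layer uses full (global) attention.
// ModernBERT interleaves global and local layers; the every-3rd-layer
// cadence used here is an assumption for illustration.
func isGlobalLayer(layer, globalEvery int) bool {
	return layer%globalEvery == 0
}

// ropeTheta returns the RoPE base frequency for a layer, using the dual
// theta values from this PR: 10000 for local layers, 80000 for global.
func ropeTheta(global bool) float64 {
	if global {
		return 80000
	}
	return 10000
}

// allowed reports whether query position q may attend to key position k.
// Global layers see everything; local layers use a symmetric sliding
// window (window is the total width, so each side spans window/2).
func allowed(q, k, window int, global bool) bool {
	if global {
		return true
	}
	d := q - k
	if d < 0 {
		d = -d
	}
	return d <= window/2
}

func main() {
	// Layer 0 is global, layers 1 and 2 are local, layer 3 global again.
	for layer := 0; layer < 4; layer++ {
		g := isGlobalLayer(layer, 3)
		fmt.Printf("layer %d: global=%v theta=%.0f\n", layer, g, ropeTheta(g))
	}
	// With a 128-token window, position 100 cannot see position 0 in a
	// local layer, but can in a global one.
	fmt.Println(allowed(100, 0, 128, false), allowed(100, 0, 128, true))
}
```

In the real graph this mask is applied per-layer when building attention, and the theta choice feeds the RoPE frequency computation for that layer.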

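The GeGLU feed-forward with "proper tensor splitting" mentioned in the description can be sketched like this: a fused up-projection output is split in half, one half gated through GELU and multiplied elementwise into the other. The split order (gate first, up second) and function names are assumptions for illustration, not taken from the PR's code.

```go
package main

import (
	"fmt"
	"math"
)

// gelu applies the tanh approximation of the GELU activation.
func gelu(x float32) float32 {
	v := float64(x)
	return float32(0.5 * v * (1 + math.Tanh(math.Sqrt(2/math.Pi)*(v+0.044715*v*v*v))))
}

// geglu computes GELU(gate) ⊙ up from a fused projection output whose
// first half is the gate and second half the up values. Assumed layout;
// the actual tensor split in the PR may order the halves differently.
func geglu(fused []float32) []float32 {
	n := len(fused) / 2
	out := make([]float32, n)
	for i := 0; i < n; i++ {
		out[i] = gelu(fused[i]) * fused[n+i]
	}
	return out
}

func main() {
	// gate = [1, -1], up = [2, 3]: the negative gate yields a small
	// negative output, the positive gate passes most of its up value.
	fmt.Println(geglu([]float32{1, -1, 2, 3}))
}
```

At the GGUF level this typically means the converter either stores the fused tensor and the graph splits it, or splits it at conversion time into separate gate/up tensors; the bullet above suggests the split is handled explicitly.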
🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-19 17:51:56 -05:00

Reference: github-starred/ollama#24882