[PR #12385] [MERGED] Grace/deepseek v3 migration #13806

Closed
opened 2026-04-13 00:37:06 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/12385
Author: @gr4ceG
Created: 9/23/2025
Status: Merged
Merged: 9/24/2025
Merged by: @gr4ceG

Base: main ← Head: grace/deepseek-v3-migration


📝 Commits (10+)

  • 5aa50ef init deepseek model file
  • 769ca7a temp removal of flash attention implementation
  • d685091 shapes and proper, can make a pass
  • 120e4ae query, key, value have good cosine similarity, but the max diff is a bit high
  • 91564c4 Attention block is working! ** with eager for now, have not added the mask line
  • ca72877 Attention block is working! ** with eager for now, have not added the mask line
  • 15458e0 working MoE at around 0.95 cosine sim
  • e566e81 added cosine similarity function
  • 3acaa70 Starting end to end structure
  • 9fd62e2 Trying (and failing) to get rope to work, going to test full thing on tater

📊 Changes

2 files changed (+325 additions, -0 deletions)


➕ model/models/deepseek2/model.go (+324 -0)
📝 model/models/models.go (+1 -0)

📄 Description

Testing that I've performed:

  • [x] Inference with long contexts
  • [x] Comparing tokens/second performance
  • [x] With multi-regex tokenizer
  • [x] NVIDIA, AMD, Mac
  • [ ] DeepSeek V3.1: the V3 architecture is slightly different from the V3.1 architecture, so I'll work on top of this PR to support V3.1
  • [ ] Long context evals

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-13 00:37:06 -05:00
Reference: github-starred/ollama#13806