[PR #13607] fix: avoid Nemotron v2 MoE crash (llama.cpp #18309, 0b0ceb5) #24830

Open
opened 2026-04-19 17:49:58 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/13607
Author: @quidscio
Created: 1/2/2026
Status: 🔄 Open

Base: main ← Head: fix/nemotron-moe-pr


📝 Commits (1)

  • 739880b fix: avoid Nemotron v2 MoE crash (llama.cpp #18309, 0b0ceb5)

📊 Changes

1 file changed (+3 additions, -3 deletions)


📝 llama/llama.cpp/src/llama-model.cpp (+3 -3)

📄 Description

Pull upstream llama.cpp fix for Nemotron v2 MoE parameter handling

This PR backports the upstream llama.cpp fix (commit 0b0ceb5) that resolves incorrect parameter handling for Nemotron v2 MoE models.

Upstream fix:
https://github.com/ggml-org/llama.cpp/pull/18309
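
The change itself is small (+3/−3 in llama/llama.cpp/src/llama-model.cpp) and concerns how MoE hyperparameters are read at model load time. As a minimal, self-contained sketch of the failure class only — this is not the upstream diff, and all key names and values below are invented — consider a per-expert FFN width that silently inherits the dense value when its own metadata key is mishandled:

```cpp
// Hypothetical illustration of the bug class -- NOT the actual 0b0ceb5 patch.
// Key names and values are invented for this sketch.
#include <cstdint>
#include <cstdio>
#include <map>
#include <string>

// Stand-in for an optional GGUF metadata lookup.
static bool get_key(const std::map<std::string, uint32_t> & kv,
                    const std::string & key, uint32_t & out) {
    const auto it = kv.find(key);
    if (it == kv.end()) {
        return false;
    }
    out = it->second;
    return true;
}

int main() {
    // Metadata as a Nemotron v2 MoE GGUF might expose it (values invented).
    const std::map<std::string, uint32_t> kv = {
        { "nemotron.feed_forward_length",        15680 },
        { "nemotron.expert_count",                   8 },
        { "nemotron.expert_feed_forward_length",  3920 },
    };

    uint32_t n_ff = 0, n_expert = 0, n_ff_exp = 0;
    get_key(kv, "nemotron.feed_forward_length", n_ff);
    get_key(kv, "nemotron.expert_count",        n_expert);

    if (n_expert > 0) {
        // The bug class: if the per-expert width is not read from its own
        // key (or falls back to the dense n_ff), the expert tensors get
        // shapes that don't match the weights on disk, and model load
        // aborts -- surfacing in ollama as "runner exit status 2".
        if (!get_key(kv, "nemotron.expert_feed_forward_length", n_ff_exp)) {
            n_ff_exp = n_ff; // wrong fallback for an MoE model
        }
    }

    std::printf("n_ff=%u n_expert=%u n_ff_exp=%u\n", n_ff, n_expert, n_ff_exp);
    return 0;
}
```

In the real code, the analogous reads live in llama-model.cpp's hyperparameter loading (the ml.get_key(...) pattern); the point of the sketch is only that MoE-specific keys must not silently inherit dense-model defaults, or tensor creation fails later and the runner aborts.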

Impact

  • Fixes the 500 Internal Server Error (runner exit status 2) returned when loading Nemotron v2 MoE models
  • No behavioral or performance regressions observed in non-MoE models
  • Unblocks Nemotron v2 usage without downstream workarounds

Verification

  • mirage335/NVIDIA-Nemotron-Nano-9B-v2-virtuoso now loads and runs correctly
  • Regression-tested against:
    • gpt-oss:20b
    • deepseek-r1:14b
    • devstral:latest

Full reproduction details and before/after evidence are documented in issue #13547.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-19 17:49:58 -05:00

Reference: github-starred/ollama#24830