[PR #13893] [MERGED] glm4moelite: fix attention scale calculation #14435

Closed
opened 2026-04-13 00:54:02 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/13893
Author: @jmorganca
Created: 1/25/2026
Status: Merged
Merged: 1/25/2026
Merged by: @jmorganca

Base: main ← Head: fix-glm4moelite-attention-scale


📝 Commits (1)

  • f919535 glm4moelite: fix attention scale calculation

📊 Changes

1 file changed (+1 additions, -6 deletions)


📝 model/models/glm4moelite/model.go (+1 -6)

📄 Description

Use the original key dimension (qkNopeHeadDim + qkRopeHeadDim = 256) for the attention scale instead of the MLA absorbed dimension (kvLoraRank + qkRopeHeadDim = 576).

MLA absorption is a mathematically equivalent reorganization of the attention computation, so it should not change the effective attention scale. The scale should match the one used in training, which is 1/sqrt(256).

This improves tool calling and reduces model looping.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-13 00:54:02 -05:00

Reference: github-starred/ollama#14435