[PR #11207] [MERGED] skip quantizing per_layer_token_embd #75774

Closed
opened 2026-05-05 08:12:08 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/11207
Author: @mxyng
Created: 6/26/2025
Status: Merged
Merged: 6/27/2025
Merged by: @mxyng

Base: main ← Head: mxyng/skip-quantize


📝 Commits (1)

  • 7ef0510 skip quantizing per_layer_token_embd

📊 Changes

1 file changed (+2 additions, -0 deletions)


📝 server/quantization.go (+2 -0)

📄 Description

The per_layer_token_embd tensor isn't compatible with CUDA when quantized to q4_K, so skip quantizing it.
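The two-line diff itself is not reproduced here, so the following is only a minimal sketch of the general idea: a name-based check that leaves a tensor in its original type instead of quantizing it. The names `skipQuantize`, `shouldQuantize`, and `targetType` are hypothetical and are not taken from `server/quantization.go`; the type names are plain strings chosen for illustration.

```go
package main

import "strings"

// skipQuantize lists tensor-name substrings that should keep their
// original type. Hypothetical list for illustration only; the actual
// change in server/quantization.go may be structured differently.
var skipQuantize = []string{"per_layer_token_embd"}

// shouldQuantize reports whether a tensor with the given name may be
// quantized, i.e. its name matches none of the skip entries.
func shouldQuantize(name string) bool {
	for _, s := range skipQuantize {
		if strings.Contains(name, s) {
			return false
		}
	}
	return true
}

// targetType returns the type a tensor should be written with:
// the requested quantized type, or the original type if the tensor
// is on the skip list.
func targetType(name, original, requested string) string {
	if !shouldQuantize(name) {
		return original
	}
	return requested
}
```

Under these assumptions, `targetType("per_layer_token_embd.weight", "F16", "Q4_K")` keeps `"F16"`, while an ordinary weight such as `"blk.0.attn_q.weight"` is assigned `"Q4_K"`.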


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-05-05 08:12:08 -05:00
Reference: github-starred/ollama#75774