[PR #11929] [MERGED] gpt-oss: disable quantized kv cache #44910

Closed
opened 2026-04-25 00:35:14 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/11929
Author: @mxyng
Created: 8/15/2025
Status: Merged
Merged: 8/15/2025
Merged by: @mxyng

Base: main ← Head: mxyng/gpt-oss-cache-types


📝 Commits (1)

  • 2d31289 gpt-oss: disable quantized kv cache

📊 Changes

1 file changed (+5 additions, -0 deletions)

📝 fs/ggml/ggml.go (+5 -0)

📄 Description

A quantized KV cache for gpt-oss is much slower than the regular f16 cache type because the model uses attention with sinks. Backends such as CUDA don't support this combination, so the work is forced onto the CPU, which dramatically reduces performance.
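The sketch below shows one way to apply this kind of per-architecture guard. When gpt-oss is loaded, a quantized cache request falls back to f16, and other models keep their requested type. It is not the PR's actual diff. The names `supportsKVCacheType` and `resolveCacheType`, the architecture string `"gptoss"`, and the list of quantized types (`q8_0`, `q4_0`) are all illustrative assumptions. In Ollama, the requested type typically comes from the `OLLAMA_KV_CACHE_TYPE` environment variable.

```go
package main

import (
	"fmt"
	"slices"
)

// quantizedCacheTypes lists the quantized KV cache types this sketch accepts.
// Assumption: q8_0 and q4_0 are the quantized options on offer.
var quantizedCacheTypes = []string{"q8_0", "q4_0"}

// supportsKVCacheType is a hypothetical helper: it reports whether the
// requested KV cache type may be used with the given model architecture.
func supportsKVCacheType(arch, cacheType string) bool {
	// An empty request or f16 (the unquantized cache) is always allowed.
	if cacheType == "" || cacheType == "f16" {
		return true
	}
	// Assumption: gpt-oss is identified by the architecture string "gptoss".
	// Attention with sinks makes quantized caches slow, so refuse them here.
	if arch == "gptoss" {
		return false
	}
	return slices.Contains(quantizedCacheTypes, cacheType)
}

// resolveCacheType, also hypothetical, returns the requested cache type
// when it is supported and falls back to f16 otherwise.
func resolveCacheType(arch, requested string) string {
	if requested != "" && supportsKVCacheType(arch, requested) {
		return requested
	}
	return "f16"
}

func main() {
	fmt.Println(resolveCacheType("gptoss", "q8_0")) // gpt-oss falls back to f16
	fmt.Println(resolveCacheType("llama", "q8_0"))  // other models keep q8_0
	fmt.Println(resolveCacheType("llama", "bogus")) // unknown type falls back to f16
}
```

Falling back to f16 rather than failing the load keeps the model usable. The cost is that a gpt-oss user who asked for a quantized cache silently gets the larger f16 cache.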


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

## 📋 Pull Request Information **Original PR:** https://github.com/ollama/ollama/pull/11929 **Author:** [@mxyng](https://github.com/mxyng) **Created:** 8/15/2025 **Status:** ✅ Merged **Merged:** 8/15/2025 **Merged by:** [@mxyng](https://github.com/mxyng) **Base:** `main` ← **Head:** `mxyng/gpt-oss-cache-types` --- ### 📝 Commits (1) - [`2d31289`](https://github.com/ollama/ollama/commit/2d31289d4450628fb51ea7b6878406a1b37c69e1) gpt-oss: disable quantized kv cache ### 📊 Changes **1 file changed** (+5 additions, -0 deletions) <details> <summary>View changed files</summary> 📝 `fs/ggml/ggml.go` (+5 -0) </details> ### 📄 Description quantized kv cache for gpt-oss is much slower than with regular f16 cache type due to the model using attention with sinks. this isn't supported on backends such as cuda which forces it onto the cpu dramatically reducing performance --- <sub>🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.</sub>
GiteaMirror added the pull-request label 2026-04-25 00:35:14 -05:00
Reference: github-starred/ollama#44910