[PR #7983] Add K/V cache quantization config to Modelfile (Follow-Up to PR #6279) #12582

Open
opened 2026-04-13 00:03:50 -05:00 by GiteaMirror · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/7983
Author: @dmatora
Created: 2024-12-07
Status: 🔄 Open

Base: main ← Head: main


📝 Commits (1)

  • a2426b6 Add K/V cache quantization config to Modelfile, API, and CLI

📊 Changes

7 files changed (+39 additions, -15 deletions)

View changed files

📝 api/types.go (+1 -0)
📝 cmd/interactive.go (+1 -0)
📝 docs/api.md (+1 -0)
📝 docs/faq.md (+5 -5)
📝 llm/memory.go (+13 -6)
📝 llm/server.go (+17 -4)
📝 parser/parser_test.go (+1 -0)

📄 Description

This PR attempts to add support for setting the K/V cache quantization type directly from a model’s Modelfile, building on the recently merged K/V cache quantization feature introduced in PR #6279. The original contributor, @sammcj, spent months navigating architectural challenges and extensive review cycles to bring K/V cache quantization into Ollama. Given the complexity and the Ollama team’s reluctance to modify the API, @sammcj understandably decided not to pursue Modelfile support further.
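For illustration, a Modelfile using the option this PR adds might look like the sketch below. This is an assumption based on the PR title and standard Modelfile syntax, not a confirmed excerpt from the PR; the base model is arbitrary, and `q8_0` is one of the cache types supported by PR #6279 (`f16`, `q8_0`, `q4_0`):

```
# Hypothetical Modelfile sketch -- assumes the PR exposes kv_cache_type
# as an ordinary PARAMETER directive.
FROM llama3.1

# Quantize the K/V cache to 8-bit to cut VRAM usage at long context lengths.
PARAMETER kv_cache_type q8_0
```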

The core issue is that introducing Modelfile parameters for kv_cache_type without changing the API would require hacky workarounds—such as writing the Modelfile’s kv_cache_type value into an environment variable in server/routes.go:modelOptions and then relying on envconfig.KvCacheType() downstream. Since the Ollama team has clearly indicated they’re not willing to expand or alter the API for this feature at this time, providing a clean, fully integrated solution is not feasible.
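A rough sketch of why this counts as hacky follows. The helper name and parameter plumbing are hypothetical; only `envconfig.KvCacheType()` and the `OLLAMA_KV_CACHE_TYPE` environment variable come from PR #6279:

```go
package server

import "os"

// applyKVCacheType is a hypothetical helper (not the PR's actual code)
// illustrating the workaround described above: it copies a per-model
// Modelfile parameter into a process-global environment variable so that
// envconfig.KvCacheType() from PR #6279 picks it up downstream.
func applyKVCacheType(params map[string]any) {
	if v, ok := params["kv_cache_type"].(string); ok {
		// Global side effect driven by per-request model config -- the "hacky" part:
		// concurrent loads of models with different cache types can race.
		os.Setenv("OLLAMA_KV_CACHE_TYPE", v)
	}
}
```

The race noted in the comment is exactly why this cannot be a clean solution without API changes: environment variables are process-wide, while the setting is logically per-model.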

This PR, therefore, is not expected to be merged. Instead, it serves as a reference point and stopgap for users who can't wait for a more elegant, long-term architectural change and are willing to maintain a custom build of Ollama. It acknowledges the trade-offs and imperfections of the approach, in the hope that future involvement from the Ollama team will eventually enable a more robust solution.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-13 00:03:50 -05:00