[GH-ISSUE #8332] Allow set the type of K/V cache separately #31100

Open
opened 2026-04-22 11:15:39 -05:00 by GiteaMirror · 2 comments

Originally created by @ag2s20150909 on GitHub (Jan 7, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/8332

Allow setting the K and V cache types separately.

On Qwen2-7B:
With both the K and V caches set to `q4_0`, the output is weird.
With K set to `q4_0` and V set to `q8_0`, the output is weird.
With K set to `q8_0` and V set to `q4_0`, the output is normal.
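
For readers unfamiliar with the two types, here is a rough sketch of how much coarser a 4-bit block quantizer is than an 8-bit one. It assumes a simplified symmetric scheme with blocks of 32 values and is not ggml's exact `q4_0`/`q8_0` layout. The function names and the synthetic data are made up for illustration. It does not reproduce the Qwen2-7B behavior reported above and does not show why K would be more sensitive than V.

```python
import random

BLOCK = 32  # assumed block size; ggml's q4_0 and q8_0 also group values in blocks of 32


def fake_quantize(values, bits):
    """Round-trip values through a simplified symmetric block quantizer.

    Illustration only: this is NOT ggml's exact q4_0/q8_0 implementation.
    """
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit, 127 for 8-bit
    out = []
    for start in range(0, len(values), BLOCK):
        block = values[start:start + BLOCK]
        amax = max(abs(v) for v in block)
        scale = amax / qmax if amax > 0 else 1.0  # one scale per block
        out.extend(round(v / scale) * scale for v in block)
    return out


def rms_error(a, b):
    """Root-mean-square difference between two equally long sequences."""
    return (sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)) ** 0.5


rng = random.Random(0)
row = [rng.gauss(0.0, 1.0) for _ in range(128)]  # synthetic cache row
row[5] = 40.0  # a single large outlier stretches the scale of its whole block

err4 = rms_error(row, fake_quantize(row, 4))
err8 = rms_error(row, fake_quantize(row, 8))
print(f"4-bit RMS error: {err4:.4f}")
print(f"8-bit RMS error: {err8:.4f}")
```

In this sketch the 4-bit error is noticeably larger, mostly because the outlier forces the small values in its block to round to zero. Whether a similar effect explains the results above is an open question in this thread.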

GiteaMirror added the feature request label 2026-04-22 11:15:39 -05:00

@youcefs21 commented on GitHub (Jan 21, 2025):

I feel like the same can be said about DeepSeek-R1-Distill-Qwen-32B.


@Kaylebor commented on GitHub (Mar 4, 2026):

For starters, it'd be nice to at least be able to set them per model: https://github.com/ollama/ollama/pull/7983


Reference: github-starred/ollama#31100