[GH-ISSUE #5425] Does having the default quant type being Q4_0 (a legacy format) on the model hub still make sense? #65432

Closed
opened 2026-05-03 21:16:23 -05:00 by GiteaMirror · 5 comments
Owner

Originally created by @sammcj on GitHub (Jul 2, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/5425

The Ollama model hub still defaults to the Q4_0 quant type, a legacy format that under-performs compared to K-quants (Qn_K, e.g. Q4_K_M, Q6_K, Q5_K_L, etc.).

  • Would it perhaps make sense to change the default quant to Q4_K_M for future models uploaded to the hub?

Reference

  • https://github.com/ggerganov/llama.cpp/wiki/Feature-matrix (note that the legacy quant types don't even appear on the feature matrix)
  • https://github.com/ggerganov/llama.cpp/wiki/Tensor-Encoding-Schemes
  • https://www.reddit.com/r/LocalLLaMA/comments/1ba55rj/overview_of_gguf_quantization_methods/
  • https://github.com/ggerganov/llama.cpp/discussions/406#discussioncomment-6176448
  • https://github.com/ggerganov/llama.cpp/discussions/2094
  • https://huggingface.co/datasets/christopherthompson81/quant_exploration

[Four screenshots comparing quantization formats were attached to the original issue; not reproduced in this mirror.]

(Sorry if an issue already exists for this - if it did my search-fu let me down)
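As a quick sanity check on the "same size, better quality" claim behind this request, the byte layouts of the two formats can be compared directly. This is an illustrative sketch, not part of the original issue: the block sizes below are taken from ggml's reference structs in llama.cpp (`block_q4_0` and `block_q4_K`); the helper function itself is just for the arithmetic.

```python
def bits_per_weight(block_bytes: int, weights_per_block: int) -> float:
    """Storage cost of a quant format, in bits per weight."""
    return block_bytes * 8 / weights_per_block

# Q4_0: blocks of 32 weights, one fp16 scale (2 bytes)
# plus 16 bytes of packed 4-bit quants.
q4_0 = bits_per_weight(2 + 16, 32)

# Q4_K: super-blocks of 256 weights, fp16 scale + fp16 min (4 bytes),
# 12 bytes of 6-bit sub-block scales/mins, 128 bytes of 4-bit quants.
q4_k = bits_per_weight(4 + 12 + 128, 256)

print(q4_0, q4_k)  # both come out to 4.5 bits per weight
```

Both formats cost the same ~4.5 bits per weight, but Q4_K spends its overhead on eight per-sub-block scales and minimums instead of a single scale per 32 weights, which is why the K-quants lose noticeably less accuracy at essentially the same file size.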

GiteaMirror added the feature request label 2026-05-03 21:16:23 -05:00

@DuckyBlender commented on GitHub (Jul 16, 2024):

I 100% agree on this. This decision should have been made a long time ago.
The default on all of my models on Ollama is q4_K_M for this reason.


@mahenning commented on GitHub (Sep 25, 2024):

Any updates on this? It would be great if the k-quants were made the default, as I personally see no reason for the Q4_0 quants to remain the default. Right now it takes extra typing to get the k-quants, and users with less experience in quantization miss out on an arguably better model if they just use the default model names. If the decision went against k-quants as default, I'd be interested in the reasoning.


@jtsorlinis commented on GitHub (Dec 9, 2024):

Has this been done? I've noticed some models now default to Q4_K_M?


@mahenning commented on GitHub (Dec 10, 2024):

It seems to be in transition. Qwen 2.5, Llama 3.1+ (among others) are q4_K_M now, but older models like Llama 3, Qwen 1/2, or Mistral 7B still point to Q4_0. I hope they change it for every reasonably recent model, otherwise it is a bit half-done, although I do see the argument that people very rarely use Qwen 1 or Llama 2 given the availability of their successors.
I'm pleased that the change is done at least for newer models. Maybe it will make its way into the changelog of a new release.


@jmorganca commented on GitHub (Dec 29, 2024):

Merging with https://github.com/ollama/ollama/issues/1543

Reference: github-starred/ollama#65432