[GH-ISSUE #1543] Better model quantization defaults from ollama.com #62879

Closed
opened 2026-05-03 10:35:41 -05:00 by GiteaMirror · 7 comments

Originally created by @knoopx on GitHub (Dec 15, 2023).
Original GitHub issue: https://github.com/ollama/ollama/issues/1543

Is there a reason the `latest` tag on the model hub points, by default, to the older `q4_0` quants? The newer `k_m`/`k_s` quants are supposedly better, and the size difference is usually just a few hundred megabytes. It would be nice if the default pointed to those instead.

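A note for readers hitting this before the default changed: the k-quants were already available behind explicit tags on ollama.com, e.g. `ollama pull llama2:7b-q4_K_M` (exact tag names vary per model). The same pull works through the local REST API; a minimal sketch, assuming an Ollama server on the default port and that the model's page lists a `q4_K_M` tag:

```python
# Sketch: pull a specific k-quant tag instead of the default, via the
# local Ollama REST API. Assumes a server on the default port 11434;
# the tag name below is an example and varies per model on ollama.com.
import json
import urllib.request

req = urllib.request.Request(
    "http://localhost:11434/api/pull",
    data=json.dumps({"name": "llama2:7b-q4_K_M", "stream": False}).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp))  # {"status": "success"} once the pull completes
```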
GiteaMirror added the ollama.com label 2026-05-03 10:35:41 -05:00

@choltha commented on GitHub (Dec 22, 2023):

The difference in perplexity (which can be used to measure model response quality) is listed here:
https://github.com/ggerganov/llama.cpp/discussions/2094#discussioncomment-6351796
Relevant lines:

> Q4_0 : 3.50G, +0.2499 ppl @ 7B
> Q3_K_L : 3.35G, +0.1803 ppl @ 7B
> Q4_K_S : 3.56G, +0.1149 ppl @ 7B
> Q4_K_M : 3.80G, +0.0535 ppl @ 7B
> F16 : 13.00G @ 7B

F16 is the baseline reference for perplexity.

Surprisingly, even Q3_K_L is listed with lower perplexity than Q4_0 while being smaller. This is also the recommendation from TheBloke:

> Q4_0: legacy; small, very high quality loss - prefer using Q3_K_M

I would support switching to Q4_K_M by default (the best compromise).

If the size increase is not wanted, at least switch to Q3_K_M, which would be better by every measure.

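To put the quoted figures in perspective, here is a quick back-of-the-envelope comparison; a sketch using only the sizes and perplexity deltas listed in the llama.cpp discussion above:

```python
# Back-of-the-envelope comparison using the figures quoted above
# (7B model, F16 as the perplexity baseline).
quants = {
    "Q4_0":   (3.50, 0.2499),  # (size in GB, ppl increase over F16)
    "Q3_K_L": (3.35, 0.1803),
    "Q4_K_S": (3.56, 0.1149),
    "Q4_K_M": (3.80, 0.0535),
}

base_size, base_ppl = quants["Q4_0"]
for name, (size, ppl) in quants.items():
    size_delta = 100 * (size - base_size) / base_size
    ppl_delta = 100 * (ppl - base_ppl) / base_ppl
    print(f"{name}: {size_delta:+.1f}% size, {ppl_delta:+.1f}% ppl penalty vs Q4_0")

# Q3_K_L: -4.3% size, -27.9% ppl penalty vs Q4_0
# Q4_K_S: +1.7% size, -54.0% ppl penalty vs Q4_0
# Q4_K_M: +8.6% size, -78.6% ppl penalty vs Q4_0
```

By these numbers, Q4_K_M costs about 8.6% more disk for roughly a 79% smaller perplexity penalty, which is the trade-off being argued for here.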

@choltha commented on GitHub (Dec 22, 2023):

I would create a PR if someone points me to the place where the defaults for this are set.

@mchiang0610 commented on GitHub (Mar 11, 2024):

Thank you for sharing this. We will be working on saner defaults for ollama.com; sorry about this.

@choltha commented on GitHub (Mar 12, 2024):

Thanks for your work!

@jtsorlinis commented on GitHub (Jun 14, 2024):

Any updates on this? It would be great to default to something like q4_k_m.

@DuckyBlender commented on GitHub (Jul 16, 2024):

Related to #5425.

@jmorganca commented on GitHub (Dec 29, 2024):

The default is now q4_k_m, and additional higher-quality 4-bit quantization formats (e.g. AWQ) may be supported in the future.

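For anyone verifying this on their own install: a pulled model's quantization can be checked with `ollama show <model>` or via the local REST API. A minimal sketch, assuming a local server and that `/api/show` reports a `details.quantization_level` field (true for recent Ollama versions):

```python
# Sketch: confirm which quantization a locally pulled model actually uses.
# Assumes a local Ollama server; the details.quantization_level field is
# an assumption based on recent Ollama versions.
import json
import urllib.request

req = urllib.request.Request(
    "http://localhost:11434/api/show",
    data=json.dumps({"name": "llama2"}).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    info = json.load(resp)
print(info["details"]["quantization_level"])  # e.g. "Q4_K_M"
```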
Reference: github-starred/ollama#62879