[GH-ISSUE #1388] Description of models in the ollama page #733

Closed
opened 2026-04-12 10:24:22 -05:00 by GiteaMirror · 6 comments
Owner

Originally created by @lfoppiano on GitHub (Dec 5, 2023).
Original GitHub issue: https://github.com/ollama/ollama/issues/1388

I cannot find the meaning of the parts of the naming convention for the models. For example:

https://ollama.ai/library/starling-lm:7b-alpha-q4_K_M

It's clear that q4 indicates 4-bit quantization, but what do K and M mean?

Thanks


@easp commented on GitHub (Dec 5, 2023):

S, M & L stand for small, medium and large. K refers, I think, to the quantization method, which I believe uses K-means clustering.


@lfoppiano commented on GitHub (Dec 5, 2023):

Thanks! But I'm wondering, for such models, what do small, medium and large mean? I thought the number of parameters was the scale indicator 🤔


@easp commented on GitHub (Dec 6, 2023):

Good question. I should have been clearer. The sizes are relative to the quantization: within 4-bit K-quantization there are Small and Medium variants, and for 3-bit there are S, M & L.

From: https://github.com/ggerganov/llama.cpp/blob/5f6e0c0dff1e7a89331e6b25eca9a9fd71324069/examples/make-ggml.py#L16C1-L37C51

```
Old quant types (some base model types require these):
- Q4_0: small, very high quality loss - legacy, prefer using Q3_K_M
- Q4_1: small, substantial quality loss - legacy, prefer using Q3_K_L
- Q5_0: medium, balanced quality - legacy, prefer using Q4_K_M
- Q5_1: medium, low quality loss - legacy, prefer using Q5_K_M

New quant types (recommended):
- Q2_K: smallest, extreme quality loss - not recommended
- Q3_K: alias for Q3_K_M
- Q3_K_S: very small, very high quality loss
- Q3_K_M: very small, very high quality loss
- Q3_K_L: small, substantial quality loss
- Q4_K: alias for Q4_K_M
- Q4_K_S: small, significant quality loss
- Q4_K_M: medium, balanced quality - recommended
- Q5_K: alias for Q5_K_M
- Q5_K_S: large, low quality loss - recommended
- Q5_K_M: large, very low quality loss - recommended
- Q6_K: very large, extremely low quality loss
- Q8_0: very large, extremely low quality loss - not recommended
- F16: extremely large, virtually no quality loss - not recommended
- F32: absolutely huge, lossless - not recommended
```
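Putting the convention above together, a tag suffix like `q4_K_M` decodes as "4-bit, K-quant method, Medium size variant". A minimal sketch of that decoding in Python (this is a hypothetical helper for illustration, not part of Ollama or llama.cpp):

```python
import re

# Match the quantization suffix of an Ollama-style tag such as "7b-alpha-q4_K_M":
#   q<bits>  -> bit width of the quantization (e.g. q4 = 4-bit)
#   K        -> K-quant method; 0/1 are the legacy quant types listed above
#   S/M/L    -> size variant, relative to other quants at the same bit width
QUANT_RE = re.compile(r"q(?P<bits>\d)_(?P<method>K|0|1)(?:_(?P<size>S|M|L))?$", re.IGNORECASE)

def parse_quant_suffix(tag: str) -> dict:
    """Extract bit width, quant method, and size variant from a model tag."""
    m = QUANT_RE.search(tag)
    if not m:
        return {}
    sizes = {"S": "small", "M": "medium", "L": "large"}
    return {
        "bits": int(m.group("bits")),
        "method": "K-quant" if m.group("method").upper() == "K" else f"legacy _{m.group('method')}",
        "size": sizes.get((m.group("size") or "").upper()),
    }

print(parse_quant_suffix("7b-alpha-q4_K_M"))
# {'bits': 4, 'method': 'K-quant', 'size': 'medium'}
```

Note that plain `Q3_K`, `Q4_K`, and `Q5_K` carry no size letter because, per the list above, they are aliases for the `_M` variants.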

@lfoppiano commented on GitHub (Dec 6, 2023):

I understand, thanks.


@b-a0 commented on GitHub (Jul 25, 2024):

And is there a difference between upper and lower case letters?

![image](https://github.com/user-attachments/assets/b8a450e5-8cfe-447e-8a56-a9b68eeaa74d)


@github12101 commented on GitHub (Jan 27, 2025):

Good question!

Reference: github-starred/ollama#733