[GH-ISSUE #2495] Llama2: q4_km as default? #1457

Closed
opened 2026-04-12 11:21:14 -05:00 by GiteaMirror · 5 comments
Owner

Originally created by @matthiasgeihs on GitHub (Feb 14, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/2495

Just saw this table: https://github.com/ggerganov/llama.cpp/pull/1684#issuecomment-1579252501

Perplexity loss is considerably higher for `q4_0` compared to `q4_km`. `km` provides the best tradeoff between size and performance, as noted also [here](https://huggingface.co/TheBloke/Mixtral-8x7B-v0.1-GGUF).

`ollama` currently uses `q4_0` as the default for `llama2`: https://ollama.com/library/llama2:latest (as observed by comparing the ID 78e26419b446).

**Suggestion:** Use [`q4_km`](https://ollama.com/library/llama2:7b-chat-q4_K_M) as the default instead.

GiteaMirror added the feature request label 2026-04-12 11:21:14 -05:00

@pdevine commented on GitHub (May 17, 2024):

Hey @matthiasgeihs, you can pull [7b-chat-q4_K_M](https://ollama.com/library/llama2:7b-chat-q4_K_M), which is the model you want. The `ollama cp` command will allow you to change it on your own host to the default. We've been debating internally for a while about changing the default, but we'll see.

I'll go ahead and close the issue though.


@matthiasgeihs commented on GitHub (May 17, 2024):

Thx for the update. An option to change the default quantization on my machine would be great. Are you saying `ollama cp` can do that? How?


@pdevine commented on GitHub (May 19, 2024):

Ultimately I'd love you to be able to just set a default and not have to worry about tags any more (I demo'd something like this at the Ollama + Friends meetup in Paris last month). For now, if you don't want to type out the whole tag name, just `ollama cp llama2:7b-chat-q4_K_M mymodel`, which would allow you to `ollama run mymodel` as a convenient shortcut.

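The workaround above can be sketched as a short shell session (a sketch only, assuming `ollama` is installed and the `llama2:7b-chat-q4_K_M` tag is still published; `mymodel` is an arbitrary local name):

```shell
# Pull the K-quant variant explicitly instead of the q4_0 default tag.
ollama pull llama2:7b-chat-q4_K_M

# Copy it to a short local name so the full tag never has to be typed again.
ollama cp llama2:7b-chat-q4_K_M mymodel

# Run the q4_K_M model via the shortcut name.
ollama run mymodel
```

Note that `ollama cp` creates a second entry in `ollama list`, which is the model-list duplication the next comment objects to.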

@ncoquelet commented on GitHub (Jun 7, 2024):

👍 Thanks for the tip on the `ollama cp` command, but that doubles our model list, which isn't really convincing with many models.
🤞 For an option to customise the default quantization tag for all models at once when pulling and running.


@ncoquelet commented on GitHub (Jun 7, 2024):

Linked open issue: https://github.com/ollama/ollama/issues/1543

Reference: github-starred/ollama#1457