[GH-ISSUE #4626] about model quantization #2905

Closed
opened 2026-04-12 13:15:34 -05:00 by GiteaMirror · 3 comments

Originally created by @andyyumiao on GitHub (May 25, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/4626

What quantization parameters are used for the llama3 model in Ollama? For example, the llama3 version, the quantization type, and so on.
The llama3 8b version that I quantized myself with llama.cpp is not as good as the llama3 8b version that ships with Ollama, and I want to know why.

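A quick way to start comparing the two builds is to inspect what Ollama actually ships. A minimal sketch, assuming the `llama3:8b` tag and an Ollama version whose `ollama show` supports these flags:

```sh
# Dump the full Modelfile Ollama uses for the model, including the
# prompt TEMPLATE and PARAMETER lines (stop tokens, etc.).
ollama show llama3:8b --modelfile

# Dump just the prompt template, convenient for diffing against your own.
ollama show llama3:8b --template
```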
GiteaMirror added the question label 2026-04-12 13:15:34 -05:00

@umiyuki commented on GitHub (May 25, 2024):

Conversely, Phi-3-medium is degraded in the Ollama version: it has lower benchmark scores than under llama.cpp. Several people have tried Phi-3-medium on Ollama and found the performance to be poor. Why is this happening? Either the inference parameter settings or the chat template is wrong, or something is broken in the quantization.

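One way to separate the template hypothesis from the quantization one is to dump the template Ollama applies and compare it with the published Phi-3 chat format. A sketch; the `phi3:medium` tag and the expected shape shown in the comments are assumptions based on the Phi-3 model card:

```sh
# Show the prompt template Ollama wraps around Phi-3 requests.
ollama show phi3:medium --template

# Roughly the shape the Phi-3 model card documents:
#   <|user|>
#   {prompt}<|end|>
#   <|assistant|>
```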

@skye0402 commented on GitHub (May 25, 2024):

@umiyuki I found the 4K 14b model to be working quite well, but the 128k 14b model definitely has problems. I used 6-bit quantization for both and compared them with identical input.

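For an apples-to-apples comparison like this, both variants can be converted and quantized with identical llama.cpp settings. A sketch assuming the llama.cpp tooling of that era (the `quantize` binary was later renamed `llama-quantize`; paths and file names here are placeholders):

```sh
# Convert both HF checkpoints to f16 GGUF.
python convert-hf-to-gguf.py Phi-3-medium-4k-instruct --outfile phi3-med-4k-f16.gguf
python convert-hf-to-gguf.py Phi-3-medium-128k-instruct --outfile phi3-med-128k-f16.gguf

# Quantize both to Q6_K with the same settings, so any quality gap
# between the 4k and 128k variants is not an artifact of this step.
./quantize phi3-med-4k-f16.gguf phi3-med-4k-Q6_K.gguf Q6_K
./quantize phi3-med-128k-f16.gguf phi3-med-128k-Q6_K.gguf Q6_K
```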

@pdevine commented on GitHub (May 28, 2024):

@andyyumiao the default is `Q4_0`, which you can see here:

<img width="776" alt="Screenshot 2024-05-28 at 13 33 02" src="https://github.com/ollama/ollama/assets/75239/a5c09e23-f853-4600-9326-4b62be121c21">
<img width="777" alt="Screenshot 2024-05-28 at 13 31 27" src="https://github.com/ollama/ollama/assets/75239/5fc036a5-c844-4dad-89ff-de5c1990ff58">

It's possible (and more likely) that your prompt template is off on the version that you converted yourself. I'll close out the issue.

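Putting the two halves of this answer together: matching the stock build means quantizing to `Q4_0` and supplying the Llama 3 chat template instead of a raw-prompt default. A sketch; file names are placeholders, and the TEMPLATE text follows the published Llama 3 format, which may differ in whitespace from Ollama's exact shipped template:

```sh
# Quantize to the same type Ollama ships by default.
./quantize llama3-8b-f16.gguf llama3-8b-Q4_0.gguf Q4_0

# Wrap the result in a Modelfile that sets the Llama 3 chat template
# and its end-of-turn stop token.
cat > Modelfile <<'EOF'
FROM ./llama3-8b-Q4_0.gguf
TEMPLATE """{{ if .System }}<|start_header_id|>system<|end_header_id|>

{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>

{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>

{{ .Response }}"""
PARAMETER stop "<|eot_id|>"
EOF

ollama create my-llama3 -f Modelfile
```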