[GH-ISSUE #6352] The quality of answer significantly deteriorates after Automatic Quantization #66022

Closed
opened 2026-05-03 23:38:31 -05:00 by GiteaMirror · 1 comment
Owner

Originally created by @garyyang85 on GitHub (Aug 14, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/6352

What is the issue?

Using gemma2-27b for testing: after `ollama create -q Q8_0`, the quality of the answers is not very good. The accuracy seems worse than the original gemma2-9b on Hugging Face. What is the principle behind Ollama's quantization? Is it something like post-training quantization? Thanks.

OS

Linux

GPU

Nvidia

CPU

Intel

Ollama version

0.3.5

GiteaMirror added the bug label 2026-05-03 23:38:31 -05:00
Author
Owner

@rick-github commented on GitHub (Jan 11, 2026):

It reduces the size of the model by converting fp16/fp32 to smaller data types. This results in a loss of precision and range which affects the quality of the output, much like compressing an image reduces detail and causes artifacts.
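The precision loss can be illustrated with a toy round-trip: quantize float weights to int8 with a single scale factor, then dequantize and measure the error. This is only a minimal sketch of symmetric post-training quantization, not Ollama's actual Q8_0 scheme (which quantizes in blocks with per-block scales).

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Map float weights to int8 using one symmetric scale factor."""
    scale = np.abs(weights).max() / 127.0  # largest magnitude maps to +/-127
    q = np.round(weights / scale).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from the int8 representation."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=1000).astype(np.float32)

q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

# The round-trip error is the quantization noise that degrades output quality;
# each weight is off by at most half a quantization step (scale / 2).
print("max abs error:", np.abs(w - w_hat).max())
```

Lower-bit formats (Q4, Q2) use a coarser grid, so the per-weight error grows and quality drops further, which is why an 8-bit quantization of a large model can still underperform a smaller full-precision one on some tasks.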

Reference: github-starred/ollama#66022