[GH-ISSUE #9769] For Gemma 3, which one has better performance, 27b (4bit) or 12b (fp16)? #6387

Closed
opened 2026-04-12 17:53:55 -05:00 by GiteaMirror · 2 comments
Owner

Originally created by @zhaojigang on GitHub (Mar 14, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/9769

For Gemma 3, which one has better performance, 27b (4bit) or 12b (fp16)?

GiteaMirror added the question label 2026-04-12 17:53:55 -05:00
Author
Owner

@pdevine commented on GitHub (Mar 14, 2025):

You know, this is a great question. Back-of-the-napkin math says the 12b fp16 model is going to be larger (24GB vs 17GB), but we don't have definitive numbers on speed or perplexity for either. In both cases the vision encoder/projector is going to be unquantized, so it will be identical between the two; only the text part is different.
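The napkin math above can be sketched with a simple weight-size estimate: parameter count times bits per weight, divided by 8, gives the approximate on-disk size of the text weights. The effective bits-per-weight figure of 5 for a 4-bit K-quant is an assumption for illustration (4-bit quantization schemes carry some per-block overhead); exact sizes depend on the specific quantization used.

```python
def approx_model_size_gb(params_billions: float, bits_per_weight: float) -> float:
    """Rough weight-only size estimate in GB: params * bits / 8.

    Ignores KV cache, activations, and the (unquantized) vision
    encoder/projector, which is identical for both models.
    """
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9

# 12B parameters at fp16 (16 bits per weight)
print(f"12b fp16: ~{approx_model_size_gb(12, 16):.0f} GB")  # ~24 GB

# 27B parameters at ~5 effective bits per weight (assumed for a 4-bit K-quant)
print(f"27b 4bit: ~{approx_model_size_gb(27, 5):.0f} GB")   # ~17 GB
```

These figures match the 24GB vs 17GB estimate above; the general heuristic in the community is that a larger quantized model often outperforms a smaller full-precision one at similar memory cost, but that is workload-dependent and not confirmed here.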

<!-- gh-comment-id:2725967645 -->
Author
Owner

@zhaojigang commented on GitHub (Mar 15, 2025):

> You know, this is a great question. Back of the napkin math is the 12b fp16 model is going to be larger (24GB vs 17GB), but we don't have definitive numbers on which speed/perplexity. In both cases the vision encoder/projector is going to be unquantized so will be identical between the two; only the text part is different.

Okay, got it. Thank you for your reply.

<!-- gh-comment-id:2726174748 -->

Reference: github-starred/ollama#6387