[PR #9776] [MERGED] fix: gemma3 quantization #38930

Closed
opened 2026-04-22 23:35:02 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/9776
Author: @pdevine
Created: 3/14/2025
Status: Merged
Merged: 3/15/2025
Merged by: @pdevine

Base: main ← Head: pdevine/gemma3-quant


📝 Commits (1)

📊 Changes

5 files changed (+149 additions, -0 deletions)


📝 llama/llama.cpp/src/llama-arch.cpp (+19 -0)
📝 llama/llama.cpp/src/llama-arch.h (+1 -0)
📝 llama/llama.cpp/src/llama-model.cpp (+7 -0)
📝 llama/llama.cpp/src/llama-quant.cpp (+9 -0)
➕ llama/patches/0021-gemma3-quantization.patch (+113 -0)

📄 Description

This change lets users quantize gemma3 models with `ollama create --quantize <level>`.

`ollama create` still quantizes through llama.cpp rather than calling ggml directly (as `ollama run` does), so the gemma3 architecture has to be defined in llama.cpp for the quantization to work correctly. In the future we can make a call to quantize each tensor directly.

Note that the vision part of the model is now included in the GGUF file and does not get quantized along with the text part of the model.
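
As a rough illustration of the note above, a quantizer can decide per tensor whether to quantize it or copy it through unchanged. The sketch below is not taken from the patch: the function name `should_quantize` and the `v.` / `mm.` prefixes are hypothetical stand-ins for however the vision tower and multimodal projector tensors are actually named in the GGUF file.

```cpp
#include <array>
#include <cassert>
#include <string_view>

// Hypothetical name prefixes for tensors that belong to the vision part of the
// model. These are assumptions for illustration, not names read from the patch.
constexpr std::array<std::string_view, 2> kUnquantizedPrefixes = {"v.", "mm."};

// Returns true when the tensor should be quantized, and false when it should
// be written to the output file in its original type.
inline bool should_quantize(std::string_view tensor_name) {
    assert(!tensor_name.empty());
    for (std::string_view prefix : kUnquantizedPrefixes) {
        // Prefix comparison that also works before C++20's starts_with.
        if (tensor_name.substr(0, prefix.size()) == prefix) {
            return false;
        }
    }
    return true;
}
```

With these assumed prefixes, a text tensor name such as `blk.0.attn_q.weight` would be quantized, while a name beginning with `v.` would be left as-is. A prefix check keeps the rule in one place, so the text and vision parts can share a single GGUF file without a separate quantization pass.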


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

Commit: a4203fa gemma3 quantization (https://github.com/ollama/ollama/commit/a4203fabc4ca811104474d32c8819e013df89196)
GiteaMirror added the pull-request label 2026-04-22 23:35:02 -05:00
Reference: github-starred/ollama#38930