[GH-ISSUE #10121] Can't use official QAT GGUF of Gemma-3-27b-it #6640

Closed
opened 2026-04-12 18:19:20 -05:00 by GiteaMirror · 15 comments
Owner

Originally created by @vYLQs6 on GitHub (Apr 4, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/10121

What is the issue?

Google recently published a QAT GGUF of Gemma-3-27b-it:
https://huggingface.co/google/gemma-3-27b-it-qat-q4_0-gguf

It should offer better performance at Q4 compared to a normal GGUF.

But I get these errors when I import the GGUF into Ollama and try to run it:

ollama show gemma-3-27b-it-q4_0-QAT:latest
Error: model 'gemma-3-27b-it-q4_0-QAT:latest' not found
ollama run gemma-3-27b-it-q4_0-QAT:latest
pulling manifest
Error: pull model manifest: file does not exist
[GIN] 2025/04/04 - 09:57:54 | 404 |     13.0001ms |       127.0.0.1 | POST     "/api/show"
[GIN] 2025/04/04 - 09:58:06 | 200 |            0s |       127.0.0.1 | HEAD     "/"
time=2025-04-04T09:58:06.848+08:00 level=ERROR source=images.go:92 msg="couldn't open model file" error="open : The system cannot find the file specified."
[GIN] 2025/04/04 - 09:58:06 | 404 |     11.9479ms |       127.0.0.1 | POST     "/api/show"
[GIN] 2025/04/04 - 09:58:08 | 200 |    1.5073083s |       127.0.0.1 | POST     "/api/pull"
[GIN] 2025/04/04 - 10:01:22 | 200 |            0s |       127.0.0.1 | GET      "/api/version"

But the model shows up in the ollama model list

ollama list
NAME                                                           ID              SIZE       MODIFIED
gemma-3-27b-it-q4_0-QAT:latest                                 e0bc88736cd6    17 GB      8 minutes ago

ollama Modelfile

FROM gemma-3-27b-it-q4_0-QAT.gguf
TEMPLATE """{{- range $i, $_ := .Messages }}
{{- $last := eq (len (slice $.Messages $i)) 1 }}
{{- if or (eq .Role "user") (eq .Role "system") }}<start_of_turn>user
{{ .Content }}<end_of_turn>
{{ if $last }}<start_of_turn>model
{{ end }}
{{- else if eq .Role "assistant" }}<start_of_turn>model
{{ .Content }}{{ if not $last }}<end_of_turn>
{{ end }}
{{- end }}
{{- end }}"""
PARAMETER stop                           "<end_of_turn>"
PARAMETER temperature                    1
PARAMETER top_k                          64
PARAMETER top_p                          0.95

gguf: https://huggingface.co/google/gemma-3-27b-it-qat-q4_0-gguf/blob/main/gemma-3-27b-it-q4_0.gguf

OS

Windows

GPU

Nvidia

CPU

AMD

Ollama version

0.6.4

GiteaMirror added the bug label 2026-04-12 18:19:20 -05:00
Author
Owner

@Kwisss commented on GitHub (Apr 4, 2025):

Same here.

Author
Owner

@SingularityMan commented on GitHub (Apr 4, 2025):

Same. Exact. Issue.

Fortunately one workaround is to use that same model which was uploaded yesterday by another user:

https://ollama.com/eramax/gemma-3-27b-it-qat

Now someone just made an even better version

https://huggingface.co/stduhpf/google-gemma-3-27b-it-qat-q4_0-gguf-small

ollama run JollyLlama/gemma-3-27b-it-q4_0_Small-QAT
Author
Owner

@samyon7 commented on GitHub (Apr 4, 2025):

Same. Exact. Issue.

Fortunately one workaround is to use that same model which was uploaded yesterday by another user:

https://ollama.com/eramax/gemma-3-27b-it-qat

No. We actually need a way to create the model locally, so that no download is needed.

Author
Owner

@hiarcs commented on GitHub (Apr 6, 2025):

Same issue

Author
Owner

@xxxpsyduck commented on GitHub (Apr 9, 2025):

You can run this model like this

ollama run hf.co/google/gemma-3-12b-it-qat-q4_0-gguf

This is a gated model, so remember to gain access and add your SSH key to Ollama before pulling.

Author
Owner

@aa956 commented on GitHub (Apr 9, 2025):

You can run this model like this

ollama run hf.co/google/gemma-3-12b-it-qat-q4_0-gguf

This is a gated model, so remember to gain access and add your SSH key to Ollama before pulling.

Is it possible to run it from the local filesystem if the models are already downloaded using huggingface-cli?

E.g.:

$ ollama run /mnt/e/cache/huggingface/hub/models--google--gemma-3-27b-it-qat-q4_0-gguf/snapshots/08a55151ac69bb134e4b18ccb47f1f6dfd3caf75/gemma-3-27b-it-q4_0.gguf
Error: invalid model path
$ ollama run /mnt/e/cache/huggingface/hub/models--google--gemma-3-27b-it-qat-q4_0-gguf/snapshots/08a55151ac69bb134e4b18ccb47f1f6dfd3caf75/gemma-3-27b-it-q4_0.gguf /mnt/e/cache/huggingface/hub/models--google--gemma-3-27b-it-qat-q4_0-gguf/snapshots/08a55151ac69bb134e4b18ccb47f1f6dfd3caf75/mmproj-model-f16-27B.gguf
Error: invalid model path

I tried to create a Modelfile, but it does not work:

FROM /mnt/e/cache/huggingface/hub/models--google--gemma-3-27b-it-qat-q4_0-gguf/snapshots/08a55151ac69bb134e4b18ccb47f1f6dfd3caf75/gemma-3-27b-it-q4_0.gguf
FROM /mnt/e/cache/huggingface/hub/models--google--gemma-3-27b-it-qat-q4_0-gguf/snapshots/08a55151ac69bb134e4b18ccb47f1f6dfd3caf75/mmproj-model-f16-27B.gguf
TEMPLATE """{{- range $i, $_ := .Messages }}
{{- $last := eq (len (slice $.Messages $i)) 1 }}
{{- if or (eq .Role "user") (eq .Role "system") }}<start_of_turn>user
{{ .Content }}<end_of_turn>
{{ if $last }}<start_of_turn>model
{{ end }}
{{- else if eq .Role "assistant" }}<start_of_turn>model
{{ .Content }}{{ if not $last }}<end_of_turn>
{{ end }}
{{- end }}
{{- end }}"""
PARAMETER stop <end_of_turn>
$ ollama create gemma3l -f modelfile-gemma3-27b-q4
gathering model components
copying file sha256:45e586879bc5f5d7a5b6527e812952057ce916d9fc7ba16f7262ec9972c9e2a2 100%
copying file sha256:54cb61c842fe49ac3c89bc1a614a2778163eb49f3dec2b90ff688b4c0392cb48 100%
parsing GGUF
using existing layer sha256:45e586879bc5f5d7a5b6527e812952057ce916d9fc7ba16f7262ec9972c9e2a2
using existing layer sha256:54cb61c842fe49ac3c89bc1a614a2778163eb49f3dec2b90ff688b4c0392cb48
using existing layer sha256:e0a42594d802e5d31cdc786deb4823edb8adff66094d49de8fffe976d753e348
using existing layer sha256:d3a76cb8c4a07d0a6c82ac6e839f98816b5077699d393b2cc77008c16d8078ac
writing manifest
success
$ ollama ls
NAME                 ID              SIZE     MODIFIED
gemma3l:latest       2de635245b1f    18 GB    10 seconds ago
gemma3:latest        2de635245b1f    18 GB    9 minutes ago
gemma-3:27b-it-q4    2de635245b1f    18 GB    13 minutes ago
$ ollama run gemma3l
pulling manifest
Error: pull model manifest: file does not exist

Edit: just in case, these same files work with llama.cpp:

$./build/bin/llama-gemma3-cli --gpu-layers 63 --flash-attn --device CUDA0 \
--cache-type-k q8_0 --cache-type-v q8_0 --ctx-size 16384 --temp 1.0 \
--top-k 64 --top-p 0.95 --min-p 0.01 \
--model /mnt/e/cache/huggingface/hub/models--google--gemma-3-27b-it-qat-q4_0-gguf/snapshots/08a55151ac69bb134e4b18ccb47f1f6dfd3caf75/gemma-3-27b-it-q4_0.gguf \
--mmproj /mnt/e/cache/huggingface/hub/models--google--gemma-3-27b-it-qat-q4_0-gguf/snapshots/08a55151ac69bb134e4b18ccb47f1f6dfd3caf75/mmproj-model-f16-27B.gguf

... a lot of logs ...

alloc_compute_meta:        CPU compute buffer size =     9.19 MiB
main: /mnt/e/cache/huggingface/hub/models--google--gemma-3-27b-it-qat-q4_0-gguf/snapshots/08a55151ac69bb134e4b18ccb47f1f6dfd3caf75/gemma-3-27b-it-q4_0.gguf

 Running in chat mode, available commands:
   /image <path>    load an image
   /clear           clear the chat history
   /quit or /exit   exit the program

> Describe yourself
Hello! I'm Gemma, an open-weights AI assistant. I'm a large language model trained by Google DeepMind.

Here's a bit more about me:

....
/quit
$
Author
Owner

@xxxpsyduck commented on GitHub (Apr 9, 2025):

@aa956 How about

ollama run gemma31:latest
Author
Owner

@aa956 commented on GitHub (Apr 9, 2025):

@aa956 How about

ollama run gemma31:latest

Tried this too; it still looks like the local files are not even considered, and ollama tries to download something from a remote repository:

$ ollama ls
NAME                 ID              SIZE     MODIFIED
gemma3l:latest       2de635245b1f    18 GB    23 minutes ago
gemma3:latest        2de635245b1f    18 GB    32 minutes ago
gemma-3:27b-it-q4    2de635245b1f    18 GB    36 minutes ago
$ ollama run gemma3l:latest
pulling manifest
Error: pull model manifest: file does not exist
$ ollama run 2de635245b1f
pulling manifest
Error: pull model manifest: file does not exist
$ ollama -v
ollama version is 0.6.5
$
Author
Owner

@rick-github commented on GitHub (Apr 9, 2025):

The local files are identified as projector weights rather than model weights, because the import procedure checks for a KV entry of vision.block_count. Because the gemma3 models combine both the model and the projector into a single file, that file is identified as a projector and not a model. #10162 will fix that. In the meantime, after you have created the model, open the manifest file ($OLLAMA_MODELS/manifests/registry.ollama.ai/library/gemma-3-27b-it-q4_0-QAT/latest; the last two elements of the path depend on what you named the new model) and change image.projector to image.model.
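That manifest edit can be sketched as a small script. This is a minimal sketch, assuming the manifest is JSON with a layers array whose mediaType strings end in image.projector; the demo manifest and path below are illustrative, not the real files:

```python
import json
import os
import tempfile

def fix_manifest(path):
    """Rewrite layer mediaTypes so the combined GGUF layer is treated as
    model weights instead of a projector (the misdetection described above)."""
    with open(path) as f:
        manifest = json.load(f)
    for layer in manifest.get("layers", []):
        mt = layer.get("mediaType", "")
        if mt.endswith("image.projector"):
            layer["mediaType"] = mt.replace("image.projector", "image.model")
    with open(path, "w") as f:
        json.dump(manifest, f)
    return manifest

# Demo on a stand-in manifest; the real file lives under
# $OLLAMA_MODELS/manifests/registry.ollama.ai/library/<name>/<tag>.
demo = {"layers": [{"mediaType": "application/vnd.ollama.image.projector",
                    "digest": "sha256:45e5..."}]}
path = os.path.join(tempfile.mkdtemp(), "latest")
with open(path, "w") as f:
    json.dump(demo, f)
fixed = fix_manifest(path)
print(fixed["layers"][0]["mediaType"])  # application/vnd.ollama.image.model
```

Re-running ollama create after a fixed release lands makes the edit unnecessary; this only patches an already-created model in place.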

Author
Owner

@aa956 commented on GitHub (Apr 9, 2025):

and change image.projector to image.model

Thank you, this worked:

$ ollama run gemma3l:latest
>>> Describe yourself in single sentence
I am Gemma, an open-weights AI assistant, a large language model trained by Google DeepMind, widely available to the public, and skilled in 
understanding and generating human-like text from both text and image inputs.
>>> /bye
$
Author
Owner

@martinezhermes commented on GitHub (Apr 19, 2025):

and change image.projector to image.model

Thank you, this worked:

$ ollama run gemma3l:latest

Describe yourself in single sentence
I am Gemma, an open-weights AI assistant, a large language model trained by Google DeepMind, widely available to the public, and skilled in
understanding and generating human-like text from both text and image inputs.
/bye
$

how do you know this thing is running a QAT q4?

Author
Owner

@aa956 commented on GitHub (Apr 19, 2025):

how do you know this thing is running a QAT q4?

Because the Modelfile was created from locally downloaded QAT q4 files.

See the Modelfile in this comment: https://github.com/ollama/ollama/issues/10121#issuecomment-2789152883

The model weights were downloaded using huggingface-cli download google/gemma-3-27b-it-qat-q4_0-gguf

The problem was that the manually downloaded files (successfully used by llama.cpp's llama-gemma3-cli) were not working in ollama.

After getting this issue solved, and after finding out that image input is not working at the moment (https://github.com/ollama/ollama/pull/10162#issuecomment-2783272360), I've switched to koboldcpp for now, as all I needed was a working API for image captioning on a local machine.
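For anyone who wants to verify the quantization from the file itself rather than from its provenance: the GGUF header stores it in the general.file_type metadata key (in llama.cpp's numbering, 2 means Q4_0). Below is a minimal sketch of a header reader that handles only the uint32 and string value types, which is enough for this check; the demo blob is synthetic, and in practice you would pass the first few KiB of gemma-3-27b-it-q4_0.gguf instead:

```python
import struct

# GGUF value type codes we handle (a subset of the spec).
GGUF_UINT32, GGUF_STRING = 4, 8

def read_str(buf, off):
    """Read a GGUF string: uint64 length followed by UTF-8 bytes."""
    (n,) = struct.unpack_from("<Q", buf, off)
    off += 8
    return buf[off:off + n].decode(), off + n

def read_kv_metadata(buf):
    """Parse just the key/value metadata header of a GGUF blob
    (uint32 and string values only -- enough for general.file_type)."""
    assert buf[:4] == b"GGUF"
    (version,) = struct.unpack_from("<I", buf, 4)
    n_tensors, n_kv = struct.unpack_from("<QQ", buf, 8)
    off, meta = 24, {}
    for _ in range(n_kv):
        key, off = read_str(buf, off)
        (vtype,) = struct.unpack_from("<I", buf, off)
        off += 4
        if vtype == GGUF_UINT32:
            (val,) = struct.unpack_from("<I", buf, off)
            off += 4
        elif vtype == GGUF_STRING:
            val, off = read_str(buf, off)
        else:
            break  # other value types are not needed for this check
        meta[key] = val
    return meta

def enc_str(s):
    b = s.encode()
    return struct.pack("<Q", len(b)) + b

# Demo: a tiny synthetic GGUF header with one KV pair.
blob = (b"GGUF" + struct.pack("<IQQ", 3, 0, 1)
        + enc_str("general.file_type") + struct.pack("<II", GGUF_UINT32, 2))
meta = read_kv_metadata(blob)
print(meta["general.file_type"])  # 2 -> Q4_0
```

Reading only the header is cheap even for a 17 GB file, since the metadata sits at the start of the file before the tensor data.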

Author
Owner

@martinezhermes commented on GitHub (Apr 19, 2025):

how do you know this thing is running a QAT q4?

Because modelfile is created from locally downloaded QAT q4 files?

See modelfile in this comment: #10121 (comment)

Model weights were downloaded using huggingface-cli download google/gemma-3-27b-it-qat-q4_0-gguf

Problem was that manually downloaded files (successfully used by llama.cpp's llama-gemma3-cli) were not working in ollama.

After getting this issue solved and finding out here that image input is not working at the moment I've switched to koboldcpp for now as all I've needed was working API for image captioning on local machine.

Right! Got it. Have you tried the recently uploaded QAT? There are files in gemma3 that correspond to fixes by Google, uploaded yesterday; I still have to confirm whether images work with those. Every version is there now. The QAT uploads do not show the quant size, but we know they (Google) applied q4 quantization after training with QAT, so no need to be redundant, I guess.

Edit: they do show it, in the menu, not in the name.

Author
Owner

@SingularityMan commented on GitHub (Apr 19, 2025):

UPDATE: Google fixed it on Ollama. It's multimodal now.

https://ollama.com/library/gemma3

Make sure to upgrade Ollama to minimize OOM errors, and fiddle with the KV cache.
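For the KV cache part, a sketch of one way to do it, assuming a recent Ollama release that supports these server environment variables (q8_0 is one possible cache type):

```shell
# Serve with flash attention enabled and a quantized KV cache to cut VRAM use.
OLLAMA_FLASH_ATTENTION=1 OLLAMA_KV_CACHE_TYPE=q8_0 ollama serve
```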

Author
Owner

@rick-github commented on GitHub (Apr 19, 2025):

https://ollama.com/library/gemma3:12b-it-qat

Reference: github-starred/ollama#6640