[GH-ISSUE #10792] Gemma 3n #32845

Open
opened 2026-04-22 14:42:46 -05:00 by GiteaMirror · 39 comments

Originally created by @diegovalenzuelaiturra on GitHub (May 21, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/10792

Originally assigned to: @mxyng on GitHub.

Would it be possible to support Gemma 3n?

GiteaMirror added the model label 2026-04-22 14:42:46 -05:00

@groundstation1 commented on GitHub (May 27, 2025):

made llama.cpp request on their idea board: https://github.com/ggml-org/llama.cpp/discussions/13831
(hope that's the right place. That's where the bulk of the work has to be done, right?)

@chriszs commented on GitHub (Jun 11, 2025):

When asked about whether the current LiteRT (formerly TensorFlow) weights would ever be released in GGUF, one Google developer relations person [said](https://huggingface.co/google/gemma-3n-E4B-it-litert-preview/discussions/11#68344aef0c0aff775f05b1b4), "Yes, this repository is just a preview. Stay tuned!" More recently, another [said](https://x.com/_philschmid/status/1932333817789870329), "We are working hard on making Gemma 3n 4B available in popular open source frameworks! Stay tuned." So, I guess we should "stay tuned."

@rick-github commented on GitHub (Jun 26, 2025):

https://ollama.com/library/gemma3n. Requires [0.9.3](https://github.com/ollama/ollama/releases/tag/v0.9.3). Note that this version is not multi-modal, text generation only.

@feynon commented on GitHub (Jun 26, 2025):

Do we expect multimodal generation soon? Is it blocked on downstream llama.cpp in any way?

@The-Best-Codes commented on GitHub (Jun 26, 2025):

Would love to see support now that Gemma 3n is stable:
https://huggingface.co/collections/google/gemma-3n-685065323f5984ef315c93f4

There are also community GGUFs:
https://huggingface.co/unsloth/gemma-3n-E4B-it-GGUF

@mxyng commented on GitHub (Jun 26, 2025):

Vision is a work in progress

@jonigl commented on GitHub (Jun 26, 2025):

> https://ollama.com/library/gemma3n. Requires [0.9.3](https://github.com/ollama/ollama/releases/tag/v0.9.3). Note that this version is not multi-modal, text generation only.

Would love to see Tools support added to this model! Are there any plans to support it?

@rick-github commented on GitHub (Jun 26, 2025):

The model doesn't support tool use. You can hack it in like was done for [gemma3](https://ollama.com/search?q=gemma3+tool), but the quality of response may be poor.

@rick-github commented on GitHub (Jun 26, 2025):

As I suspected, it's not a great tool user, but it actually did better than I expected. One thing I noticed is that it gets noticeably slower as the context increases, so perhaps not good for multi-turn tool calls.

````dockerfile
FROM gemma3n:latest
TEMPLATE """{{- if or .System .Tools }}<start_of_turn>user
{{ .System }}
{{- if .Tools }}

You can use these tools to help answer the user's question:
{{- range .Tools }}
{{ . }}
{{- end }}

When you need to use a tool, format your response as JSON as follows:
```tool
{"name": "tool_name", "parameters": {"param1": "value1", "param2": "value2"}}
```
{{- end }}<end_of_turn>
<start_of_turn>model
I understand.
<end_of_turn>
{{- end }}
{{- range $i, $_ := .Messages }}
{{- $last := eq (len (slice $.Messages $i)) 1 }}
{{- if eq .Role "user" }}<start_of_turn>user
{{ .Content }}<end_of_turn>
{{ if $last }}<start_of_turn>model
{{ end }}
{{- else if eq .Role "assistant" }}<start_of_turn>model
{{- if .ToolCalls }}
```tool
{{- range .ToolCalls }}
{"name": "{{ .Function.Name }}", "parameters": {{ .Function.Arguments }}}
{{- end }}
```
{{- else }}
{{ .Content }}
{{- end }}{{ if not $last }}<end_of_turn>
{{ end }}
{{- else if eq .Role "tool" }}<start_of_turn>user
<tool_response>
{{ .Content }}
</tool_response><end_of_turn>
{{ if $last }}<start_of_turn>model
{{ end }}
{{- end }}
{{- end }}"""
$ ./ollama-tool.py --model gemma3n-tools --tool all --test
>>> what time is it?
calling get_datetime({'timezone_name': None})
The current time is Thursday, June 26, 2025 at 22:27.
>>> what is 10 ^ 1.34?
calling power({'x': '10', 'y': '1.34'})
21.87761623949553
>>> what operating system are you running on?
calling get_operating_system({})
I am running on Linux-6.8.0-47-generic-x86_64-with-glibc2.39.
>>> what is my ip address?
calling add({})
calling add({})
calling add({})
calling add({})
calling add({})
You are right to point that out. It seems I am stuck in a loop. I am unable to determine your IP address with the tools available. I am a language model and do not have the ability to directly access your IP address. I apologize for the repeated error.

To find your IP address, you would typically:

*   **Search on Google:** Simply search "what is my ip" on Google.
*   **Use a website:** Visit a website like [https://www.whatismyip.com/](https://www.whatismyip.com/)

I am limited in what I can do, and I am still under development. I appreciate you bringing this to my attention.




>>> what is my location?
calling get_location({})
The location is:

*   **IP Address:** 
*   **Latitude:** 
*   **Longitude:** 
*   **Country:** Switzerland
*   **Region:** Zurich
*   **City:** Zurich

This information is based on the IP address lookup provided in the HTML response.
>>> What's the weather where I am?
calling get_weather({'city': 'Zurich', 'country': 'Switzerland', 'unit': 'unit'})
The weather in Zurich, Switzerland is mild. The current temperature is 20°C, but it feels like 20°C. The forecast for today is 29/15°C with wind from the South at 9 km/h. The humidity is 58%.
```
@quaintdev commented on GitHub (Jul 8, 2025):

Any update on this? Gemma 3n on mobile is fantastic. See below.

![Image](https://github.com/user-attachments/assets/33072912-fc8c-4194-bc3a-36c85a476137)

@KopfKrieg commented on GitHub (Jul 8, 2025):

What App is that 👀

@quaintdev commented on GitHub (Jul 8, 2025):

It's Google's AI Edge Gallery app. Check it out https://github.com/google-ai-edge/gallery

@barnabywalters commented on GitHub (Jul 15, 2025):

Are there any plans/roadmap to get gemma3n’s audio input capabilities working with ollama?

@rick-github commented on GitHub (Jul 15, 2025):

Audio is a different modality to those currently supported by ollama. The developers have indicated adding other types of data input is on the roadmap. Whether that includes gemma3n remains to be seen.

@jy1989 commented on GitHub (Jul 17, 2025):

I think it should be marked on the official website that multimodality is not supported. There is no error when identifying images, and I thought there was a bug with my program. 😂

@rick-github commented on GitHub (Jul 17, 2025):

The model as listed in the ollama library does not have the `vision` tag. When the model supports vision, the tag will be added.

@luewolf commented on GitHub (Jul 17, 2025):

One of the main selling points of Gemma3n for me is the selective parameter activation technology to reduce resource requirements. Does Ollama support this feature yet?

@quaintdev commented on GitHub (Jul 18, 2025):

Google has created its own runtime for models like Gemma3n. It's called LiteRT-LM and you can use it on desktop. I raised an issue with them regarding the vision modality; they said it will be supported by the end of this month.

https://github.com/google-ai-edge/LiteRT-LM

@QtRoS commented on GitHub (Jul 18, 2025):

+1 to @luewolf

It's said that:

> A single 4B model natively includes a 2B submodel, allowing you to dynamically trade off performance and quality on the fly

> While their raw parameter count is 5B and 8B respectively, architectural innovations allow them to run with a memory footprint comparable to traditional 2B and 4B models, operating with as little as 2GB (E2B) and 3GB (E4B) of memory

Is there any chance that these options will be supported by Ollama?
Sources: [one](https://www.kaggle.com/competitions/google-gemma-3n-hackathon), [two](https://developers.googleblog.com/en/introducing-gemma-3n-developer-guide/)
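
As a back-of-envelope sketch of what those quoted figures imply for resident weight memory (the bytes-per-parameter values below are assumptions, and KV cache, activations and runtime overhead are ignored):

```go
// Back-of-envelope only: rough resident-weight sizes implied by the numbers
// quoted above. FP16 = 2 bytes/param and 8-bit = 1 byte/param are assumptions.
package main

import "fmt"

// gigabytes converts a parameter count and per-parameter byte size to GB.
func gigabytes(params, bytesPerParam float64) float64 {
	return params * bytesPerParam / 1e9
}

func main() {
	const rawParams = 5e9      // total parameters of Gemma 3n E2B
	const residentParams = 2e9 // params kept on the accelerator once PLE is offloaded

	fmt.Printf("all 5B resident at FP16:    ~%.0f GB\n", gigabytes(rawParams, 2))      // ~10 GB
	fmt.Printf("2B resident at FP16:        ~%.0f GB\n", gigabytes(residentParams, 2)) // ~4 GB
	fmt.Printf("2B resident, 8-bit weights: ~%.0f GB\n", gigabytes(residentParams, 1)) // ~2 GB
}
```

The last line only matches the "as little as 2GB" figure if the resident weights are also quantized, so treat it as an estimate rather than a statement about how the official builds are packaged.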

@Larry-Gan commented on GitHub (Jul 25, 2025):

Dang kinda sucks that nobody supports gemma 3n multimodal anywhere lol, even google doesn't have it on ai studio...

@bmachek commented on GitHub (Jul 25, 2025):

> Dang kinda sucks that nobody supports gemma 3n multimodal anywhere lol, even google doesn't have it on ai studio...

I think LM Studio already got it done.

@The-Best-Codes commented on GitHub (Jul 25, 2025):

@Larry-Gan @bmachek

The model is here on Ollama:
https://ollama.com/library/gemma3n

It works with llama.cpp as well.

@bmachek commented on GitHub (Jul 25, 2025):

> @Larry-Gan @bmachek
>
> The model is here on Ollama: https://ollama.com/library/gemma3n
>
> It works with llama.cpp as well.

@The-Best-Codes But multimodal functionality is only there in LM Studio for now.

@QtRoS commented on GitHub (Jul 25, 2025):

> One of the main selling points of Gemma3n for me is the selective parameter activation technology to reduce resource requirements. Does Ollama support this feature yet?

Seems that's only about [GPU memory requirements](https://developers.googleblog.com/en/introducing-gemma-3n-developer-guide/):

> With Per-Layer Embeddings, you can use Gemma 3n E2B while only having ~2B parameters loaded in your accelerator.

And it already works in llama.cpp: https://github.com/ggml-org/llama.cpp/pull/14400#issuecomment-3011838558

@Larry-Gan commented on GitHub (Aug 4, 2025):

@bmachek
Are you referring to this LM studio?
https://lmstudio.ai/models/google/gemma-3n-e4b

It's telling me:
> GGUFs are currently text-only. We are working to expand capabilities and remove this limitation.

@bmachek commented on GitHub (Aug 4, 2025):

@Larry-Gan I was using the MLX version with LM studio.

@curious-boy-007 commented on GitHub (Sep 14, 2025):

@diegovalenzuelaiturra @Larry-Gan
I notice there is a GGUF solution for Gemma3n multimodal input:

- https://sdk.nexa.ai/model/Gemma3n-E4B
- https://nexa.ai/blogs/gemma3n

@recrudesce commented on GitHub (Sep 16, 2025):

Gemma3n can be pulled via Ollama from the models library and works fine - I just tested it.

@Jeremy-Developer-Page commented on GitHub (Sep 17, 2025):

@recrudesce

> Gemma3n can be pulled via Ollama from the models library and works fine - I just tested it.

Does it work well for image analysis too? My understanding is that it currently only works properly for text-only prompts.

@Jeremy-Developer-Page commented on GitHub (Sep 17, 2025):

The problem in this case for multimodal support of `gemma3n` and `gemma3n-it` is that the base for the image recognition is ImageNet, which, if I've understood correctly, isn't supported by Ollama at this time. So when the model is converted to GGUF for Ollama it loses its vision capabilities, but when the same model is converted with mlx-vlm it keeps the vision functions. However, MLX is restricted to Mac users only, and I have noticed it is a few tokens/s slower than the version running on Ollama (about 5 on average on a Mac Mini M4). I would say that the key to making it work is:

  1. Understanding how to add the ImageNet functionality to the Ollama engine.
  2. Understanding why, from version 0.11.0 onwards, all multimodal (vision+text) model conversions stop working and lose the vision part.
     (For this second point I would ask @dhiltgen for help, as he probably knows more about it, since I saw that he released 0.11.0. Let me know and correct me if I'm wrong.)

Thanks in advance to all, and good luck solving the problems and helping as many people as possible.

Jeremy

@pdevine commented on GitHub (Sep 18, 2025):

@Jeremy-Developer-Page for ImageNet/MobileNet it would need to be implemented in `/model/models/gemma3n`. I think @mxyng has a partial implementation.

For the second point, it was a little tricky to pin down, but I figured out that `ollama convert` for gemma3 was leaving all of the image weights as BrainFloat16 instead of converting them to FP16. Unfortunately, on Metal the GGML library is missing a BrainFloat16 kernel for the `im2col` function (it's only implemented in FP32 and FP16), so after converting the weights it will blow up. I have a fix for the conversion (#12324) which is now in `main` and should be in the next release.
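
To illustrate the failure mode, here is a minimal, self-contained sketch of the kind of re-encoding the fix performs. It is not Ollama's actual converter code: the function names are made up for illustration, and `f32ToF16` skips proper rounding and subnormal handling.

```go
// Sketch only: re-encode BF16 weight values as FP16 so a backend without a
// BF16 im2col kernel can still run the vision layers.
package main

import (
	"fmt"
	"math"
)

// bf16ToF32 widens a bfloat16 value to float32. bfloat16 is the upper 16
// bits of the IEEE-754 float32 layout, so this step is lossless.
func bf16ToF32(b uint16) float32 {
	return math.Float32frombits(uint32(b) << 16)
}

// f32ToF16 narrows a float32 to IEEE-754 half-precision bits.
// Simplified: truncates the mantissa, clamps overflow to +/-Inf, and
// flushes values too small for a normal half to zero.
func f32ToF16(f float32) uint16 {
	bits := math.Float32bits(f)
	sign := uint16(bits>>16) & 0x8000
	exp := int32(bits>>23&0xff) - 127 + 15
	mant := bits & 0x7fffff

	switch {
	case exp >= 0x1f: // overflow (or Inf/NaN) -> Inf
		return sign | 0x7c00
	case exp <= 0: // too small for a normal half -> zero
		return sign
	default:
		return sign | uint16(exp)<<10 | uint16(mant>>13)
	}
}

func main() {
	// A toy "tensor" of BF16 weights: 1.0, -0.5, ~3.14.
	bf16Weights := []uint16{0x3f80, 0xbf00, 0x4049}
	f16Weights := make([]uint16, len(bf16Weights))
	for i, w := range bf16Weights {
		f16Weights[i] = f32ToF16(bf16ToF32(w))
	}
	fmt.Printf("fp16 bits: %04x\n", f16Weights) // [3c00 b800 4248]
}
```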

@Jeremy-Developer-Page commented on GitHub (Sep 19, 2025):

Okay, so @pdevine and @mxyng, the files to create and configure would be `gemma3n/model_vision.go`, `gemma3n/process_image.go`, and `gemma3n/embed.go`. Am I correct, or is it different, since the gemma3n model is quite different from the standard gemma3? This is the first time I've dealt with these files, which is why I'm asking.

@pdevine commented on GitHub (Sep 19, 2025):

`embed.go` includes the forward pass for generating embeddings for the `embeddinggemma` model. `gemma3n/model_vision.go` would need to be implemented to include the forward pass for MobileNet-V5, which is different from gemma3, which implements SigLIP.

You can find the white paper for gemma3 [here](https://arxiv.org/pdf/2503.19786), and more about MobileNet-V5 [here](https://developers.googleblog.com/en/introducing-gemma-3n-developer-guide/#mobilenet-v5:-new-state-of-the-art-vision-encoder).
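
As a rough, hypothetical illustration of why the two encoders share so little code (this is not Ollama's API): MobileNet-style blocks are built from depthwise-separable convolutions, while the SigLIP encoder used by gemma3 is an attention-based ViT with a convolutional patch embedding. Just counting the weights of one convolutional block shows the structural difference:

```go
// Hypothetical illustration (not Ollama code): weight counts for one
// MobileNet-style depthwise-separable block versus a standard kxk
// convolution with the same channel shape.
package main

import "fmt"

// depthwiseSeparableParams: a per-channel kxk (depthwise) conv followed by
// a 1x1 (pointwise) conv that mixes channels, as in MobileNet blocks.
func depthwiseSeparableParams(inC, outC, k int) int {
	depthwise := inC * k * k // one kxk filter per input channel
	pointwise := inC * outC  // 1x1 conv across channels
	return depthwise + pointwise
}

// standardConvParams: a plain kxk convolution over all channel pairs.
func standardConvParams(inC, outC, k int) int {
	return inC * outC * k * k
}

func main() {
	inC, outC, k := 256, 256, 3
	fmt.Println("depthwise-separable block:", depthwiseSeparableParams(inC, outC, k)) // 67840
	fmt.Println("standard 3x3 convolution: ", standardConvParams(inC, outC, k))       // 589824
}
```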

@Jeremy-Developer-Page commented on GitHub (Sep 20, 2025):

I tried to implement that part, but with poor results, since I can't get the program to use the vision model.

@pdevine commented on GitHub (Sep 20, 2025):

@Jeremy-Developer-Page yes it's not an easy implementation. SigLip and MobileNet are quite different from each other.

@Jeremy-Developer-Page commented on GitHub (Sep 23, 2025):

Hi all, is there any news from the other devs?

@Android-Artisan commented on GitHub (Oct 17, 2025):

> Vision is a work in progress

@mxyng any news?

@chllei commented on GitHub (Jan 9, 2026):

Are there any updates regarding this model’s multimodal and tool-calling capabilities?

@reedmayhew18 commented on GitHub (Feb 15, 2026):

Checking in on multi-modal support for both images and audio on this model. Thank you!

If there are any specific areas of this process that you're stuck on and could use help with, please let me/us know.

Reference: github-starred/ollama#32845