[GH-ISSUE #11222] Gemma3n is not Multimodal #7394

Closed
opened 2026-04-12 19:28:52 -05:00 by GiteaMirror · 8 comments

Originally created by @Android-Artisan on GitHub (Jun 27, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/11222

Originally assigned to: @mxyng on GitHub.

What is the issue?

Gemma3n is not Multimodal

Relevant log output


OS

macOS

GPU

Apple

CPU

Apple

Ollama version

0.9.3

GiteaMirror added the bug label 2026-04-12 19:28:52 -05:00

@rick-github commented on GitHub (Jun 27, 2025):

Gemma3n on ollama currently only supports text generation.


@ywythu commented on GitHub (Jun 27, 2025):

> Gemma3n on ollama currently only supports text generation.

Is there a roadmap for image and audio support?


@rick-github commented on GitHub (Jun 27, 2025):

https://github.com/ollama/ollama/issues/10792#issuecomment-3009619264


@freddyaboulton commented on GitHub (Jun 27, 2025):

Would love multimodal support!


@pdevine commented on GitHub (Jul 1, 2025):

Vision is close, but audio is a ways off. The vision model is based on MobileNet, which is significantly different from SigLIP (which gemma3 is based on), so it took a little while.


@Cherchercher commented on GitHub (Jul 4, 2025):

Is there a guide on implementing the audio component on ollama?


@rick-github commented on GitHub (Jul 4, 2025):

Gemma3n on ollama currently only supports text generation.


@Android-Artisan commented on GitHub (Oct 17, 2025):

any news?

Reference: github-starred/ollama#7394