[GH-ISSUE #11715] feat: support multi-modal chat (both input and output) like Gemini #54995

New Issue

GiteaMirror · 2026-05-05T16:59:41-05:00

GiteaMirror commented

2026-05-05 16:59:41 -05:00

Originally created by @TissueC on GitHub (Mar 15, 2025).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/11715

Check Existing Issues

I have searched the existing issues and discussions.

Problem Description

Now the WebUI supports multi-modal input (e.g., text + image), like GPT-4o.
And WebUI also supports one-turn image generation like dall-e-3.
However, as the newest Gemini (e.g. Gemini 2.0 Flash Experimental) can natively understand and generate an image, Would you consider further supporting multi-turn multi-modal chats using Gemini?

Thanks a lot!

ref: https://ai.google.dev/gemini-api/docs/image-generation

Desired Solution you'd like

Support multimodal chats.

Alternatives Considered

No response

Additional Context

No response

Originally created by @TissueC on GitHub (Mar 15, 2025). Original GitHub issue: https://github.com/open-webui/open-webui/issues/11715 ### Check Existing Issues - [x] I have searched the existing issues and discussions. ### Problem Description Now the WebUI supports multi-modal input (e.g., text + image), like GPT-4o. And WebUI also supports one-turn image generation like dall-e-3. However, as the newest Gemini (e.g. Gemini 2.0 Flash Experimental) can natively understand and generate an image, Would you consider further supporting multi-turn multi-modal chats using Gemini? Thanks a lot! ref: https://ai.google.dev/gemini-api/docs/image-generation ### Desired Solution you'd like Support multimodal chats. ### Alternatives Considered _No response_ ### Additional Context _No response_

GiteaMirror closed this issue

2026-05-05 16:59:41 -05:00

GiteaMirror commented

2026-05-05 16:59:43 -05:00

@tjbck commented on GitHub (Mar 15, 2025):

Is supported.

@tjbck commented on GitHub (Mar 15, 2025): Is supported.

GiteaMirror commented

2026-05-05 16:59:45 -05:00

@TissueC commented on GitHub (Mar 17, 2025):

@tjbck May you elaborate a bit that how multimodal chats (in both input and output) are supported? I don't find such an implementation. Thank you a lot!

@TissueC commented on GitHub (Mar 17, 2025): @tjbck May you elaborate a bit that how multimodal chats (in both input and **output**) are supported? I don't find such an implementation. Thank you a lot!

GiteaMirror commented

2026-05-05 16:59:46 -05:00

@TissueC commented on GitHub (Mar 25, 2025):

I hope that this issue could be re-opened because the feature is NOT supported.
I am not asking for supporting Gemini but asking for supporting sending pictures by native abilities of LLMs (e.g. the newest Gemini, Gemini 2.0 Flash Experimental).

@TissueC commented on GitHub (Mar 25, 2025): I hope that this issue could be re-opened because the feature is NOT supported. I am not asking for supporting Gemini but asking for supporting sending pictures by native abilities of LLMs (e.g. the newest Gemini, Gemini 2.0 Flash Experimental).

GiteaMirror commented

2026-05-05 16:59:47 -05:00

@TissueC commented on GitHub (Mar 28, 2025):

As the newest GPT-4o with image generation feature emerges, this becomes more important.

@TissueC commented on GitHub (Mar 28, 2025): As the newest GPT-4o with image generation feature emerges, this becomes more important.

GiteaMirror commented

2026-05-05 16:59:48 -05:00

@tjbck commented on GitHub (Mar 28, 2025):

Gemini implementation is available as community Functions.

@tjbck commented on GitHub (Mar 28, 2025): Gemini implementation is available as community Functions.

GiteaMirror commented

2026-05-05 16:59:49 -05:00

@Classic298 commented on GitHub (Mar 28, 2025):

Gemini can also be implemented using LiteLLM, but is the multimodal output also supported by the API?

@Classic298 commented on GitHub (Mar 28, 2025): Gemini can also be implemented using LiteLLM, but is the multimodal output also supported by the API?

GiteaMirror commented

2026-05-05 16:59:50 -05:00

@TissueC commented on GitHub (Apr 9, 2025):

Gemini implementation is available as community Functions.

Thanks a lot! I found it in the community.
https://openwebui.com/f/jscheah/gemini_2_0_flash_native_image_gen

@TissueC commented on GitHub (Apr 9, 2025): > Gemini implementation is available as community Functions. Thanks a lot! I found it in the community. https://openwebui.com/f/jscheah/gemini_2_0_flash_native_image_gen

Sign in to join this conversation.

Branches Tags

1 Participants

Notifications

Due Date

No due date set.

Dependencies

No dependencies set.

Reference: github-starred/open-webui#54995