[GH-ISSUE #11715] feat: support multi-modal chat (both input and output) like Gemini #54995

Closed
opened 2026-05-05 16:59:41 -05:00 by GiteaMirror · 7 comments

Originally created by @TissueC on GitHub (Mar 15, 2025).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/11715

Check Existing Issues

  • I have searched the existing issues and discussions.

Problem Description

The WebUI currently supports multi-modal input (e.g., text + image) with models like GPT-4o.
It also supports single-turn image generation with models like dall-e-3.
However, the newest Gemini models (e.g. Gemini 2.0 Flash Experimental) can natively understand and generate images. Would you consider also supporting multi-turn multi-modal chats using Gemini?

Thanks a lot!

ref: https://ai.google.dev/gemini-api/docs/image-generation

Desired Solution you'd like

Support multimodal chats.
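
To illustrate the request, here is a minimal sketch of what a multi-turn, multi-modal chat history could look like when both user and assistant turns may carry images. The `Part` and `Message` types, their field names, and the `image_turns` helper are hypothetical and exist only for this illustration; they are not Open WebUI's or Gemini's actual data model.

```python
from dataclasses import dataclass, field
from typing import List, Literal, Optional


@dataclass
class Part:
    """One piece of a message: either text or an image (hypothetical shape)."""
    kind: Literal["text", "image"]
    text: Optional[str] = None
    image_b64: Optional[str] = None   # base64-encoded image payload
    mime_type: Optional[str] = None   # e.g. "image/png"


@dataclass
class Message:
    """A chat turn; the assistant may return image parts, not only text."""
    role: Literal["user", "assistant"]
    parts: List[Part] = field(default_factory=list)


def image_turns(history: List[Message]) -> List[str]:
    """Return the role of every message that contains at least one image part."""
    return [m.role for m in history if any(p.kind == "image" for p in m.parts)]


# Placeholder payloads; a real chat would carry actual base64 image data.
history = [
    Message("user", [Part("text", text="Draw a red bicycle.")]),
    Message("assistant", [
        Part("text", text="Here it is."),
        Part("image", image_b64="<base64 data>", mime_type="image/png"),
    ]),
    Message("user", [
        Part("text", text="Make it blue and match this style."),
        Part("image", image_b64="<base64 data>", mime_type="image/png"),
    ]),
]

print(image_turns(history))
```

The point of the sketch is that images appear on both sides of the conversation and stay in the history, so later turns can refer back to an image the model generated earlier.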

Alternatives Considered

No response

Additional Context

No response


@tjbck commented on GitHub (Mar 15, 2025):

This is supported.


@TissueC commented on GitHub (Mar 17, 2025):

@tjbck Could you elaborate on how multimodal chats (in both input and output) are supported? I can't find such an implementation. Thank you!


@TissueC commented on GitHub (Mar 25, 2025):

I hope this issue can be re-opened, because the feature is NOT supported.
I am not asking for Gemini support specifically, but for support for models that send images through their native capabilities (e.g. the newest Gemini, Gemini 2.0 Flash Experimental).


@TissueC commented on GitHub (Mar 28, 2025):

With the newest GPT-4o image generation feature now available, this becomes more important.


@tjbck commented on GitHub (Mar 28, 2025):

Gemini implementation is available as community Functions.


@Classic298 commented on GitHub (Mar 28, 2025):

Gemini can also be used through LiteLLM, but does the API also support multimodal output?


@TissueC commented on GitHub (Apr 9, 2025):

> Gemini implementation is available as community Functions.

Thanks a lot! I found it in the community.
https://openwebui.com/f/jscheah/gemini_2_0_flash_native_image_gen


Reference: github-starred/open-webui#54995