mirror of
https://github.com/open-webui/open-webui.git
synced 2026-05-08 12:58:11 -05:00
[GH-ISSUE #11715] feat: support multi-modal chat (both input and output) like Gemini #54995
Originally created by @TissueC on GitHub (Mar 15, 2025).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/11715
Check Existing Issues
Problem Description
Open WebUI currently supports multi-modal input (e.g., text + image) with models like GPT-4o.
It also supports single-turn image generation with models like dall-e-3.
However, since the newest Gemini models (e.g. Gemini 2.0 Flash Experimental) can natively understand and generate images, would you consider also supporting multi-turn multi-modal chats using Gemini?
Thanks a lot!
ref: https://ai.google.dev/gemini-api/docs/image-generation
Desired Solution you'd like
Support multimodal chats.
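To make the request concrete, here is a minimal sketch of what a multi-turn, multi-modal chat history could look like when both user and assistant turns may carry images. The names `TextPart`, `ImagePart`, `Message`, and `modalities` are hypothetical and exist only for illustration; they are not Open WebUI's or Gemini's actual data model, and the image bytes are placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Set, Union


@dataclass
class TextPart:
    text: str


@dataclass
class ImagePart:
    mime_type: str
    data: bytes  # raw image bytes (placeholders in this sketch)


Part = Union[TextPart, ImagePart]


@dataclass
class Message:
    role: str  # "user" or "assistant"
    parts: List[Part] = field(default_factory=list)


def modalities(message: Message) -> Set[str]:
    """Return the set of modalities ("text", "image") present in one message."""
    return {"image" if isinstance(p, ImagePart) else "text" for p in message.parts}


# Hypothetical exchange: the assistant answers with text AND an image,
# and that image stays in the history so the next turn can refer to it.
history = [
    Message("user", [TextPart("Draw a red bicycle.")]),
    Message("assistant", [TextPart("Here it is."), ImagePart("image/png", b"<png bytes>")]),
    Message("user", [TextPart("Make it blue.")]),
]

for m in history:
    print(m.role, sorted(modalities(m)))
```

The point of the sketch is that the assistant turn is a list of mixed parts rather than a single text string, which is the difference from one-turn image generation.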
Alternatives Considered
No response
Additional Context
No response
@tjbck commented on GitHub (Mar 15, 2025):
Is supported.
@TissueC commented on GitHub (Mar 17, 2025):
@tjbck Could you elaborate a bit on how multimodal chats (in both input and output) are supported? I can't find such an implementation. Thank you!
@TissueC commented on GitHub (Mar 25, 2025):
I hope this issue can be re-opened, because the feature is NOT supported.
I am not asking for Gemini support specifically, but for support for LLMs sending pictures through their native abilities (e.g. the newest Gemini, Gemini 2.0 Flash Experimental).
@TissueC commented on GitHub (Mar 28, 2025):
With the newest GPT-4o now offering an image generation feature, this becomes more important.
@tjbck commented on GitHub (Mar 28, 2025):
A Gemini implementation is available through community Functions.
@Classic298 commented on GitHub (Mar 28, 2025):
Gemini can also be implemented using LiteLLM, but is the multimodal output also supported by the API?
@TissueC commented on GitHub (Apr 9, 2025):
Thanks a lot! I found it in the community.
https://openwebui.com/f/jscheah/gemini_2_0_flash_native_image_gen