Mirror of https://github.com/open-webui/open-webui.git, synced 2026-05-07 11:28:35 -05:00
[GH-ISSUE #8083] Curious about image generation and underlying support #53658
Originally created by @jtslear on GitHub (Dec 26, 2024).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/8083
Feature Request
Is your feature request related to a problem? Please describe.
No. Just a curiosity, really. I'd like to know what the future path to image generation is for Open WebUI. I find the current method of configuration a tad limiting, but I see a few problems we'd want to solve before committing to a path that leaves most users happy with whatever they end up using Open WebUI for.
Describe the solution you'd like
Open WebUI lets an admin configure whichever supported image generator they prefer. Pipelines, Functions, or Tools then extend that implementation to achieve what the user is attempting to accomplish, such as iterating on a target image or expanding their generated content further.
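To make the proposed split concrete, here is a minimal Python sketch, assuming a design in which the admin picks one backend by name and each extension only uses features that backend declares. `ImageBackend`, `FakeComfyBackend`, `BACKENDS`, `run_extension`, and the capability strings are all hypothetical names for illustration; this is not Open WebUI's actual Pipelines, Functions, or Tools API, and the fake backend returns placeholder strings instead of images.

```python
from typing import Protocol


class ImageBackend(Protocol):
    """Hypothetical interface an admin-selected image generator would implement."""

    name: str

    def capabilities(self) -> set[str]:
        """Return the optional features this backend supports."""
        ...

    def generate(self, prompt: str, count: int = 1) -> list[str]:
        """Generate `count` images and return identifiers for them."""
        ...


class FakeComfyBackend:
    """Stand-in backend; a real one would call the generator's own API."""

    name = "comfyui"

    def capabilities(self) -> set[str]:
        return {"batch", "variations"}

    def generate(self, prompt: str, count: int = 1) -> list[str]:
        # Placeholder identifiers instead of real images.
        return [f"{self.name}:{prompt}:{i}" for i in range(count)]


# Hypothetical admin-level registry: which backends are configured.
BACKENDS: dict[str, ImageBackend] = {"comfyui": FakeComfyBackend()}


def run_extension(backend_name: str, prompt: str, required: set[str], count: int) -> list[str]:
    """A hypothetical extension that refuses to run if the backend lacks a feature."""
    backend = BACKENDS[backend_name]
    missing = required - backend.capabilities()
    if missing:
        raise ValueError(f"{backend.name} does not support {sorted(missing)}")
    return backend.generate(prompt, count)
```

For example, `run_extension("comfyui", "a red fox", {"batch"}, 4)` returns four placeholder identifiers, while asking for `{"inpainting"}` raises `ValueError`. Declaring capabilities per backend is one way to let extensions adapt to each tool's own API without Open WebUI hard-coding every combination.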
Describe alternatives you've considered
Not using Open WebUI at all.
Currently, support for image generation is very limited. There are quite a few tools currently supported to some extent:
We've got open issues to add another tool, Fooocus.
I'd argue there should be an issue to add MidJourney into the mix as well!
But the issue here is not the generation of images (we already know this works); it is the ability to expose each tool's own customized capabilities. Think about MidJourney as an example. One can ask for a set of images, choose a few, and iterate until reaching the final desired result. The problem I see with Open WebUI is that there is no clear way to implement something similar. Instead, we generate one image, tweak our prompt, maybe toy with the settings the tool exposes, and try again. But we don't know all the possible combinations of capabilities a given image generator offers, nor do we know what the user is going to ask next. And since each tool has its own API and its own capabilities, I think the current setup is going to stymie the future of Open WebUI and image generation. Professionals will likely stick to their tool of choice directly, for very good reason. But if you are an in-between person like me, who maybe repeats a lot of tasks, I'd argue that Open WebUI can provide that middle ground.
If I wanted what I have seen from MidJourney, I'd likely use ComfyUI, leveraging its many workflows and customizing them to my needs to achieve what I'm after. MidJourney has made this easy, starting off in Discord and later rolling out its own web UI with the same, and potentially more, features. How can we do the same with Open WebUI? Currently, everything I describe below would likely be handled only inside ComfyUI, using one large workflow or several.
Would Pipelines be the answer to this? Let's continue to use ComfyUI as my target and use MidJourney as the case study.
workflow0 - a custom workflow developed by the end user. This is currently the only support I see as possible today.
… workflow1 in ComfyUI. Instead, the user must take the result of step 1, load it into ComfyUI using workflow1, and continue from there.
… workflow3.
Could a Pipeline/Function provide a means for the user to respond in a way that manipulates the image, potentially going further and asking for a target workflow?
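One way to explore that question is the following minimal Python sketch, in which the user's chat reply picks a follow-up workflow and one of the previous outputs becomes its input, loosely modeled on MidJourney's pick-and-iterate loop. Everything here is a hypothetical illustration: the reply commands ("vary N", "upscale N"), the workflow names `variation_workflow` and `upscale_workflow`, and `WorkflowSession` are invented, the issue does not say what workflow1 or workflow3 actually do, and `run_workflow` is a stub rather than a real ComfyUI call.

```python
import re
from dataclasses import dataclass, field

# Hypothetical mapping from a reply command to a follow-up workflow name.
FOLLOW_UPS = {"vary": "variation_workflow", "upscale": "upscale_workflow"}

# Matches replies such as "vary 2" or "upscale 1" (1-based image number).
REPLY = re.compile(r"^\s*(vary|upscale)\s+(\d+)\s*$", re.IGNORECASE)


def run_workflow(workflow: str, source: str, count: int) -> list[str]:
    """Stub: a real version would submit the workflow to the image generator."""
    return [f"{workflow}({source})#{i}" for i in range(count)]


@dataclass
class WorkflowSession:
    """Keeps the latest outputs so the next reply can build on one of them."""

    outputs: list[str] = field(default_factory=list)

    def start(self, prompt: str, count: int = 4) -> list[str]:
        # Initial batch from the user's own workflow (workflow0 in the issue).
        self.outputs = run_workflow("workflow0", prompt, count)
        return self.outputs

    def reply(self, message: str) -> list[str]:
        match = REPLY.match(message)
        if match is None:
            raise ValueError(f"unrecognized reply: {message!r}")
        command, number = match.group(1).lower(), int(match.group(2))
        if not 1 <= number <= len(self.outputs):
            raise IndexError(f"no image number {number}")
        chosen = self.outputs[number - 1]
        # Variations fan out into a new batch; an upscale yields one image.
        count = 4 if command == "vary" else 1
        self.outputs = run_workflow(FOLLOW_UPS[command], chosen, count)
        return self.outputs
```

For example, after `session.start("a lighthouse at dusk")` produces four candidates, `session.reply("upscale 2")` returns `["upscale_workflow(workflow0(a lighthouse at dusk)#1)#0"]`. Keeping the chosen output in session state is what removes the manual step of carrying a result from one ComfyUI workflow into the next.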
The downside I see with all of this is that we are now assuming users are comfortable with two tools: setting up Open WebUI, plus diving deep into the internals of their image generation tool. Hence this issue. Where does Open WebUI want to stop, versus provide expansion? And if that expansion is interesting, how could we go about it?