[GH-ISSUE #8083] Curious about image generation and underlying support #53658

Closed
opened 2026-05-05 15:05:45 -05:00 by GiteaMirror · 0 comments
Owner

Originally created by @jtslear on GitHub (Dec 26, 2024).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/8083

Feature Request

Is your feature request related to a problem? Please describe.
No. Just a curiosity, really. I'd like to know what the future path for image generation in Open WebUI is. I find the current configuration method a bit limiting, but I see a few problems we'd want to solve before committing to a path that leaves most users happy with what they can do in Open WebUI.

Describe the solution you'd like
Open WebUI allows an admin to configure whatever supported image generator they prefer, and then, via Pipelines, Functions, or Tools, extends any of those implementations so the user can accomplish what they're after, such as iterating on a target image or expanding their generated content further.
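For context, here is a minimal sketch of what such an extension point might look like as an Open WebUI Pipeline. The `pipe()` shape follows the open-webui/pipelines examples; the class name, keyword check, and `generate_image()` backend call are hypothetical stand-ins, not a real implementation.

```python
# Hypothetical sketch: an Open WebUI Pipeline that intercepts chat
# messages and forwards image requests to a configurable backend.
# The class/pipe() shape follows the open-webui/pipelines examples;
# generate_image() is a stand-in for a real backend client.

class ComfyUIRouterPipeline:
    def __init__(self):
        self.name = "ComfyUI Router (sketch)"
        # Admin-configurable backend endpoint (assumption).
        self.backend_url = "http://localhost:8188"

    def generate_image(self, prompt: str) -> str:
        # Stand-in: a real implementation would call the configured
        # image generator and return an image URL or markdown embed.
        return f"[image generated for: {prompt}]"

    def pipe(self, user_message: str, model_id: str,
             messages: list, body: dict) -> str:
        # Route anything that looks like an image request to the
        # image backend; pass everything else through untouched.
        if user_message.lower().startswith(("draw", "generate an image")):
            return self.generate_image(user_message)
        return user_message
```

The point of the sketch is only that the routing decision lives in one admin-owned place, rather than in the user's prompt.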

Describe alternatives you've considered
No Open WebUI.

Currently the support for image generation is very limited. There are quite a few tools, currently supported to _some_ extent:

* [AUTOMATIC1111](https://github.com/AUTOMATIC1111)
* [ComfyUI](https://github.com/comfyanonymous/ComfyUI)
* OpenAI

We've got an open issue to add another tool, [Fooocus](https://github.com/open-webui/open-webui/issues/2648).

I'd argue there should be an issue to add MidJourney into the mix as well!

But the issue here is not generating images (we already know that works); it's the capability to expose the customized capabilities within each of these tools. Take MidJourney as an example: one could ask for a set of images, choose a few, and iterate until reaching the final desired state. The problem I see with Open WebUI is that there is no clear way to implement something similar. Instead, we generate one image, tweak our prompt, maybe toy with the settings the tool provides, and try again.

But we don't know all the combinations of capabilities a given image generator offers, nor do we know what the user is going to ask next. And since each tool is going to have its own API and its own capabilities, I think the current setup is going to stymie the future of Open WebUI and image generation. Professionals will likely stick with their tool of choice directly, for very good reason. But if you're that in-between person like me, who maybe repeats a lot of tasks, I'd argue Open WebUI can provide such a middle ground.

If I wanted what I've seen from MidJourney, I'd likely leverage ComfyUI, using its many workflows and customizing them to my needs to achieve what I'm after. MidJourney has made this easy, starting off in Discord and later rolling out their own WebUI with the same, potentially more, features. How can we do the same with Open WebUI? Currently, everything I'm about to describe below would likely be handled _only_ inside of ComfyUI, using one large workflow or several.

Would Pipelines be the answer to this? Let's continue with ComfyUI as my target and MidJourney as the case study.

  1. Ask Open WebUI for an image: this triggers `workflow0`, a custom workflow developed by the end user. This is the only support I see possible today.
  2. Ask Open WebUI to tweak the image from the step above: this is not possible; there's no such thing as an image input that triggers `workflow1` in ComfyUI. Instead, the user must take the result of step 1, load it into ComfyUI using `workflow1`, and continue from there.
  3. Ask Open WebUI to generate an image and later convert it into a video: again, not possible, as we can't ask for different workflows. In this case it'd be something like image + text outputting a video, but we'd also need to tell ComfyUI about `workflow3`.
  4. Ask Open WebUI to generate a 3D model and later ask for minor tweaks or export to specific file types: certainly not possible today, due to the limitation described in point 3.
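The multi-workflow limitation in the steps above could, in principle, be worked around by a Pipeline that keeps several workflow JSON files on disk and submits the chosen one to ComfyUI's `/prompt` endpoint (which accepts `{"prompt": <workflow graph>}` in API format). A rough sketch; the file names and the node id `"6"` for the positive prompt are assumptions that depend on how the workflow was exported:

```python
import json
import urllib.request

def load_workflow(name: str, prompt_text: str) -> dict:
    """Load an API-format workflow JSON (e.g. workflow0.json) and
    patch its positive-prompt text. Node id "6" is an assumption
    that depends on the exported workflow graph."""
    with open(f"{name}.json") as f:
        workflow = json.load(f)
    workflow["6"]["inputs"]["text"] = prompt_text
    return workflow

def build_payload(workflow: dict) -> bytes:
    # ComfyUI's /prompt endpoint expects {"prompt": <workflow graph>}.
    return json.dumps({"prompt": workflow}).encode()

def submit(workflow: dict, host: str = "http://127.0.0.1:8188") -> dict:
    # Queue the workflow; the response includes a prompt_id that can
    # be polled (e.g. via /history) for the generated output.
    req = urllib.request.Request(
        f"{host}/prompt",
        data=build_payload(workflow),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Nothing here is MidJourney-grade iteration, but it shows that "ask for a different workflow" is mostly a routing problem on the Open WebUI side, not a ComfyUI limitation.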

Could a Pipeline/Function provide a means for the user to respond in a way that manipulates the image somehow, potentially going further and asking for a target workflow?
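One way a Function could approach this is plain intent routing: inspect the user's reply and, when the previous turn produced an image, pick a different workflow and feed that image back in. A toy sketch; the keywords and workflow names are invented for illustration and match the numbered steps above only loosely:

```python
from typing import Optional

def choose_workflow(user_message: str, last_image: Optional[str]) -> dict:
    """Map a chat turn to a (hypothetical) ComfyUI workflow name.
    Keywords and workflow names are invented for illustration."""
    msg = user_message.lower()
    if last_image and ("tweak" in msg or "change" in msg):
        # Image-to-image: feed the previous result back in.
        return {"workflow": "workflow1", "image_input": last_image}
    if last_image and "video" in msg:
        # Image + text to video.
        return {"workflow": "workflow3", "image_input": last_image}
    # Default text-to-image path.
    return {"workflow": "workflow0", "image_input": None}
```

Keyword matching is obviously brittle; a real implementation might let the LLM itself classify the intent, but the shape of the decision is the same.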

The downside I see with all of this is that we'd now be assuming users are keen on two tools: setting up Open WebUI, plus diving deep into the internals of their image generation tool as well. Hence this issue. Where does Open WebUI want to stop versus expand? And if that expansion is interesting, how could we go about it?
