mirror of
https://github.com/open-webui/open-webui.git
synced 2026-03-22 06:02:06 -05:00
feat: Comfy UI Improvements to support Keywords, which opens the door for Audio and Video Generation. #5362
Originally created by @digitalassassins on GitHub (May 28, 2025).
Problem Description
The Comfy UI implementation is very restrictive. It only lets the admin choose a default model and size in the options, and changing the model or size means going into the settings and editing them manually.
This is inconvenient for the admin, and it's impossible for a user to do anything other than generate a single image using whatever default model is selected in the options at the time of image generation.
Desired Solution
I already messed about with the code and wrote my own implementation that converts keywords in the chat response into workflows. It works very well and unlocks unlimited possibilities.
https://github.com/open-webui/open-webui/discussions/14130
Just as a proof of concept, I used local text files within the Docker container, but it works brilliantly.
What I Did:
In the file `middleware.py`, in the function `chat_image_generation_handler`, I added a change on line 480. That change appends the original user message to the end of the AI-generated prompt, if the AI image prompt generation setting is turned on.
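The change described above could be sketched roughly like this. The function name, parameter names, and config flag are illustrative, not the actual Open WebUI identifiers, and the exact `[OM:]` marker format is an assumption based on the description later in this issue:

```python
def attach_original_message(generated_prompt: str, user_message: str,
                            prompt_generation_enabled: bool) -> str:
    """Append the original user message behind an [OM:] marker so the
    ComfyUI layer can still keyword-match the user's own words."""
    if not prompt_generation_enabled:
        # Without AI prompt generation the user message is sent as-is,
        # so no marker is needed.
        return generated_prompt
    return f"{generated_prompt} [OM:{user_message}]"
```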
In the file `comfyui.py`, I imported the file `comfyuicat.py`, then added this change to `comfyui.py` on line 124.
This runs the `get_workflow_based_on_prompt()` function. If it catches a keyword, it returns the workflow attached to that keyword and uses it; if it doesn't, it returns `None` and generation proceeds with the default workflow configured in OpenWebUI.
In the `comfyuicat.py` file I used: this detects whether the `[OM:]` original-message marker is present. If so, it matches against the original message, then strips it out and passes the AI-generated message to the workflow. If a standard message is sent (not AI-generated), it matches against the user message directly.
Each word is then matched against keywords that trigger the loading of a specific workflow.
If the user sends `seed:` followed by a number, a regex captures the seed from the message, applies it to the workflow, and then removes it from the prompt, allowing inline seed selection in chat.

Alternatives Considered
No response
Additional Context
Why would this be beneficial? It allows workflows to be selected by specific keywords. For example, if the user sends the keyword `Anime`, this loads a workflow with the `aamXLAnimeMix_v10.safetensors` model, with the correct image size, sampler, scheduler, and VAE already set up for that model.

For instance, I have it set up so that if a user types `as quickly as possible` or `turbo`, it loads a workflow that uses `SDXL-Turbo.safetensors` at image size 512x512 with the `euler` sampler plus an upscaler, as the images from SDXL-Turbo are too small.

If the user types `photography` or `photo realistic`, this auto-loads a workflow with `Juggernaut-XL_v9_RunDiffusionPhoto_v2.safetensors`, the sampler `dpmpp_3m_sde`, and the scheduler `karras`. This model isn't very good with faces, so the workflow has LoRAs for `Better Eyes V3`, `Better Faces`, and `Better Skin`.

All this is impossible with the way workflows currently load in OpenWebUI, but keyword-to-workflow matching allows unlimited possibilities.
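The presets just described could be captured in a simple configuration table. The model filenames, samplers, and sizes below come from the author's examples; the dictionary layout itself is hypothetical:

```python
# Hypothetical keyword -> preset table; values echo the examples above.
WORKFLOW_PRESETS = {
    "anime": {
        "model": "aamXLAnimeMix_v10.safetensors",
        # size, sampler, scheduler, and VAE tuned for this model
    },
    "turbo": {
        "model": "SDXL-Turbo.safetensors",
        "size": "512x512",
        "sampler": "euler",
        "upscale": True,  # Turbo outputs are small, so upscale afterwards
    },
    "photography": {
        "model": "Juggernaut-XL_v9_RunDiffusionPhoto_v2.safetensors",
        "sampler": "dpmpp_3m_sde",
        "scheduler": "karras",
        "loras": ["Better Eyes V3", "Better Faces", "Better Skin"],
    },
}
```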
I've also tested adding the keywords `audio` and `video` and loading an audio/video workflow. This works, and Comfy UI can generate audio and video, but OpenWebUI isn't waiting for a video or audio file to be returned, so it fails. It should be possible, though: you could copy and paste the keyword-capture code and add a section to the OpenWebUI settings to allow audio and video generation via Comfy UI. I'll upload a zip file of the changes, with an installer for Docker, so you can try it out in a Docker container of v0.6.10.
@digitalassassins commented on GitHub (May 28, 2025):
Here is the code:
comfy-ui-category.zip
@digitalassassins commented on GitHub (May 28, 2025):
Then you could have something like:
@silenceroom commented on GitHub (May 28, 2025):
This looks awesome.
@bmabir17 commented on GitHub (Jun 13, 2025):
I understand that you are using the prompt's keywords to choose which workflow file should be used to generate the image/video.
But each workflow also requires certain inputs and outputs that need to be mapped, and OpenWebUI needs to know those before sending any generation request to ComfyUI. How are you handling that?
From my experience, multiple nodes can have `inputs`/`outputs` keys in them. So how would OpenWebUI figure out which are the actual nodes whose `inputs` values need to be set and whose `outputs` need to be extracted?

@digitalassassins commented on GitHub (Jun 13, 2025):
Yes, that's the whole point of the post: to demonstrate that Comfy UI can be passed workflows as part of keyword requests.
As shown in the screenshots, buttons would need to be added to the UI so it knows which file types to listen for. This isn't implemented, but the premise of my post was that it could be, with a few changes in the code base.
E.g.:
- Incorporate my code into the ComfyUI file in Open WebUI to send different workflows based on keywords.
- Add sections to the settings page for video and audio, so OpenWebUI listens for the correct file format returned from Comfy.
- Incorporate an HTML5 player for audio and video files in the chat (the bit that is missing).
When Open WebUI retrieves the file from Comfy, it bypasses Ollama and posts the response in the window. You can tell because AI models say "here you go, here is your image" and then describe a completely different image from the one returned by Comfy.
So it would be the same premise, except embedding HTML5 media instead of just an image.
I haven't implemented audio and video in Open WebUI, just the keyword-to-workflow generator, which makes it super convenient and supercharges the image generation ability in Open WebUI.
I would edit the code and submit it as a pull request, but I'm finishing off an AI-powered plugin for Calibre eBook software (for personal use, to supercharge my RAG performance) and building plugins for Tdarr, so I'm a little too busy.
I just put the idea out there in case others want to take the code and implement it officially.
If you check the settings page, it already asks for the input node id numbers from your Comfy JSON so it knows which one is the correct input node. The output is sent back automatically by the API as a POST request, so OpenWebUI just downloads the returned file and then checks the file type. Currently it only wants images, so it discards any video or audio.
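As an alternative (or fallback) to admin-entered node ids, the prompt node could be located automatically by scanning the workflow. ComfyUI's API-format JSON maps node ids to objects with a `class_type` and an `inputs` dict, and `CLIPTextEncode` is the usual text-prompt node; treating it as such is a heuristic assumption here, not Open WebUI code:

```python
def find_prompt_node_ids(workflow: dict) -> list:
    """Return ids of nodes that look like text-prompt inputs in a
    ComfyUI API-format workflow (class_type CLIPTextEncode with a
    "text" input)."""
    return [
        node_id
        for node_id, node in workflow.items()
        if node.get("class_type") == "CLIPTextEncode"
        and "text" in node.get("inputs", {})
    ]
```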
But that could be changed.
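The loosened file-type check being suggested could be as simple as the sketch below: accept audio and video MIME types alongside images instead of discarding them. The accepted-prefix approach is an assumption, not the current Open WebUI implementation:

```python
# Image-only today; audio and video added per the suggestion above.
ACCEPTED_PREFIXES = ("image/", "audio/", "video/")

def is_supported_media(content_type: str) -> bool:
    """Accept images (current behaviour) plus audio and video files
    returned from ComfyUI, based on the Content-Type prefix."""
    return content_type.startswith(ACCEPTED_PREFIXES)
```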