feat: Comfy UI Improvements to support Keywords, which opens the door for Audio and Video Generation. #5362

Open
opened 2025-11-11 16:18:46 -06:00 by GiteaMirror · 5 comments
Owner

Originally created by @digitalassassins on GitHub (May 28, 2025).

Check Existing Issues

  • I have searched the existing issues and discussions.

Problem Description

The Comfy UI implementation is very restrictive. It only allows us to choose a default model and size from within the options, and changing either means going into the settings and editing them manually.
This isn't good for the admin, and it leaves a user unable to do anything other than generate a single image using whatever default model was selected in the options at the time of image generation.

Desired Solution you'd like

I have already experimented with the code and written my own implementation that converts keywords in the chat response into workflows. It works very well and unlocks many possibilities.

https://github.com/open-webui/open-webui/discussions/14130

Just as a proof of concept, I used local text files within the Docker container, but it works brilliantly.

What I Did:

In the file `middleware.py`, in the function `chat_image_generation_handler`, I added:

try:
    bracket_start = response.find("{")
    bracket_end = response.rfind("}") + 1

    # rfind returns -1 when "}" is absent, so bracket_end would be 0 here
    if bracket_start == -1 or bracket_end == 0:
        raise Exception("No JSON object found in the response")

    response = response[bracket_start:bracket_end]
    response = json.loads(response)
    prompt = response.get("prompt", [])
    if request.app.state.config.IMAGE_GENERATION_ENGINE == "comfyui" and isinstance(prompt, str):
        prompt = prompt + " [OM:" + user_message + "]"

on line 480

That appends the original user message to the end of the AI-generated prompt, if the AI image prompt generation setting is turned on.
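The round trip of that marker can be sketched in isolation (the function names here are mine for illustration, not from the patch; the extraction regex mirrors the one used later in `get_workflow_based_on_prompt`):

```python
import re

def tag_prompt(ai_prompt: str, user_message: str) -> str:
    # Append the original user message inside an [OM:...] marker.
    return f"{ai_prompt} [OM:{user_message}]"

def untag_prompt(prompt: str):
    # Recover the original message and strip the marker from the prompt.
    match = re.search(r"\[OM:(.*)\]", prompt)
    if match:
        return prompt.replace(match.group(0), "").strip(), match.group(1)
    # No marker: the prompt is the user message itself.
    return prompt, prompt

tagged = tag_prompt("a cinematic photo of a red fox", "photo of a fox")
cleaned, original = untag_prompt(tagged)
```

Keyword matching then runs against `original` while `cleaned` is what reaches the workflow.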

In the file `comfyui.py` I imported the new file `comfyuicat.py`, then added this change to `comfyui.py`:


cat_workflow = get_workflow_based_on_prompt(payload.prompt)
    if cat_workflow:  # a dict when a keyword matched, otherwise None
        workflow = cat_workflow
    else:
        workflow = json.loads(payload.workflow.workflow)

on line 124

This runs the `get_workflow_based_on_prompt()` function; if it catches a keyword, it uses the workflow attached to that keyword. If it doesn't, the function returns None and OpenWebUI proceeds with its default workflow.

Then this is the `comfyuicat.py` file I used:


import json
import random
import re
from os import listdir
from os.path import isfile, join

class ComfyUICategoryWorkflow:
    def __init__(self, name, keywords, prompt_sculpting, prompt_id):
        self.name = name
        self.keywords = keywords
        self.sculpting = prompt_sculpting
        self.pid = prompt_id
        self.workflow = name + ".json"
    
def fetch_category_data():
    # set an empty model dictionary
    categories = []
    
    keyword_files_path = "/app/backend/open_webui/utils/images/keywords/"
    keywordfiles = [f for f in listdir(keyword_files_path) if isfile(join(keyword_files_path, f))]
    
    for kfilename in keywordfiles:
        # first get the name without extension
        cp_filename = kfilename.replace(".txt","")
        with open(keyword_files_path + kfilename) as file_data:
            keyword_data = file_data.read()
            if keyword_data != "" and '|' in keyword_data:
                # now break up the data from the file
                keyword_data = keyword_data.split("|")
                if len(keyword_data) == 3:
                    cp_keywords = keyword_data[0].split(",")
                    cp_sculpting = keyword_data[1]
                    cp_pid = keyword_data[2]
                    categories.append( ComfyUICategoryWorkflow(cp_filename,cp_keywords,cp_sculpting,cp_pid) )
                    
    return categories

def seed_from_prompt(prompt: str):
    
    # see if we have the seed in the prompt
    seed_from_prompt = re.search(r"seed:[0-9]+[ ,]", prompt)
    if seed_from_prompt:
        seed_from_prompt = seed_from_prompt.group(0)
        # we have the seed from the prompt lets remove the seed from the prompt
        prompt = prompt.replace(seed_from_prompt, "")
        seed_from_prompt = seed_from_prompt.replace(",","")
        seed_from_prompt = seed_from_prompt.replace("seed:","")
        seed_from_prompt = seed_from_prompt.strip()
        # convert to integer
        seed_from_prompt = int(seed_from_prompt)
        if seed_from_prompt > 0 and seed_from_prompt < 1125899906842624:
            # number in range update
            seed = int(seed_from_prompt)
        else:
            # number not in range
            seed = random.randint(0, 1125899906842624)
    else:
        seed = random.randint(0, 1125899906842624)
    
    return prompt, seed

def get_workflow_based_on_prompt(prompt):
    
    # lets see if this is ai generated or user submitted
    original_prompt = re.search(r"\[OM:.*\]", prompt)
    if original_prompt:
        # this prompt has been generated by the AI and the original prompt added as additional payload
        original_prompt = original_prompt.group(0)
        # strip the original prompt from the ai generated prompt
        prompt = prompt.replace(original_prompt,"")
        # now strip the container
        original_prompt = original_prompt.replace("[OM:","")
        original_prompt = original_prompt.replace("]","")
        
        print("Original Prompt:" + original_prompt)
    else:
        original_prompt = prompt
    
    print("Workflow Input Prompt:" + prompt)
    
    categories = fetch_category_data()
    workflow_files_path = "/app/backend/open_webui/utils/images/workflows/"
    ## loop through
    for category in categories:
        if re.findall(r"(?=(\b" + '|'.join(category.keywords).strip() + r"\b))", original_prompt.lower()):
            if isfile(workflow_files_path + category.workflow):
                print("Workflow Triggered:" + category.name)
                print("Keywords:" + ",".join(category.keywords))
                with open(workflow_files_path + category.workflow) as file_data:
                    wfdata = json.load(file_data)                
                    # change the seed from the prompt
                    prompt, seed = seed_from_prompt(prompt)
                    for node in wfdata:
                        for nodeinputs in wfdata[node]["inputs"]:
                            if nodeinputs == "seed":
                                wfdata[node]["inputs"][nodeinputs] = seed
                                
                    # update the prompt in the json
                    if str(category.pid) in wfdata:
                        wfdata[str(category.pid)]["inputs"]["text"] = prompt + ' ' + category.sculpting
                        
                    return wfdata
        
    return

This detects whether the [OM:] original-message marker is present. If it is, the prompt was AI-generated with the original message appended, so it matches keywords against the original message, strips the marker, and sends the AI-generated prompt to the workflow. If a standard message was sent (not generated by the AI), it matches against the user message directly.

It matches each word against the keywords, and a match triggers loading of the specific workflow tied to those keywords.
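For reference, `fetch_category_data()` expects each keyword file to be a single line of three pipe-separated fields: comma-separated keywords, sculpting text, and the prompt node id. A minimal parse of one such line (the specific values here are made up for illustration, not taken from the patch):

```python
# Hypothetical contents of e.g. anime.txt: keywords | sculpting text | prompt node id
sample = "anime,manga,waifu|masterpiece, best quality|6"

# Same split logic as fetch_category_data(): three pipe-separated fields
keywords_part, sculpting, prompt_node_id = sample.split("|")
keywords = keywords_part.split(",")
```

The node id is kept as a string because it is later used as a key into the workflow JSON.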

If the user sends `seed:` followed by a number, the regex captures the seed from the message, applies it to the workflow, and then removes it from the prompt, allowing inline seed control from chat.
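The originally posted pattern `seed:[0-9]*[ |,]` has two quirks worth noting: the character class also matches a literal `|`, and `[0-9]*` permits zero digits, which would make the later `int()` call raise on input like `seed: `. A tightened standalone sketch of the same idea:

```python
import re

def extract_seed(prompt: str):
    # Tightened pattern: one or more digits, followed by a space or comma.
    # Note it still requires a trailing delimiter, so "seed:42" at the very
    # end of the prompt will not match.
    m = re.search(r"seed:[0-9]+[ ,]", prompt)
    if not m:
        return prompt, None
    token = m.group(0)
    prompt = prompt.replace(token, "")
    seed = int(token.replace(",", "").replace("seed:", "").strip())
    return prompt, seed

cleaned, seed = extract_seed("seed:1234 a castle at dusk")
```

The caller can then fall back to `random.randint(...)` when `seed` is None, as `seed_from_prompt()` does.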

Alternatives Considered

No response

Additional Context

Why would this be beneficial? It allows you to set workflows based on specific keywords. For example, if the user sends the keyword Anime, this loads a workflow with the aamXLAnimeMix_v10.safetensors model, with the correct image size, sampler, scheduler, and VAE already set up for that model.

For instance, I have it set up so that if a user types as quickly as possible or turbo, it loads a workflow that uses SDXL-Turbo.safetensors at image size 512x512 with the euler sampler and an upscaler, as the images from SDXL-Turbo are too small on their own.

If the user types photography or photo realistic, this auto-loads the workflow with Juggernaut-XL_v9_RunDiffusionPhoto_v2.safetensors, the sampler dpmpp_3m_sde, and the scheduler karras. This model isn't very good with faces, so the workflow includes LoRAs for Better Eyes V3, Better Faces, and Better Skin.

All this is impossible with the way the workflows load currently in OpenWebUI, but with keyword-to-workflow matching, it allows unlimited possibilities.

Also, I've tested adding the keywords audio and video and loading in an audio/video workflow. Comfy UI can generate the audio and video, but OpenWebUI isn't waiting for a video or audio file to be returned, so it fails. It should be possible, though: you could copy the keyword-capture code and add a section to the settings in OpenWebUI to allow audio and video generation via Comfy UI.

I'll upload a zip file of the changes, with an installer for Docker, so you can try it out in a Docker container of v0.6.10.


@digitalassassins commented on GitHub (May 28, 2025):

Here is the code:

[comfy-ui-category.zip](https://github.com/user-attachments/files/20473296/comfy-ui-category.zip)

@digitalassassins commented on GitHub (May 28, 2025):

Then you could have something like:

![Image](https://github.com/user-attachments/assets/1541a4a9-55f1-4c5a-a9b0-e69b82adc865)

@silenceroom commented on GitHub (May 28, 2025):

This looks awesome.


@bmabir17 commented on GitHub (Jun 13, 2025):

I understand that you are using the prompt's keywords to choose which workflow file should be used to generate the image/video.
But each workflow also requires certain inputs and outputs that need to be mapped, and OpenWebUI needs to know those before sending any generation request to ComfyUI. How are you handling that?
In my experience, multiple nodes can have inputs/outputs keys, so how would OpenWebUI figure out which node's inputs value needs to be set and which outputs need to be extracted?


@digitalassassins commented on GitHub (Jun 13, 2025):

Yes, that's the whole point of the post: to demonstrate that Comfy UI can be passed workflows as part of keyword requests.

As shown in the screenshots, you would need buttons added to the UI so OpenWebUI knows which file types to listen for. This isn't implemented, but it was the whole premise of my post: that it could be done with a few changes to the code base.

E.g.

  1. Incorporate my code into the ComfyUI file in Open WebUI to send different workflows based on keywords.

  2. Developers would need to add sections to the settings page for video and audio, so OpenWebUI listens for the correct file format returned from Comfy, and incorporate an HTML5 player for audio and video files in the chat (the bit that is missing).

When retrieving the file from Comfy, Open WebUI bypasses Ollama and posts the response in the window. You can tell this because AI models say "here you go, here is your image" and then describe a completely different image to the one returned from Comfy.

So it would be the same premise. Except embedding HTML5 media instead of just an image.

I haven't implemented audio and video in Open WebUI, just the keyword-to-workflow generator, which makes it super convenient and supercharges the image generation ability in Open WebUI.

I would edit the code and submit it as a pull request, but I'm finishing off an AI-powered plugin for the Calibre eBook software (for personal use, to supercharge my RAG performance) and building plugins for Tdarr, so I'm a little too busy.

I just put the idea out there, in case others wanted to take the code and implement it officially.

If you check the settings page, it already asks for the input node id numbers from your Comfy JSON, so it knows which one is the correct input node. The output from the API is sent automatically as a POST request, so OpenWebUI just downloads the file sent back and then checks the file type. Currently it only wants images, so it discards any video or audio.

But that could be changed.

Reference: github-starred/open-webui#5362