mirror of
https://github.com/open-webui/open-webui.git
synced 2026-05-07 03:18:23 -05:00
[GH-ISSUE #20099] feat: Need to Bypass Non-Multimodal LLM for ComfyUI Image Generation/Editing in open-webui #19085
Originally created by @freesunshine on GitHub (Dec 22, 2025).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/20099
Problem Description
I am serving the Qwen 235b thinking model with vLLM. It is not a multimodal model, so it cannot process images. To work around this, I configured two ComfyUI instances, one for image generation and one for image editing. The current issues are as follows:
Both image generation and editing are handled by ComfyUI without involving Qwen. However, every time a generation or edit finishes, Qwen outputs a lot of irrelevant text, and I have to stop it manually.
When I select the image generation feature and upload a local image as input for image editing, open-webui reports an error saying that the model is not multimodal, and the process cannot continue. This prevents me from editing locally uploaded images, even though the editing is done by ComfyUI and has nothing to do with Qwen. Is it possible to bypass the LLM for both input and output when the image generation feature is selected?
Is this module community-contributed, in which case I should not open an issue here?
Desired Solution you'd like
Bypass the LLM for both input and output when the image generation feature is selected.
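To make the request concrete, here is a minimal sketch of the routing I have in mind. It is not open-webui's actual code: `ChatRequest`, `ChatResponse`, `handle_request`, and the `llm`, `generate_image`, and `edit_image` callables are all hypothetical names used only for illustration. The idea is that the image feature flag is checked first, so the multimodal check and the LLM call only happen on the normal chat path.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ChatRequest:
    prompt: str
    image_generation: bool = False  # hypothetical flag for the image feature toggle
    images: List[bytes] = field(default_factory=list)  # locally uploaded images


@dataclass
class ChatResponse:
    text: str = ""
    images: List[bytes] = field(default_factory=list)
    llm_called: bool = False


def handle_request(
    req: ChatRequest,
    llm: Callable[[str], str],
    generate_image: Callable[[str], bytes],
    edit_image: Callable[[str, bytes], bytes],
    llm_is_multimodal: bool = False,
) -> ChatResponse:
    """Route a request, skipping the LLM when the image feature is selected."""
    if req.image_generation:
        # Image path: the prompt (and any uploaded image) goes straight to the
        # image backend. The LLM sees neither the input nor the output.
        if req.images:
            result = edit_image(req.prompt, req.images[0])
        else:
            result = generate_image(req.prompt)
        return ChatResponse(images=[result])

    # Normal chat path: only here does the multimodal check apply.
    if req.images and not llm_is_multimodal:
        raise ValueError("the selected model is not multimodal")
    return ChatResponse(text=llm(req.prompt), llm_called=True)
```

With this ordering, uploading a local image while the image feature is selected would go to the editing backend instead of raising the multimodal error, and no LLM text would be produced after the image is returned.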
Alternatives Considered
No response
Additional Context
No response
@owui-terminator[bot] commented on GitHub (Dec 22, 2025):
🔍 Similar Issues Found
I found some existing issues that might be related to this one. Please check if any of these are duplicates or contain helpful solutions:
#14431 feat: Comfy UI Improvements to support Keywords, which opens the door for Audio and Video Generation.
by digitalassassins • May 28, 2025
#18058 issue: handle thinking for Qwen3-VL models
by SlavikCA • Oct 05, 2025 • bug
#16645 issue: Multimodal models cannot recognize larger-sized images
by AXuanCreator • Aug 15, 2025 • bug
#18381 feat: Qwen3-Next reasoning support
by R3tr0ooo • Oct 17, 2025
💡 Tips:
This comment was generated automatically by a bot. Please react with a 👍 if this comment was helpful, or a 👎 if it was not.