Mirror of https://github.com/open-webui/open-webui.git, synced 2026-03-22 14:13:08 -05:00
Feature request: Selector for low, high, or auto fidelity image understanding in vision models #2040
Originally created by @Jonseed on GitHub (Sep 9, 2024).
Is your feature request related to a problem? Please describe.
Currently you can only enable or disable vision capability for a model. But models like OpenAI's vision models can accept several different options for the fidelity of the image processed by the API: `low`, `high`, or `auto`. If none is specified, it defaults to `auto`, which looks at the input image size and decides whether to use the `low` or `high` setting. `auto` is likely the current default in Open WebUI, either explicitly coded or implicitly omitted, but I was not able to find the code in the repo to verify.

Some users might prefer the `low` setting no matter how large the input image is, in order to save tokens on the API, as this mode currently has a fixed cost of 85 tokens. The `high` setting costs 85 tokens plus 170 tokens per 512x512 tile of the image, so even one tile adds another 170 tokens and triples the cost (255 tokens, although if `auto` works as specified, a 512x512 image should be processed in `low` mode). More likely, four tiles are needed for most images larger than 512x512, which adds 680 tokens and is 9 times more expensive than `low` mode (765 tokens), and a non-square image might need six tiles, for a total of 1,105 tokens for one image. For many use cases where high-fidelity image understanding isn't needed, `low` mode is likely sufficient and can save users a lot on API costs for vision tasks.

Describe the solution you'd like
Provide a selector in the Model Builder config screen, when enabling "Vision", for `low`, `high`, or `auto` resolution, defaulting to `auto`. This is controlled by adding a `detail` parameter to the API call. Another option would be to let the user specify directly in the UI, when they add an image to a message, whether it should be processed in `low`, `high`, or `auto` (defaulting to `auto`, or to whatever is set in the model config), although this might add unnecessary clutter to the UI.

Describe alternatives you've considered
Another option is for the user to resize the image before adding it to a message. Assuming `auto` works as it should, a 512x512 image should be processed in `low` mode. But this adds an inconvenient extra resizing step for every image the user wants to use for vision tasks.

Additional context

I'm thinking the option could be added here in the Model Builder, perhaps in a `Fidelity` dropdown menu selector next to `Vision`.
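For reference, a minimal sketch of how the `detail` parameter attaches per image in an OpenAI-style Chat Completions message payload. The helper function name and the image URL are placeholders, not anything currently in Open WebUI:

```python
# Build a Chat Completions user message that requests a specific image
# fidelity by setting "detail" on the image_url content part.
# "low" has a fixed cost of 85 tokens regardless of image size;
# "high" adds 170 tokens per 512x512 tile on top of the 85-token base.

def build_vision_message(text: str, image_url: str, detail: str = "auto") -> dict:
    """Return a user message with an image at the requested fidelity.

    detail must be one of "low", "high", or "auto" (the API default).
    """
    if detail not in ("low", "high", "auto"):
        raise ValueError(f"invalid detail: {detail!r}")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {
                "type": "image_url",
                "image_url": {"url": image_url, "detail": detail},
            },
        ],
    }

msg = build_vision_message(
    "What is in this image?",
    "https://example.com/photo.png",  # placeholder URL
    detail="low",  # fixed 85-token cost, regardless of image size
)
print(msg["content"][1]["image_url"]["detail"])  # → low
```

A `Fidelity` dropdown in the Model Builder could simply feed its selected value into this `detail` field when the request is assembled, with `auto` preserving today's behavior.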