issue: Multimodal models cannot recognize larger-sized images #6107

Closed
opened 2025-11-11 16:45:05 -06:00 by GiteaMirror · 1 comment

Originally created by @AXuanCreator on GitHub (Aug 15, 2025).

Check Existing Issues

  • I have searched the existing issues and discussions.
  • I am using the latest version of Open WebUI.

Installation Method

Docker

Open WebUI Version

v0.6.22

Ollama Version (if applicable)

No response

Operating System

Windows 11

Browser (if applicable)

No response

Confirmation

  • I have read and followed all instructions in README.md.
  • I am using the latest version of both Open WebUI and Ollama.
  • I have included the browser console logs.
  • I have included the Docker container logs.
  • I have provided every relevant configuration, setting, and environment variable used in my setup.
  • I have clearly listed every relevant configuration, custom setting, environment variable, and command-line option that influences my setup (such as Docker Compose overrides, .env values, browser settings, authentication configurations, etc).
  • I have documented step-by-step reproduction instructions that are precise, sequential, and leave nothing to interpretation. My steps:
      • Start with the initial platform/version/OS and dependencies used,
      • Specify exact install/launch/configure commands,
      • List URLs visited, user input (incl. example values/emails/passwords if needed),
      • Describe all options and toggles enabled or changed,
      • Include any files or environmental changes,
      • Identify the expected and actual result at each stage,
      • Ensure any reasonably skilled user can follow and hit the same issue.

Expected Behavior

In conversations with multimodal models, large images should be recognized.

Actual Behavior

In practice, the model returns no output for large images, while smaller images are recognized normally.
Enabling image compression in the settings does not fix the problem.
The same images are recognized normally by the models' official online inference services.
Tested on:

  • Gemini 2.0 Flash
  • Gemini 2.5 Flash-Lite
  • Doubao Seed 1.6

Image size: 2560 × 1440
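The report does not say how the image compression setting resizes images or what request limits the providers enforce. As a rough aid for checking whether a downscaled image is plausibly small enough, the sketch below computes aspect-preserving target dimensions and the size overhead of embedding an image as base64, assuming the image is sent inline as a base64 data URL. The function names `fit_within` and `base64_length`, the 1280-pixel limit, and the 3 MB file size are hypothetical illustrations, not Open WebUI's implementation or any provider's documented limit.

```python
import math


def fit_within(width: int, height: int, max_width: int, max_height: int) -> tuple[int, int]:
    """Scale (width, height) down to fit inside max_width x max_height,
    preserving the aspect ratio. Images that already fit are not upscaled."""
    scale = min(max_width / width, max_height / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))


def base64_length(n_bytes: int) -> int:
    """Number of characters in the padded base64 encoding of n_bytes of data."""
    return 4 * math.ceil(n_bytes / 3)


if __name__ == "__main__":
    # The reported image is 2560 x 1440; 1280 x 1280 is a hypothetical limit.
    print(fit_within(2560, 1440, 1280, 1280))  # (1280, 720)
    # A hypothetical 3 MB encoded image grows by about a third as base64.
    print(base64_length(3_000_000))  # 4000000
```

Because base64 expands data by roughly 4/3, an image that is near a provider's payload limit as a file can exceed it once embedded inline.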

Steps to Reproduce

  • Copy the image into the chat dialog.
  • Enter "summarize the image content" and press Enter.

Logs & Screenshots

[Screenshot (2131 × 1558)]

Original image:

[Image (2560 × 1440)]

Additional Information

No response

GiteaMirror added the bug label 2025-11-11 16:45:05 -06:00

@tjbck commented on GitHub (Aug 16, 2025):

Model inference issue.

Reference: github-starred/open-webui#6107