[GH-ISSUE #17466] issue: LLMs such as Gemma3 unable to interpret images #56963

Closed
opened 2026-05-05 20:19:33 -05:00 by GiteaMirror · 0 comments

Originally created by @batcheej on GitHub (Sep 15, 2025).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/17466

Check Existing Issues

  • I have searched for any existing and/or related issues.
  • I have searched for any existing and/or related discussions.
  • I am using the latest version of Open WebUI.

Installation Method

Docker

Open WebUI Version

v0.6.28 (latest)

Ollama Version (if applicable)

ollama version is 0.11.4

Operating System

RHEL 8

Browser (if applicable)

No response

Confirmation

  • I have read and followed all instructions in README.md.
  • I am using the latest version of both Open WebUI and Ollama.
  • I have included the browser console logs.
  • I have included the Docker container logs.
  • I have provided every relevant configuration, setting, and environment variable used in my setup.
  • I have clearly listed every relevant configuration, custom setting, environment variable, and command-line option that influences my setup (such as Docker Compose overrides, .env values, browser settings, authentication configurations, etc).
  • I have documented step-by-step reproduction instructions that are precise, sequential, and leave nothing to interpretation. My steps:
  • Start with the initial platform/version/OS and dependencies used,
  • Specify exact install/launch/configure commands,
  • List URLs visited, user input (incl. example values/emails/passwords if needed),
  • Describe all options and toggles enabled or changed,
  • Include any files or environmental changes,
  • Identify the expected and actual result at each stage,
  • Ensure any reasonably skilled user can follow and hit the same issue.

Expected Behavior

Drag and drop an image file into the chat window, then ask an LLM capable of interpreting images (e.g. Gemma3): "Describe this image". Expect a text response describing the image.

Actual Behavior

Drag and drop an image file into the chat window, then ask an LLM capable of interpreting images (e.g. Gemma3): "Describe this image". Receive: (500, 'text input must be of type `str` (single example), `list[str]` (batch or single pretokenized example) or `list[list[str]]` (batch of pretokenized examples).')

I have tested version v0.6.26 and this issue does not exist there, so I believe image files are not being correctly passed to the LLM.
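The error string is the type check of a Hugging Face-style text pipeline, which suggests a structured multimodal content list is being handed to a text-only path and the image never reaches the model. One hedged way to isolate which layer drops the image is to call Ollama's native `/api/chat` endpoint directly, where images travel as a separate base64-encoded `images` array alongside a plain-string `content` (payload shape per Ollama's API documentation; the model name and image bytes below are placeholders):

```shell
# Build the payload Ollama's native chat API expects: plain-text "content"
# plus a separate base64 "images" array. In practice, encode a real file
# with: base64 -w0 photo.png
IMG_B64=$(printf 'png-bytes-here' | base64)
PAYLOAD=$(printf '{"model":"gemma3","stream":false,"messages":[{"role":"user","content":"Describe this image","images":["%s"]}]}' "$IMG_B64")
echo "$PAYLOAD"
# If this direct call succeeds while Open WebUI returns the 500, the bug is
# in how Open WebUI forwards the image (requires a running Ollama, so it is
# left commented out here):
#   curl -s http://localhost:11434/api/chat -d "$PAYLOAD"
```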

Steps to Reproduce

  1. docker pull ghcr.io/open-webui/open-webui:main
  2. Run v0.6.28 (latest) in a Docker container.
  3. Ensure Ollama is installed with an LLM that can understand the content of images.
  4. Perform the steps described under Expected/Actual Behavior above.
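Steps 1–2 above can be sketched as concrete commands (echoed here so the sketch runs without a Docker daemon; drop the quoting to execute for real). The `docker run` flags follow the defaults suggested in the Open WebUI README and may differ per deployment:

```shell
# Pull the image and start the container (README-default port and volume;
# adjust -p/-v for your environment).
PULL_CMD='docker pull ghcr.io/open-webui/open-webui:main'
RUN_CMD='docker run -d -p 3000:8080 -v open-webui:/app/backend/data --name open-webui ghcr.io/open-webui/open-webui:main'
echo "$PULL_CMD"
echo "$RUN_CMD"
```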

Logs & Screenshots

[OWUI-1757957561852.log](https://github.com/user-attachments/files/22348327/OWUI-1757957561852.log)

Additional Information

No response

GiteaMirror added the bug label 2026-05-05 20:19:33 -05:00

Reference: github-starred/open-webui#56963