[GH-ISSUE #17466] issue: LLMs such as Gemma3 unable to interpret images #56963

Closed
opened 2026-05-05 20:19:33 -05:00 by GiteaMirror · 0 comments

Originally created by @batcheej on GitHub (Sep 15, 2025).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/17466

Check Existing Issues

  • I have searched for any existing and/or related issues.
  • I have searched for any existing and/or related discussions.
  • I am using the latest version of Open WebUI.

Installation Method

Docker

Open WebUI Version

v0.6.28 (latest)

Ollama Version (if applicable)

ollama version is 0.11.4

Operating System

RHEL 8

Browser (if applicable)

No response

Confirmation

  • I have read and followed all instructions in README.md.
  • I am using the latest version of both Open WebUI and Ollama.
  • I have included the browser console logs.
  • I have included the Docker container logs.
  • I have provided every relevant configuration, setting, and environment variable used in my setup.
  • I have clearly listed every relevant configuration, custom setting, environment variable, and command-line option that influences my setup (such as Docker Compose overrides, .env values, browser settings, authentication configurations, etc).
  • I have documented step-by-step reproduction instructions that are precise, sequential, and leave nothing to interpretation. My steps:
  • Start with the initial platform/version/OS and dependencies used,
  • Specify exact install/launch/configure commands,
  • List URLs visited, user input (incl. example values/emails/passwords if needed),
  • Describe all options and toggles enabled or changed,
  • Include any files or environmental changes,
  • Identify the expected and actual result at each stage,
  • Ensure any reasonably skilled user can follow and hit the same issue.

Expected Behavior

Drag and drop an image file into the chat window, then ask an LLM capable of interpreting images (e.g. Gemma3): "Describe this image". Expect a text response describing the image.

Actual Behavior

Drag and drop an image file into the chat window, then ask an LLM capable of interpreting images (e.g. Gemma3): "Describe this image". Receive: (500, 'text input must be of type `str` (single example), `list[str]` (batch or single pretokenized example) or `list[list[str]]` (batch of pretokenized examples).')

I have tested version v0.6.26 and this issue does not exist there, so I believe image files are not being correctly passed to the LLM.
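The error string is the type check of a Hugging Face-style text pipeline, which suggests a structured multimodal content list is being handed to a text-only path and the image never reaches the model. One hedged way to isolate which layer drops the image is to call Ollama's native `/api/chat` endpoint directly, where images travel as a separate base64-encoded `images` array alongside a plain-string `content` (payload shape per Ollama's API documentation; the model name and image bytes below are placeholders):

```shell
# Build the payload Ollama's native chat API expects: plain-text "content"
# plus a separate base64 "images" array. In practice, encode a real file
# with: base64 -w0 photo.png
IMG_B64=$(printf 'png-bytes-here' | base64)
PAYLOAD=$(printf '{"model":"gemma3","stream":false,"messages":[{"role":"user","content":"Describe this image","images":["%s"]}]}' "$IMG_B64")
echo "$PAYLOAD"
# If this direct call succeeds while Open WebUI returns the 500, the bug is
# in how Open WebUI forwards the image (requires a running Ollama, so it is
# left commented out here):
#   curl -s http://localhost:11434/api/chat -d "$PAYLOAD"
```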

Steps to Reproduce

  1. docker pull ghcr.io/open-webui/open-webui:main
  2. Run v0.6.28 (latest) in a Docker container.
  3. Ensure Ollama is installed with an LLM that can understand the content of images.
  4. Perform the steps described under Expected/Actual Behavior above.
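Steps 1–2 above can be sketched as concrete commands (echoed here so the sketch runs without a Docker daemon; drop the quoting to execute for real). The `docker run` flags follow the defaults suggested in the Open WebUI README and may differ per deployment:

```shell
# Pull the image and start the container (README-default port and volume;
# adjust -p/-v for your environment).
PULL_CMD='docker pull ghcr.io/open-webui/open-webui:main'
RUN_CMD='docker run -d -p 3000:8080 -v open-webui:/app/backend/data --name open-webui ghcr.io/open-webui/open-webui:main'
echo "$PULL_CMD"
echo "$RUN_CMD"
```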

Logs & Screenshots

[OWUI-1757957561852.log](https://github.com/user-attachments/files/22348327/OWUI-1757957561852.log)

Additional Information

No response

GiteaMirror added the bug label 2026-05-05 20:19:33 -05:00

Reference: github-starred/open-webui#56963