[GH-ISSUE #13136] issue: Multimodal API Fails to Recognize Base64-Encoded Image in Open WebUI v0.5.10 #16821

GiteaMirror · 2026-04-19T22:38:47-05:00

Originally created by @ZimaBlueee on GitHub (Apr 22, 2025).
Original GitHub issue: https://github.com/open-webui/open-webui/issues/13136

Check Existing Issues

I have searched the existing issues and discussions.
I am using the latest version of Open WebUI.

Installation Method

Docker

Open WebUI Version

v0.5.10

Ollama Version (if applicable)

No response

Operating System

CentOS 7

Browser (if applicable)

No response

Confirmation

I have read and followed all instructions in README.md.
I am using the latest version of both Open WebUI and Ollama.
I have included the browser console logs.
I have included the Docker container logs.
I have listed steps to reproduce the bug in detail.

Expected Behavior

The API should analyze the provided base64-encoded image and return a description of its content (e.g., objects, text, or scene details) when prompted with "What’s in this image?".

Actual Behavior

The API responds with:

"I'd be happy to describe the image, but since it doesn't contain any text, I can't identify or transcribe oral words..."

This implies the model did not detect the uploaded image in the request, even though the image is included as a valid base64 payload.

Steps to Reproduce

Send a POST request to /api/chat/completions with the following curl command:

curl --location 'https://xxxx/api/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Accept: application/json' \
--header 'Authorization: Bearer sk-xxx' \
--data '{
    "model": "qwen2-vl-72b-32k",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What’s in this image?"},
                {"type": "image", "url": "data:image/jpeg;base64,/9j/4AAQSxxxxxxxxxxxxxx"}
            ]
        }
    ],
    "temperature": 1,
    "stream": false
}'

Observe the response: The model claims no image was uploaded.
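
One possible explanation, offered as an assumption and not a confirmed diagnosis: the OpenAI-style chat completions format represents an image part as `{"type": "image_url", "image_url": {"url": ...}}`, whereas the request above sends `{"type": "image", "url": ...}`. If the endpoint follows the OpenAI convention, a part with an unrecognized type might be dropped before it reaches the model. The sketch below builds the same request body with the OpenAI-style image part so the two shapes can be compared. `build_payload` is a hypothetical helper, and the data URL is a placeholder that must be replaced with the full base64 string.

```python
import json


def build_payload(model: str, prompt: str, data_url: str) -> dict:
    """Build a chat completions body using the OpenAI-style image part.

    Assumption: the server accepts the OpenAI content-part shape
    {"type": "image_url", "image_url": {"url": ...}}.
    """
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    # Differs from the original request, which used
                    # {"type": "image", "url": ...}.
                    {"type": "image_url", "image_url": {"url": data_url}},
                ],
            }
        ],
        "temperature": 1,
        "stream": False,
    }


if __name__ == "__main__":
    payload = build_payload(
        "qwen2-vl-72b-32k",
        "What's in this image?",
        "data:image/jpeg;base64,<full base64 string>",  # placeholder
    )
    print(json.dumps(payload, indent=2))
```

The printed JSON can be passed to the same curl command via `--data`. If the model then describes the image, the content-part shape was the likely culprit.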

Logs & Screenshots

{  
    "id": "chat6089cee6-1f63-11f0-b118-a662d58ecc00",  
    "object": "chat.completion",  
    "created": 1745317215,  
    "model": "qwen2-vl-72b-32k",  
    "choices": [  
        {  
            "index": 0,  
            "message": {  
                "role": "assistant",  
                "content": "I'd be happy to describe the image, but since it doesn't contain any text..."  
            },  
            "finish_reason": "stop"  
        }  
    ]  
}

Additional Information

The base64 string is shortened for readability (full string validated via online decoders).
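
As an alternative to online decoders, the payload can be checked locally. The following is a minimal sketch, assuming the image is sent as a `data:image/jpeg;base64,` URL; `check_jpeg_data_url` is a hypothetical helper name. It decodes the string strictly and confirms that the bytes start with the JPEG start-of-image marker (`FF D8 FF`), which matches the `/9j/` prefix seen in the request.

```python
import base64
import binascii

# JPEG files begin with the SOI marker followed by another marker byte.
JPEG_MAGIC = b"\xff\xd8\xff"
PREFIX = "data:image/jpeg;base64,"


def check_jpeg_data_url(data_url: str) -> bool:
    """Return True if data_url is a well-formed base64 JPEG data URL."""
    if not data_url.startswith(PREFIX):
        return False
    try:
        # validate=True rejects characters outside the base64 alphabet.
        raw = base64.b64decode(data_url[len(PREFIX):], validate=True)
    except (binascii.Error, ValueError):
        return False
    return raw.startswith(JPEG_MAGIC)
```

This only verifies the encoding and the file signature. It does not prove that the whole image is intact or that the server received it.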

Open WebUI produced no server-side error logs for this request (debug mode may need to be enabled, if available, to capture them).

GiteaMirror added the bug label 2026-04-19 22:38:47 -05:00

GiteaMirror closed this issue