[GH-ISSUE #13717] Add image input (vision) support to OpenAI Responses API in Ollama #8993

Closed
opened 2026-04-12 21:49:16 -05:00 by GiteaMirror · 2 comments
Owner

Originally created by @MeetSolanki530 on GitHub (Jan 14, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/13717

Ollama has started supporting the OpenAI Responses API, which is great. However, image input (vision) support is currently missing.

The OpenAI Responses API allows sending multimodal inputs, including images (for example base64 images or image URLs), but Ollama only handles text inputs at the moment. Because of this, models that support vision cannot be used properly through Ollama.

What’s missing

  • Ability to pass image inputs in the input field (base64 or image URL)
  • Proper handling of input_image content type alongside text
  • Vision-capable models should receive and process images the same way as the OpenAI Responses API
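For reference, a small sketch of the payload shape the missing support would need to accept. Field names follow OpenAI's documented Responses API content parts (`input_text` for text, `input_image` carrying the image in `image_url` as either an https URL or a base64 data URI); the helper name is illustrative, not part of any SDK:

```python
def build_vision_input(prompt: str, image_data_uri: str) -> list:
    """Build a Responses-API-style `input` list with text plus one image.

    `image_data_uri` may be an https URL or a "data:image/...;base64,..." URI,
    per OpenAI's `input_image` content part.
    """
    return [
        {
            "role": "user",
            "content": [
                {"type": "input_text", "text": prompt},
                {"type": "input_image", "image_url": image_data_uri},
            ],
        }
    ]
```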

Why this matters

  • Many workflows rely on image understanding (OCR, screenshots, diagrams, photos)
  • Developers want a drop-in compatible replacement for OpenAI’s Responses API
  • This blocks multimodal use cases even when the underlying model supports vision

I’ve attached a screenshot showing the current behavior and where image input is ignored.

[Screenshot: request with image input being ignored]

Adding image input support would make Ollama’s Responses API implementation much more complete and production-ready.

Thanks for the great work on Ollama.


Reproduction (Python)

from openai import OpenAI
import base64

client = OpenAI(
    base_url="http://localhost:11434/v1/",
    api_key="ollama",  # required by the SDK, but not validated by Ollama
)

def encode_image(path):
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

image_base64 = encode_image("Untitled.png")  # replace with any PNG file

responses_result = client.responses.create(
    model="qwen3-vl:2b",
    input=[
        {
            "role": "user",
            "content": [
                {
                    "type": "input_text",
                    "text": "What is in this image?"
                },
                {
                    "type": "input_image",
                    "image_base64": image_base64
                }
            ]
        }
    ]
)

print(responses_result.output_text)

Actual response

I cannot directly view or analyze images. However, if you can describe the content
of the image in detail, I will try to help you answer questions.

Expected behavior

The model should receive and process the image input and return a description or analysis of the image, consistent with OpenAI’s Responses API behavior for vision-capable models.

GiteaMirror added the feature request label 2026-04-12 21:49:16 -05:00

@rick-github commented on GitHub (Jan 14, 2026):

$ diff -u 13717.py.orig 13717.py
--- 13717.py.orig	2026-01-14 18:58:35.570124268 +0100
+++ 13717.py	2026-01-14 19:00:42.386173512 +0100
@@ -24,7 +24,7 @@
                 },
                 {
                     "type": "input_image",
-                    "image_base64": image_base64
+                    "image_url": f"data:image/png;base64,{image_base64}"
                 }
             ]
         }
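The diff above boils down to one change: wrap the base64 string in a data URI and send it under `image_url` (the key the Responses API actually defines) rather than `image_base64`. Packaged as a helper for clarity (the function name is illustrative):

```python
def as_image_part(image_base64: str, mime: str = "image/png") -> dict:
    """Build an `input_image` content part from raw base64 image data."""
    return {
        "type": "input_image",
        "image_url": f"data:{mime};base64,{image_base64}",
    }
```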

@MeetSolanki530 commented on GitHub (Jan 15, 2026):

Yes, it's resolved now. Thank you!

Reference: github-starred/ollama#8993