[GH-ISSUE #3690] Support vision models (image input) in OpenAI API chat completions #28032

Closed
opened 2026-04-22 05:45:40 -05:00 by GiteaMirror · 2 comments
Owner

Originally created by @slyt on GitHub (Apr 17, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/3690

What are you trying to do?

I would like to use Ollama's implementation of the OpenAI chat completions API with the OpenAI Python client (https://github.com/openai/openai-python) to ask questions about images (e.g. with the llava multimodal model, https://ollama.com/library/llava).

How should we solve this?

The official OpenAI chat completions endpoint (/v1/chat/completions) supports sending images alongside the prompt using image_url content parts (https://platform.openai.com/docs/guides/vision).

If implemented in Ollama, this curl command should work to send images with prompts:

curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llava",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "What'\''s in this image?"
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
            }
          }
        ]
      }
    ],
    "max_tokens": 300
  }'

Currently, the Ollama server responds with the following error, thrown in openai/openai.go (https://github.com/ollama/ollama/blob/9df6c85c3a51ce00d6a65be9dd8a06af07b24af5/openai/openai.go#L92):

{"error":{"message":"json: cannot unmarshal array into Go struct field Message.messages.content of type string","type":"invalid_request_error","param":null,"code":null}}
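For reference, the array-style request body that the curl command sends can be assembled with nothing but the Python standard library. This is only an illustration of the payload shape; the build_vision_message helper is a name invented here, not part of any Ollama or OpenAI SDK.

```python
import json

# Illustrative helper (not part of any SDK): build the OpenAI-style
# multi-part "content" array that the curl example above sends.
def build_vision_message(text: str, image_url: str) -> dict:
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }

body = {
    "model": "llava",
    "messages": [build_vision_message(
        "What's in this image?",
        "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
    )],
    "max_tokens": 300,
}
print(json.dumps(body, indent=2))
```

It is this array-valued "content" field (rather than a plain string) that the unmarshal step in openai.go currently rejects.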

What is the impact of not solving this?

Implementing this would make vision tools built on the OpenAI API compatible with Ollama.

A workaround is to use the Ollama Python client (https://github.com/ollama/ollama-python) to send images, or to use the /api/generate endpoint as outlined on the ollama llava model page (https://ollama.com/library/llava).

Example using Ollama Python client:

import ollama
import httpx

# Fetch the image bytes over HTTP.
image_response = httpx.get('https://cdn.arstechnica.net/wp-content/uploads/2022/01/GettyImages-90790890-800x533.jpg')
image_response.raise_for_status()
image = image_response.content

# Pass the raw bytes to the native Ollama API; keep_alive=0 unloads the model afterwards.
response = ollama.generate(model='llava', prompt='Describe this photo in detail.', images=[image], keep_alive=0)
print(response["response"])

Anything else?

No response


@unmotivatedgene commented on GitHub (May 7, 2024):

I too would appreciate the implementation of compatibility with the OpenAI api when it comes to vision. Personally I want it for moondream. https://github.com/ollama/ollama/blob/main/docs/openai.md#v1chatcompletions

The OpenAI API lets me keep things cross-platform easily: OpenAI, Groq, LM Studio, Ollama.


@vanpelt commented on GitHub (Jun 10, 2024):

I currently get TypeError: Object of type bytes is not JSON serializable when attempting to send a data URI in the request. Even if the feature only supported data URIs, without fetching arbitrary URLs, it would be a big improvement.
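That serialization error comes from putting raw bytes into the JSON payload; a data URI must be a base64-encoded string. A minimal client-side sketch (the to_data_uri helper and the sample bytes are hypothetical, not part of the Ollama or OpenAI APIs):

```python
import base64

# Hypothetical helper: turn raw image bytes into a data URI string that
# can be placed in a JSON payload (bytes themselves are not serializable).
def to_data_uri(image_bytes: bytes, mime: str = "image/jpeg") -> str:
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"

# Placeholder bytes standing in for a real JPEG file.
sample = b"\xff\xd8\xff"
print(to_data_uri(sample))  # data:image/jpeg;base64,/9j/
```

The resulting string can then be used as the "url" value of an image_url content part, exactly as in the curl example above.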


Reference: github-starred/ollama#28032