[GH-ISSUE #8026] OpenAI Chat Completion Client For Multimodal #51647

Closed
opened 2026-04-28 20:41:43 -05:00 by GiteaMirror · 3 comments

Originally created by @iejzh on GitHub (Dec 10, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/8026

Originally assigned to: @ParthSareen on GitHub.

Inconsistency with the OpenAI standard API parameters

response = client.chat.completions.create(
  model="gpt-4o-mini",
  messages=[
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "What is in this image?",
        },
        {
          "type": "image_url",
          "image_url": {
            "url":  f"data:image/jpeg;base64,{base64_image}"
          },
        },
      ],
    }
  ],
)
GiteaMirror added the feature request, api labels 2026-04-28 20:41:43 -05:00

@rick-github commented on GitHub (Dec 10, 2024):

What is inconsistent?


@iejzh commented on GitHub (Dec 11, 2024):

OpenAI’s API supports input combining text and images using the following structure:

response = client.chat.completions.create(
  model="gpt-4o-mini",
  messages=[
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "What is in this image?",
        },
        {
          "type": "image_url",
          "image_url": {
            "url": f"data:image/jpeg;base64,{base64_image}"
          },
        },
      ],
    }
  ],
)

Here the image URL is nested one level down: the content part has type and image_url keys, and image_url is itself a dictionary holding the url.
However, the Ollama client uses a different format for handling similar requests:

response = client.chat.completions.create(
    model="llava",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAG0AAABmCAYAAADBPx+VAAAACXBIWXMAAAsTAAALEwEAmpwYAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAA3VSURBVHgB7Z27r0zdG..."
                },
            ],
        }
    ],
    max_tokens=300,
)

The image URL is nested directly as a string in the image_url field, with no additional nesting levels.
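
To make the difference between the two shapes concrete, here is a minimal sketch that builds the same image content part both ways from raw bytes. The helper name `build_image_part` and its parameters are hypothetical, introduced only for illustration; they are not part of the OpenAI or Ollama clients.

```python
import base64


def build_image_part(image_bytes: bytes, mime: str = "image/jpeg", nested: bool = True) -> dict:
    """Build an image content part (hypothetical helper, for illustration only).

    nested=True  -> {"type": "image_url", "image_url": {"url": "<data URL>"}}
    nested=False -> {"type": "image_url", "image_url": "<data URL>"}
    """
    # Encode the raw bytes as a base64 data URL, as in the snippets above.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    data_url = f"data:{mime};base64,{encoded}"

    if nested:
        # Shape shown in the first snippet: the URL sits under image_url["url"].
        return {"type": "image_url", "image_url": {"url": data_url}}
    # Shape shown in the second snippet: image_url is the string itself.
    return {"type": "image_url", "image_url": data_url}
```

Either result can be placed in the `content` list of a user message next to a `{"type": "text", ...}` part; only the nesting of `image_url` differs.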


@muzzlol commented on GitHub (Dec 24, 2024):

The current implementation in openai/openai.go is designed to accommodate both the OpenAI and Ollama formats for image_url.

Code

The relevant code snippet (lines 409-419) is as follows:

case "image_url":
    var url string
    if urlMap, ok := data["image_url"].(map[string]any); ok {
        if url, ok = urlMap["url"].(string); !ok {
            return nil, errors.New("invalid message format")
        }
    } else {
        if url, ok = data["image_url"].(string); !ok {
            return nil, errors.New("invalid message format")
        }
    }

Accommodating the Ollama format here is redundant, so I have made the changes needed to remove it in #8232.
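
For readers more familiar with the Python side, the branching in the Go snippet can be sketched roughly as follows. This is an illustrative translation, not code from the repository: the function name `extract_image_url` and the `allow_flat` flag are assumptions. Setting `allow_flat=False` shows what accepting only the nested form might look like; the actual diff in #8232 is not shown in this thread.

```python
from typing import Any


def extract_image_url(part: dict, allow_flat: bool = True) -> str:
    """Return the URL from an image_url content part (hypothetical sketch).

    Mirrors the Go logic above: try the nested {"url": ...} form first,
    then fall back to a plain string if allow_flat is True.
    """
    image_url: Any = part.get("image_url")

    if isinstance(image_url, dict):
        # Nested form: image_url is a mapping that must carry a string "url".
        url = image_url.get("url")
        if isinstance(url, str):
            return url
        raise ValueError("invalid message format")

    if allow_flat and isinstance(image_url, str):
        # Flat form: image_url is the URL string itself.
        return image_url

    raise ValueError("invalid message format")
```

With the default `allow_flat=True`, both shapes from the earlier comment resolve to the same URL; with `allow_flat=False`, the flat string form raises `ValueError`.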


Reference: github-starred/ollama#51647