[GH-ISSUE #15727] Anthropic compatibility: image content blocks are dropped when forwarded to vision-capable cloud models (v0.21.0) #56539

Open
opened 2026-04-29 10:58:33 -05:00 by GiteaMirror · 2 comments

Originally created by @peter20201011-cmyk on GitHub (Apr 21, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/15727

Bug description

When using Ollama's Anthropic-compatible endpoint (POST /v1/messages) with a vision-capable cloud model such as kimi-k2.6:cloud, image content blocks in the request are silently dropped before being forwarded to the model. The model only receives the surrounding text, so it responds as if no image was attached.

This breaks Claude Code (and any other Anthropic-API client) when combined with tools that send screenshots — e.g. the computer-use MCP server — since the model has no idea an image was ever sent.

Environment

  • Ollama: v0.21.0 (confirmed via /api/version)
  • Model: kimi-k2.6:cloud — ollama show reports Capabilities: vision, thinking, completion, tools
  • OS: Windows 11
  • Client: curl (identical behavior seen via Claude Code CLI)

Reproduction

curl -s http://localhost:11434/v1/messages \
  -H "Content-Type: application/json" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "kimi-k2.6:cloud",
    "max_tokens": 200,
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "Describe this tiny test image in one sentence."},
        {"type": "image", "source": {"type": "base64", "media_type": "image/png",
         "data": "iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAQAAAC1HAwCAAAAC0lEQVR42mNkYAAAAAYAAjCB0C8AAAAASUVORK5CYII="}}
      ]
    }]
  }'

Expected

Model receives the image bytes and describes it (or at minimum reports back that it was received).

Actual

The model's thinking output says:

"looking at the input, I don't actually see an image - I see the text [Image: but no actual image content loaded"

usage.input_tokens is 23 — far below what a real image would add — confirming the base64 payload never reaches the model. The image block appears to be replaced by a text placeholder during the Anthropic → Ollama format conversion.
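
For triage, the same request can be issued from Python and the usage counters inspected directly. This is a minimal sketch using only the standard library, with the same model and payload as the curl repro above:

import json, urllib.request

# Same 1x1 transparent PNG used in the curl repro above
png_b64 = ('iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAQAAAC1HAwCAAAA'
           'C0lEQVR42mNkYAAAAAYAAjCB0C8AAAAASUVORK5CYII=')

body = json.dumps({
    'model': 'kimi-k2.6:cloud',
    'max_tokens': 200,
    'messages': [{
        'role': 'user',
        'content': [
            {'type': 'text', 'text': 'Describe this tiny test image in one sentence.'},
            {'type': 'image', 'source': {'type': 'base64',
                                         'media_type': 'image/png',
                                         'data': png_b64}},
        ],
    }],
}).encode()

req = urllib.request.Request(
    'http://localhost:11434/v1/messages', data=body,
    headers={'Content-Type': 'application/json',
             'anthropic-version': '2023-06-01'})
resp = json.loads(urllib.request.urlopen(req).read())

# On the buggy path this prints ~23; a delivered image should add
# substantially more input tokens.
print(resp['usage']['input_tokens'])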

Notes

  • The same Kimi model works fine with images when used through Moonshot's own Anthropic endpoint, so the model itself is not the issue.
  • Ollama's own OpenAI-compatibility layer (/v1/chat/completions) handles images correctly for other vision models, which suggests the Anthropic layer's request translator is just missing the image → images[] mapping. A comparative request is sketched after this list.
  • DeepWiki's Anthropic compatibility layer doc lists image blocks as a recognized type in MessagesRequest but does not indicate whether they are actively processed — this report confirms they are not.
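
For comparison, a minimal sketch of the equivalent request through the OpenAI-compatibility layer. The image_url/data-URL form follows the OpenAI content schema, which Ollama's /v1/chat/completions is documented to accept; the test PNG is the same one as above:

import json, urllib.request

png_b64 = ('iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAQAAAC1HAwCAAAA'
           'C0lEQVR42mNkYAAAAAYAAjCB0C8AAAAASUVORK5CYII=')

body = json.dumps({
    'model': 'kimi-k2.6:cloud',
    'messages': [{
        'role': 'user',
        'content': [
            {'type': 'text', 'text': 'Describe this tiny test image in one sentence.'},
            # OpenAI-style image block: a data: URL instead of a source object
            {'type': 'image_url',
             'image_url': {'url': 'data:image/png;base64,' + png_b64}},
        ],
    }],
}).encode()

req = urllib.request.Request(
    'http://localhost:11434/v1/chat/completions', data=body,
    headers={'Content-Type': 'application/json'})
resp = json.loads(urllib.request.urlopen(req).read())
print(resp['choices'][0]['message']['content'])

If this path describes the image while /v1/messages does not, the regression is isolated to the Anthropic translator.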

Suggested fix

In the Anthropic compatibility translator, when a user message contains an image content block of type: "base64", the base64 data should be appended to the images array of the corresponding Ollama /api/chat message (or stored in the multimodal buffer for vision models), instead of being stringified into a [Image: placeholder.
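
Illustratively, the intended mapping might look like the sketch below. This is Python pseudocode with hypothetical names (Ollama's translator is written in Go); it only shows the shape of the fix, not the actual implementation:

def anthropic_to_ollama_message(msg):
    """Hypothetical sketch: convert one Anthropic-style message dict into
    the shape Ollama's /api/chat expects. Text blocks are joined into
    content; base64 image blocks go into the images array untouched."""
    text_parts, images = [], []
    for block in msg['content']:
        if block['type'] == 'text':
            text_parts.append(block['text'])
        elif block['type'] == 'image' and block['source']['type'] == 'base64':
            # /api/chat takes images as base64 strings, so the payload can
            # be forwarded as-is instead of being flattened into text.
            images.append(block['source']['data'])
        # tool_use / tool_result blocks would be handled separately
    return {'role': msg['role'],
            'content': '\n'.join(text_parts),
            'images': images}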


@dani931004 commented on GitHub (Apr 21, 2026):

+1 — Confirmed on Linux as well, and I have comparative evidence.

Environment: Ollama latest (pulled today), kimi-k2.6:cloud, Ubuntu 24.04

Findings:

  • kimi-k2.6:cloud → Fails — responds "I don't see any image attached"
  • qwen3.5:cloud → Works — correctly describes shapes, colors, and numbers in the image
  • gemma4:31b-cloud → Works — same correct description

This rules out a client-side problem and confirms the issue is specific to Ollama's cloud proxy/translation layer for the Kimi endpoint.

Reproduction script (Python, uses /api/generate):

import base64, io, json, urllib.request
from PIL import Image, ImageDraw

# Create 200x100 test image: blue square with "42", green square with "99"
img = Image.new('RGB', (200, 100), 'white')
draw = ImageDraw.Draw(img)
draw.rectangle([10, 10, 90, 90], fill='blue')
draw.rectangle([110, 10, 190, 90], fill='green')
draw.text((25, 40), '42', fill='white')
draw.text((125, 40), '99', fill='white')

buf = io.BytesIO()
img.save(buf, format='PNG')
img_b64 = base64.b64encode(buf.getvalue()).decode()

data = json.dumps({
    'model': 'kimi-k2.6:cloud',
    'prompt': 'Describe this image exactly.',
    'images': [img_b64],
    'stream': False
}).encode()

req = urllib.request.Request(
    'http://localhost:11434/api/generate',
    data=data, headers={'Content-Type': 'application/json'}, method='POST'
)
resp = json.loads(urllib.request.urlopen(req).read())
print(resp['response'])
# Prints: "I don't see any image attached to your message."

Switching the model to qwen3.5:cloud or gemma4:31b-cloud with the exact same request yields the correct description.
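
To run the comparison in one pass, the same request can be looped over all three models. A sketch; it assumes the script above has already built img_b64:

for model in ('kimi-k2.6:cloud', 'qwen3.5:cloud', 'gemma4:31b-cloud'):
    data = json.dumps({'model': model,
                       'prompt': 'Describe this image exactly.',
                       'images': [img_b64],
                       'stream': False}).encode()
    req = urllib.request.Request(
        'http://localhost:11434/api/generate', data=data,
        headers={'Content-Type': 'application/json'}, method='POST')
    resp = json.loads(urllib.request.urlopen(req).read())
    # Only kimi-k2.6:cloud claims no image was attached
    print(model, '->', resp['response'][:80])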


@peter20201011-cmyk commented on GitHub (Apr 21, 2026):

Update: cannot reproduce on v0.21.0 as of 2026-04-22 — may already be fixed server-side?

I retested on the same Windows 11 machine where I originally filed this, with the same Ollama version (/api/version → 0.21.0).

Test image: 200×100 PNG — a red square containing white text ABC next to a yellow square containing black text XYZ (chosen to make hallucination statistically implausible).

Request: identical shape to the repro in the original report — POST /v1/messages with anthropic-version: 2023-06-01, a text block plus an image block (source.type: base64, media_type: image/png).

Results (same endpoint, same client):

Model            Input tokens  Got the image?
kimi-k2.6:cloud  71            Yes: correctly identifies both colors, both strings, and both text colors
kimi-k2.5:cloud  71            Yes: same

Example response from kimi-k2.6:cloud:

"Left square: Red background, text "ABC" in white letters. Right square: Yellow background, text "XYZ" in black letters."

That's 6 independent facts correct — far beyond coincidence, so the base64 payload is clearly reaching the model now, unlike the original input_tokens=23 + [Image: placeholder behavior I reported.
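
For anyone re-running the retest, here is a sketch that rebuilds the described test image with Pillow and sends it through /v1/messages. The drawing coordinates and response parsing are illustrative:

import base64, io, json, urllib.request
from PIL import Image, ImageDraw

# 200x100: red square with white "ABC", yellow square with black "XYZ"
img = Image.new('RGB', (200, 100), 'white')
draw = ImageDraw.Draw(img)
draw.rectangle([10, 10, 90, 90], fill='red')
draw.text((35, 40), 'ABC', fill='white')
draw.rectangle([110, 10, 190, 90], fill='yellow')
draw.text((135, 40), 'XYZ', fill='black')
buf = io.BytesIO()
img.save(buf, format='PNG')

body = json.dumps({
    'model': 'kimi-k2.6:cloud',
    'max_tokens': 200,
    'messages': [{'role': 'user', 'content': [
        {'type': 'text', 'text': 'Describe this image exactly.'},
        {'type': 'image', 'source': {'type': 'base64', 'media_type': 'image/png',
         'data': base64.b64encode(buf.getvalue()).decode()}},
    ]}],
}).encode()

req = urllib.request.Request(
    'http://localhost:11434/v1/messages', data=body,
    headers={'Content-Type': 'application/json',
             'anthropic-version': '2023-06-01'})
resp = json.loads(urllib.request.urlopen(req).read())
print(resp['usage']['input_tokens'])
# Print only text blocks; thinking models may emit other block types first
print([b['text'] for b in resp['content'] if b.get('type') == 'text'])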

Since these are :cloud models, a fix on the hosted proxy side wouldn't require a new client release, which would explain why the symptom disappeared without a version bump.

Question for maintainers: was the Anthropic → Ollama image block translation recently fixed on the cloud side? If yes, this issue can probably be closed. If no, something else changed and it'd be worth tracking down what, since the behavior is now the opposite of what I originally reported.

Happy to run any additional repro you'd like.
