[GH-ISSUE #15592] Huge difference in image input tokens with local Qwen3.5 versions when format="json" specified #72010

Open
opened 2026-05-05 03:18:26 -05:00 by GiteaMirror · 4 comments
Owner

Originally created by @fchahun on GitHub (Apr 14, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/15592

What is the issue?

With local versions of Qwen3.5 (9b, 27b, 35b, 122b), using a blank prompt and a single image (attached) as input, I notice a huge difference in input tokens depending on whether format is unspecified or set to "json" (prompt_eval_count increases from 432 to 1789).

This does not occur with the cloud version (397b-cloud): prompt_eval_count remains unchanged at 432.

Nor does it occur with other models such as gemma4, where prompt_eval_count only increases slightly, from 281 to 286.

This phenomenon can be easily reproduced using the attached image and the curl commands below.

What is the reason for that?

IMG=$(base64 < Image_001.jpeg | tr -d '\n')

echo "{\"model\": \"qwen3.5:27b\", \"messages\": [{\"role\": \"user\",  \"content\": \"\",\"images\": [\"$IMG\"] }],\"stream\": false}" | curl -X POST http://0.0.0.0:11434/api/chat -H "Content-Type: application/json" -d @-
{"model":"qwen3.5:27b","created_at":"2026-04-14T12:57:14.954993416Z","message":{"..."},"done":true,"done_reason":"stop","total_duration":160473406601,"load_duration":6219169752,"prompt_eval_count":432,"prompt_eval_duration":724455268,"eval_count":2003,"eval_duration":152220636272}


echo "{\"model\": \"qwen3.5:27b-q4_K_M\", \"messages\": [{\"role\": \"user\",  \"content\": \"\",\"images\": [\"$IMG\"] }],\"stream\": false, \"format\": \"json\"}" | curl -X POST http://0.0.0.0:11434/api/chat -H "Content-Type: application/json" -d @-
{"model":"qwen3.5:27b-q4_K_M","created_at":"2026-04-14T21:27:56.795489648Z","message":{"..."},"done":true,"done_reason":"stop","total_duration":156916654288,"load_duration":8222672080,"prompt_eval_count":1789,"prompt_eval_duration":2405578208,"eval_count":300,"eval_duration":26354827648}

echo "{\"model\": \"qwen3.5:397b-cloud\", \"messages\": [{\"role\": \"user\",  \"content\": \"\",\"images\": [\"$IMG\"] }],\"stream\": false}" | curl -X POST http://0.0.0.0:11434/api/chat -H "Content-Type: application/json" -H "Authorization: Bearer $OLLAMA_API_KEY" -d @-
{"model":"qwen3.5:397b","created_at":"2026-04-14T12:53:46.888392524Z","message":{"..."},"done":true,"done_reason":"stop","total_duration":15905907345,"prompt_eval_count":432,"eval_count":775}

echo "{\"model\": \"qwen3.5:397b-cloud\", \"messages\": [{\"role\": \"user\",  \"content\": \"\",\"images\": [\"$IMG\"] }],\"stream\": false, \"format\": \"json\"}" | curl -X POST http://0.0.0.0:11434/api/chat -H "Content-Type: application/json" -H "Authorization: Bearer $OLLAMA_API_KEY" -d @-
{"model":"qwen3.5:397b","created_at":"2026-04-14T21:51:49.78707483Z","message":{"..."},"done":true,"done_reason":"stop","total_duration":15994634694,"prompt_eval_count":432,"eval_count":889}

echo "{\"model\": \"gemma4:31b-it-q4_K_M\", \"messages\": [{\"role\": \"user\",  \"content\": \"\",\"images\": [\"$IMG\"] }],\"stream\": false}" | curl -X POST http://0.0.0.0:11434/api/chat -H "Content-Type: application/json" -H "Authorization: Bearer $OLLAMA_API_KEY" -d @-
{"model":"gemma4:31b-it-q4_K_M","created_at":"2026-04-14T21:56:58.18721552Z","message":{"..."},"done":true,"done_reason":"stop","total_duration":90482110976,"load_duration":285760912,"prompt_eval_count":281,"prompt_eval_duration":99862896,"eval_count":915,"eval_duration":8977992150}

echo "{\"model\": \"gemma4:31b-it-q4_K_M\", \"messages\": [{\"role\": \"user\",  \"content\": \"\",\"images\": [\"$IMG\"] }],\"stream\": false, \"format\": \"json\"}" | curl -X POST http://0.0.0.0:11434/api/chat -H "Content-Type: application/json" -H "Authorization: Bearer $OLLAMA_API_KEY" -d @-
{"model":"gemma4:31b-it-q4_K_M","created_at":"2026-04-14T21:59:13.743806688Z","message":{"..."},"done":true,"done_reason":"stop","total_duration":56701847024,"load_duration":220037664,"prompt_eval_count":286,"prompt_eval_duration":116230352,"eval_count":28,"eval_duration":2635753616}
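For anyone reproducing this, the hand-escaped echo strings above are easy to get wrong. A safer way to build the same request bodies is to serialize them programmatically; the following Python sketch mirrors the commands above (model name, blank prompt, one base64 image, optional "format": "json"). It only constructs the payload; POSTing it to http://0.0.0.0:11434/api/chat still requires a running Ollama server.

```python
import base64
import json


def build_chat_payload(model, image_path, fmt=None):
    """Build an Ollama /api/chat request body with a blank prompt and one image."""
    with open(image_path, "rb") as f:
        img = base64.b64encode(f.read()).decode("ascii")
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": "", "images": [img]}],
        "stream": False,
    }
    if fmt is not None:
        payload["format"] = fmt  # e.g. "json", as in the second command above
    return payload


# Compare the two request bodies from the report (paths are illustrative):
#   plain     = build_chat_payload("qwen3.5:27b", "Image_001.jpeg")
#   with_json = build_chat_payload("qwen3.5:27b", "Image_001.jpeg", fmt="json")
# POST each as JSON to http://0.0.0.0:11434/api/chat and compare the
# "prompt_eval_count" fields in the responses.
```

This avoids the quoting pitfalls of nesting escaped JSON inside a double-quoted echo string, since json.dumps handles escaping for you.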

Image

(Note: the attached image is a fake bank transfer order containing synthetic, imaginary data.)

Relevant log output


OS

Linux

GPU

NVIDIA GB10 (DGX SPARK)

CPU

No response

Ollama version

0.20.4

GiteaMirror added the bug label 2026-05-05 03:18:26 -05:00
Author
Owner

@rick-github commented on GitHub (Apr 15, 2026):

https://github.com/ollama/ollama/issues/14957

Author
Owner

@fchahun commented on GitHub (Apr 15, 2026):

Thank you. This also explains why prompt_eval_count varies across different images that all have exactly the same dimensions.

But why is the situation different for the cloud version (qwen3.5:397b-cloud)? Does it use a different inference framework?

Author
Owner

@rick-github commented on GitHub (Apr 15, 2026):

Yes, a different inference framework, and cloud doesn't currently support structured outputs (https://github.com/ollama/ollama/issues/12362).
Author
Owner

@PureBlissAK commented on GitHub (Apr 18, 2026):

🤖 Automated Triage & Analysis Report

Issue: #15592
Analyzed: 2026-04-18T18:19:27.003914

Analysis

  • Type: unknown
  • Severity: medium
  • Components: unknown

Implementation Plan

  • Effort: medium
  • Steps:

This issue has been triaged and marked for implementation.


Reference: github-starred/ollama#72010