[GH-ISSUE #11297] qwen2.5vl model crashes on small image input (height or width < 28px) #69510

Closed
opened 2026-05-04 18:17:24 -05:00 by GiteaMirror · 2 comments
Owner

Originally created by @uhrinalex on GitHub (Jul 4, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/11297

What is the issue?

When using vision models like qwen2.5vl with Ollama, if an image is passed where width or height < 28px, the model runner crashes with:

`panic: height:27 or width:257 must be larger than factor:28`

Relevant log output

```shell
http: panic serving 127.0.0.1:57511: height:27 or width:257 must be larger than factor:28
...
panic ...
github.com/ollama/ollama/model/models/qwen25vl.(*ImageProcessor).SmartResize
```
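For context, the panic comes from the multiple-of-28 rounding step in the image processor. Below is a minimal Python sketch of that rounding, modeled on the reference `smart_resize` logic from the Qwen2.5-VL processing code; it is an illustrative approximation, not Ollama's actual Go implementation, and the function name and simplifications are the sketch author's.

```python
FACTOR = 28  # ViT patch grouping; both image sides must reach this minimum


def smart_resize(height: int, width: int, factor: int = FACTOR):
    """Round image dimensions to multiples of `factor`, mirroring the
    guard that panics in Ollama when either side is below `factor`.
    Simplified sketch: the real processor also enforces aspect-ratio
    and total-pixel limits."""
    if height < factor or width < factor:
        raise ValueError(
            f"height:{height} or width:{width} must be larger than factor:{factor}"
        )
    return round(height / factor) * factor, round(width / factor) * factor
```

With the dimensions from the log above, `smart_resize(27, 257)` raises the same error message, while `smart_resize(28, 257)` succeeds and rounds the width down to 252.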

OS

Windows 11

GPU

No response

CPU

Snapdragon X Elite (arm)

Ollama version

0.9.5

GiteaMirror added the bug label 2026-05-04 18:17:24 -05:00

@yss0729 commented on GitHub (Jul 14, 2025):

mark


@mikel-brostrom commented on GitHub (Jul 15, 2025):

From the Qwen2.5-VL technical report:

> During both training and inference, the height and width of the input images are resized to multiples of 28 before being fed into the ViT.

As a workaround, you could resize the image or pad it to meet this minimum.
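The suggested workaround can be sketched with Pillow. This is one possible approach (padding with a black border so neither side falls below 28px); the function name and centering choice are illustrative, not part of Ollama or Qwen2.5-VL.

```python
from PIL import Image

FACTOR = 28  # minimum side length accepted by qwen2.5vl's image processor


def pad_to_factor(img: Image.Image, factor: int = FACTOR) -> Image.Image:
    """Pad an image with a black border so both sides are at least
    `factor` pixels, avoiding the SmartResize panic on tiny inputs."""
    w, h = img.size
    new_w, new_h = max(w, factor), max(h, factor)
    if (new_w, new_h) == (w, h):
        return img  # already large enough; nothing to do
    canvas = Image.new(img.mode, (new_w, new_h))  # black background
    canvas.paste(img, ((new_w - w) // 2, (new_h - h) // 2))  # center original
    return canvas
```

For the 257×27 image from the log, this would yield a 257×28 image that passes the size check.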

<!-- gh-comment-id:3075556870 --> @mikel-brostrom commented on GitHub (Jul 15, 2025): From the Qwen2.5vl technical report paper: > During both training and > inference, the height and width of the input images are resized to multiples of 28 before being fed into the > ViT As a workaround you could resize the image or pad it to this size

Reference: github-starred/ollama#69510