[GH-ISSUE #11217] image size of qwen2.5-vl #7390

Closed
opened 2026-04-12 19:28:41 -05:00 by GiteaMirror · 6 comments
Owner

Originally created by @wubangcai on GitHub (Jun 27, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/11217

When I deployed qwen2.5-vl-72B using ollama and input an image for inference, to what scale is the image resized? When I asked the model directly about the image size, the responses were 1000*1000 and 1200*800. However, I know that the image size for qwen2.5-vl must be an integer multiple of 28. Why is that?

GiteaMirror added the question label 2026-04-12 19:28:41 -05:00
Author
Owner

@rick-github commented on GitHub (Jun 27, 2025):

qwen2.5vl resizes images so that the dimensions are multiples of 28 and the final image has fewer than 1M pixels. So 1200x800 is resized to 1204x812, and 1000x1000 is resized to 980x980. Unless models are explicitly trained with that information, they generally can't answer questions about their own parameters, as they have no introspection. The value of 28 is derived from the model parameters `patch_size` and `spatial_merge_size`, which control how the vision processor handles images. For qwen2.5vl, `patch_size` is 14 and `spatial_merge_size` is 2, which constrains processing to `patch_size` * `spatial_merge_size`, or 28 pixels.

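The resize behavior described above can be sketched in a few lines. This is a minimal illustration of the described rules (round each dimension to the nearest multiple of 28; if the result exceeds a ~1M pixel budget, scale down proportionally first), not ollama's actual implementation; the `FACTOR` and `MAX_PIXELS` constants are assumptions based on the comment:

```python
import math

FACTOR = 28             # patch_size (14) * spatial_merge_size (2)
MAX_PIXELS = 1_000_000  # assumed pixel budget from the comment above

def smart_resize(height: int, width: int) -> tuple[int, int]:
    """Return (height, width) snapped to multiples of 28, capped below MAX_PIXELS."""
    # Round each dimension to the nearest multiple of 28.
    h = round(height / FACTOR) * FACTOR
    w = round(width / FACTOR) * FACTOR
    if h * w > MAX_PIXELS:
        # Too many pixels: shrink proportionally, then round down to multiples of 28.
        beta = math.sqrt((height * width) / MAX_PIXELS)
        h = math.floor(height / beta / FACTOR) * FACTOR
        w = math.floor(width / beta / FACTOR) * FACTOR
    return h, w

print(smart_resize(800, 1200))   # (812, 1204)
print(smart_resize(1000, 1000))  # (980, 980)
```

This reproduces both examples from the comment: 1200x800 rounds up to 1204x812 (977,648 pixels, under the budget), while 1000x1000 rounds to 1008x1008 (over 1M pixels), so it is shrunk and floored to 980x980.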
Author
Owner

@pdevine commented on GitHub (Jul 1, 2025):

Going to mark this as answered. (Thank you @rick-github !)

Author
Owner

@wubangcai commented on GitHub (Jul 8, 2025):

If I want the input image size to exceed 1M pixels, how should I set that? @rick-github

Author
Owner

@crackerfly commented on GitHub (Jul 29, 2025):

> qwen2.5vl resizes images so that the dimensions are multiples of 28 and the final image has fewer than 1M pixels. So 1200x800 is resized to 1204x812, and 1000x1000 is resized to 980x980. Unless models are explicitly trained with that information, they generally can't answer questions about their own parameters, as they have no introspection. The value of 28 is derived from the model parameters `patch_size` and `spatial_merge_size`, which control how the vision processor handles images. For qwen2.5vl, `patch_size` is 14 and `spatial_merge_size` is 2, which constrains processing to `patch_size` * `spatial_merge_size`, or 28 pixels.

Hi~
ollama: 0.9.3
model: qwen2.5vl:7b-q8_0
question: I entered an image with a resolution of 798x519 and ollama reported an error, but when I cropped the image to 798x518 it worked fine. What is the reason for this?

Author
Owner

@rick-github commented on GitHub (Jul 29, 2025):

Error?

Author
Owner

@crackerfly commented on GitHub (Jul 30, 2025):

> Error?

I can't reproduce the problem after restarting the operating system...

Reference: github-starred/ollama#7390