[GH-ISSUE #15828] qwen2.5vl:7b GGML_ASSERT(a->ne[2] * 4 == b->ne[0]) - 500 Internal Server Error on vision inference #56600

Open
opened 2026-04-29 11:05:03 -05:00 by GiteaMirror · 0 comments
Owner

Originally created by @tustuntas on GitHub (Apr 26, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/15828

What is the issue?

Body:

Environment

  • OS: Windows 11
  • GPU: NVIDIA RTX 4090 24GB VRAM, Driver 591.86, CUDA 13.1
  • Ollama: 0.21.3-rc0
  • Model: qwen2.5vl:7b (ID: 5ced39dfa4ba, 6.0 GB, 29/29 layers on GPU)

Description

During batch vision inference (page-by-page PDF extraction with Qwen2.5-VL), Ollama intermittently returns a 500 Internal Server Error. The server log shows:

GGML_ASSERT(a->ne[2] * 4 == b->ne[0])

The error is not deterministic — the same image succeeds on retry or after an Ollama restart. During sustained batch processing (~10-15 sequential requests), the error rate is approximately 30-40%.

Environment Variables

OLLAMA_FLASH_ATTENTION=1
OLLAMA_NUM_GPU=99
OLLAMA_NUM_PARALLEL=1

Steps to Reproduce

  1. Load qwen2.5vl:7b model
  2. Send 10-15 sequential vision requests with 150 DPI page images (1288x1638 pixels)
  3. Observe intermittent 500 errors on random pages
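The reproduction loop above can be sketched as a minimal client script. This is a hedged sketch, assuming the default Ollama endpoint at `localhost:11434` and the standard non-streaming `/api/generate` request shape with base64-encoded `images`; the page bytes themselves are placeholders:

```python
import base64
import json
import urllib.error
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default Ollama endpoint

def build_payload(image_bytes, prompt="Extract the text from this page."):
    """Build a non-streaming /api/generate body with one base64 image."""
    return {
        "model": "qwen2.5vl:7b",
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }

def run_batch(pages):
    """Send pages sequentially and collect the indices that returned HTTP 500."""
    failures = []
    for i, page in enumerate(pages):
        req = urllib.request.Request(
            OLLAMA_URL,
            data=json.dumps(build_payload(page)).encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        try:
            urllib.request.urlopen(req, timeout=300)
        except urllib.error.HTTPError as err:
            if err.code == 500:  # the intermittent failure described above
                failures.append(i)
    return failures
```

Running `run_batch` over 10-15 rendered pages and inspecting the returned indices reproduces the "random pages fail" pattern described in this report.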

Error Pattern

  • First few pages typically succeed
  • After 3-5 successful requests, 500 errors start appearing
  • Error rate increases with sustained usage
  • Restarting Ollama temporarily clears the issue
  • Splitting images into smaller halves (1288x839) reduces error frequency
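The "smaller halves" mitigation above can be sketched as a plain crop-box computation. Note the reporter's halves were 1288x839 (slightly more than half of 1638, suggesting some overlap between halves); this sketch does a simple non-overlapping vertical split:

```python
def split_boxes(width, height):
    """Return two (left, upper, right, lower) crop boxes that split an
    image into a top half and a bottom half at the vertical midline."""
    mid = height // 2
    return [(0, 0, width, mid), (0, mid, width, height)]

# With Pillow (assumed installed) the halves would then be produced as:
#   from PIL import Image
#   page = Image.open("page.png")
#   halves = [page.crop(box) for box in split_boxes(*page.size)]
```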

Workaround

  • Splitting the image into 2 halves before sending reduces error rate
  • Restarting Ollama service after consecutive failures
  • Using keep_alive and retry logic in the client
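The client-side retry workaround can be sketched as a small exponential-backoff wrapper; `send` here is any hypothetical client function that raises on an HTTP 500:

```python
import time

def with_retries(send, page, max_attempts=3, base_delay=2.0):
    """Call send(page) up to max_attempts times, doubling the delay after
    each failure. Returns the first successful result or re-raises the
    last error once all attempts are exhausted."""
    last_err = None
    for attempt in range(max_attempts):
        try:
            return send(page)
        except RuntimeError as err:  # e.g. an HTTP 500 surfaced by the client
            last_err = err
            time.sleep(base_delay * (2 ** attempt))
    raise last_err
```

Since the same image usually succeeds on retry, even a short backoff like this recovers most failed pages without restarting the service.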

Model Details

NAME                       ID              SIZE
qwen2.5vl:7b               5ced39dfa4ba    6.0 GB
qwen2.5vl:32b              3edc3a52fe98    21 GB

The same issue also occurs with qwen2.5vl:32b (49/65 layers on GPU), but less frequently, likely because the larger model processes requests more slowly.


Relevant log output

```shell
```
OS

Windows

GPU

Nvidia

CPU

Intel

Ollama version

0.21.3-rc0

GiteaMirror added the bug label 2026-04-29 11:05:03 -05:00

Reference: github-starred/ollama#56600