[GH-ISSUE #14388] Qwen2-VL-2B GGUF model fails to recognize images in Ollama (works fine with llama.cpp) #55862

Closed
opened 2026-04-29 09:49:25 -05:00 by GiteaMirror · 1 comment

Originally created by @limingda1212 on GitHub (Feb 24, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/14388

What is the issue?

Issue Description:

I am trying to use the Qwen2-VL-2B-Instruct GGUF model with Ollama for image text recognition (OCR). The model loads and responds correctly to text-only prompts, but when an image is provided (via the `--image` flag or the API), the output is gibberish (random Chinese/English characters) or empty. The same GGUF files (both the main model and the mmproj projector) work perfectly when run with llama-mtmd-cli from llama.cpp, confirming that the model files themselves are intact.

Environment:

  • OS: Ubuntu 24.04 (x86_64)
  • Ollama version: 0.17.0 (upgraded from 0.14.3, problem persists)
  • Model files:
    • Main model: Qwen2-VL-2B-Instruct-Q4_K_M.gguf (from bartowski's Hugging Face repo)
    • Projector: mmproj-Qwen2-VL-2B-Instruct-f16.gguf (same source)
  • Test image: a screenshot containing clear English/Chinese text (previously used with llama-mtmd-cli and OCR succeeded)

Steps to Reproduce:

  1. Create a simple Modelfile (only the required lines):
FROM /path/to/Qwen2-VL-2B-Instruct-Q4_K_M.gguf
ADAPTER /path/to/mmproj-Qwen2-VL-2B-Instruct-f16.gguf
TEMPLATE "{{ .Prompt }}"
  2. Create the model:
ollama create qwen2-vl-test -f Modelfile
  3. Run with an image (the prompt means "Recognize the text in this image"):
ollama run qwen2-vl-test "识别这张图片中的文字" -- /path/to/test-image.png

Actual Result:
The model outputs random gibberish (see example below). Multiple runs yield different random outputs.

Added image '/path/to/test-image.png'
 。
。
 '。
。。
两个  .
 。
接到 。
 .chomp垃圾和中国湖北省"/。
 
  chimp '
, hosts '/
 在  
 。
,。
。
 '"/>
,
。
  lu硬件 
通过
 《-portschemes
中的成为中国.chomp
。
。

Expected Result:
The model should output the actual text contained in the image, as verified by the same image processed with llama-mtmd-cli.

Additional Information:

  • The same GGUF files work flawlessly with llama-mtmd-cli (llama.cpp b8140):
./llama-mtmd-cli -m Qwen2-VL-2B-Instruct-Q4_K_M.gguf --mmproj mmproj-Qwen2-VL-2B-Instruct-f16.gguf --image test-image.png -p "识别这张图片中的文字" -n 256

Output (correct):

一个轻量级的中文/英文对话模型,基于 Qwen2.5 架构,1.5B 参数,适合快速响应的本地部署场景。支持通用对话、文本生成、简单推理等任务。
(Translation: "A lightweight Chinese/English chat model based on the Qwen2.5 architecture with 1.5B parameters, suited for fast-response local deployment. It supports general conversation, text generation, simple reasoning, and similar tasks.")
  • The issue occurs both with the `--image` command-line flag and via the API (image passed as base64). In the API case, the response content is empty (eval count of 1).
  • No errors are shown in Ollama logs when the image is processed.
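For reference, the API-side reproduction described above can be sketched in Python. The `/api/generate` endpoint and its `images` field of base64-encoded strings follow Ollama's REST API; the model name matches the one created in the steps above, and the image bytes here are a placeholder (a real run would read the PNG from disk and requires a running Ollama server):

```python
import base64
import json

def build_generate_payload(model: str, prompt: str, image_bytes: bytes) -> dict:
    """Build the JSON body for Ollama's /api/generate with an inline image.

    Multimodal models accept a list of base64-encoded images
    alongside the text prompt.
    """
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
    }

# Placeholder image bytes; a real test would use open("test-image.png", "rb").read()
payload = build_generate_payload(
    "qwen2-vl-test",
    "Recognize the text in this image",
    b"\x89PNG placeholder",
)

# To actually send it (requires a running Ollama server on the default port):
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:11434/api/generate",
#     data=json.dumps(payload).encode(),
#     headers={"Content-Type": "application/json"},
# )
# print(urllib.request.urlopen(req).read().decode())
```

With the bug described here, the `response` field of the reply comes back empty even though the same payload shape works for other multimodal models.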

Possible Cause:
It seems that Ollama's handling of the visual projector (mmproj) for Qwen2-VL models may be broken, possibly due to incorrect tensor mapping or missing support for the newer architecture. The same model works with llama.cpp, so the issue is likely in Ollama's integration.
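One way to start narrowing down a tensor-mapping hypothesis like this is to inspect the two GGUF files directly. The fixed header layout below (magic `GGUF`, uint32 version, then uint64 tensor and metadata key/value counts in v2+) follows the GGUF specification; the file paths in the usage comment are the ones from this report:

```python
import struct

def read_gguf_header(data: bytes) -> dict:
    """Parse the fixed-size GGUF file header.

    Per the GGUF specification, a file starts with the magic b"GGUF",
    a uint32 version, a uint64 tensor count, and a uint64 metadata
    key/value count (all little-endian in version 2 and later).
    """
    magic, version = struct.unpack_from("<4sI", data, 0)
    if magic != b"GGUF":
        raise ValueError(f"not a GGUF file (magic={magic!r})")
    tensor_count, kv_count = struct.unpack_from("<QQ", data, 8)
    return {"version": version, "tensor_count": tensor_count, "kv_count": kv_count}

# Usage: compare the main model against the projector, e.g.
# for path in ("Qwen2-VL-2B-Instruct-Q4_K_M.gguf",
#              "mmproj-Qwen2-VL-2B-Instruct-f16.gguf"):
#     with open(path, "rb") as f:
#         print(path, read_gguf_header(f.read(24)))
```

If both files parse cleanly and report plausible tensor counts, the corruption hypothesis can be ruled out and attention shifts to how Ollama maps the projector's tensors, consistent with the fact that llama.cpp reads the same files correctly.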

Request:
Please investigate why Qwen2-VL GGUF models with visual projector fail to recognize images in Ollama, and provide a fix or workaround.

Thank you for your great work!

Relevant log output


OS

Linux

GPU

No response

CPU

No response

Ollama version

0.17.0

GiteaMirror added the bug label 2026-04-29 09:49:25 -05:00

@lingfan36 commented on GitHub (Feb 25, 2026):

👋 Hi there!

We've prepared a detailed solution guide for this issue:

🔗 Solution Guide: https://ollamahub.space/pages/solutions/detail.html?id=models/qwen2-vl-gguf-image-recognition

The documentation includes:

  • Problem analysis
  • Workaround solutions
  • Code examples

If your specific case isn't covered, feel free to provide more details.


Generated by OllamaHub


Reference: github-starred/ollama#55862