[GH-ISSUE #11422] QWEN-VL vision encoder not on GPU #69600

Open
opened 2026-05-04 18:36:47 -05:00 by GiteaMirror · 0 comments
Owner

Originally created by @wangyi7099 on GitHub (Jul 15, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/11422

What is the issue?

Before the first token is generated, GPU usage remains above 80%. After searching online, I found that the vision encoding phase (ViT) runs on the CPU instead of the GPU, which significantly slows down generation of the first token. Is there a configuration parameter that makes the model use the GPU during the vision encoding phase?
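As a possible workaround (not confirmed for this report's setup), Ollama's documented `num_gpu` parameter controls how many model layers are offloaded to the GPU. A Modelfile sketch requesting full offload might look like the following; the model tag `qwen2.5vl` and whether the vision tower honors `num_gpu` on this Ollama version are assumptions, not facts from this issue:

```
# Hypothetical Modelfile: request full GPU offload.
# num_gpu sets the number of layers sent to the GPU; a large value
# requests all of them. Whether the vision encoder (ViT) follows the
# LLM layers onto the GPU depends on the Ollama version.
FROM qwen2.5vl
PARAMETER num_gpu 999
```

Build it with `ollama create qwen2.5vl-gpu -f Modelfile`, then check the resulting CPU/GPU split with `ollama ps` while the model is loaded.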

Relevant log output


OS

win11

GPU

RTX4090

CPU

No response

Ollama version

0.9.6

GiteaMirror added the bug label 2026-05-04 18:36:47 -05:00

Reference: github-starred/ollama#69600