[GH-ISSUE #10778] Ollama utilizes both the CPU and GPU simultaneously. #69138

Closed
opened 2026-05-04 17:15:59 -05:00 by GiteaMirror · 1 comment

Originally created by @qingningLime on GitHub (May 20, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/10778

When running Qwen2.5VL with Ollama using ROCm, Ollama utilizes both the CPU and GPU, whereas other models do not exhibit this issue.

When using Qwen2.5VL for image recognition with Ollama, both the CPU and GPU are utilized. Even though GPU activity is visible, Ollama only returns the result after the CPU finishes its work. This issue does not occur with other models (e.g., MiniCPM-V).

<img width="862" alt="Image" src="https://github.com/user-attachments/assets/dacdeb7f-a5ab-41f1-923c-0f553ce31161" />
<img width="1277" alt="Image" src="https://github.com/user-attachments/assets/d185074f-ced7-4b24-8c89-84f977675743" />

Given that many modern vision-language models now support video understanding, but Ollama currently only handles image input, is video input functionality being considered for future updates? This would align with models like Qwen-2.5VL and others that are expanding into temporal visual data processing.


@rick-github commented on GitHub (May 20, 2025):

qwen2.5vl requires more VRAM than minicpm-v, so part of the model is loaded in system RAM where the CPU does inference.
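
As a hedged sketch of how to confirm and work around this: `ollama ps` reports a `PROCESSOR` column showing the CPU/GPU split when part of a model has spilled into system RAM. If the model does not fit entirely in VRAM, one documented lever is the `num_ctx` parameter, since a smaller context window reduces memory use and may let all layers stay on the GPU. The exact context value below is an illustrative assumption, not a recommendation from this thread:

```
# Check where the model is running; a split like "25%/75% CPU/GPU"
# in the PROCESSOR column means part of it is in system RAM:
#   ollama ps

# Hypothetical Modelfile sketch: derive a variant of qwen2.5vl with a
# smaller context window to reduce its VRAM footprint.
FROM qwen2.5vl
PARAMETER num_ctx 4096
```

Build and run the variant with `ollama create qwen2.5vl-small -f Modelfile` followed by `ollama run qwen2.5vl-small`, then re-check `ollama ps` to see whether the model now runs fully on the GPU.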


Reference: github-starred/ollama#69138