[GH-ISSUE #13742] Backend support for the "--no-mmproj-offload" arg to optimize VRAM use #71066

Closed
opened 2026-05-04 23:54:19 -05:00 by GiteaMirror · 0 comments
Owner

Originally created by @taozebra on GitHub (Jan 16, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/13742

When using a 32 GB GPU, VRAM becomes extremely tight for VL models approaching 33 GB (e.g. Qwen3 VL 30B Q8), because it cannot hold most layers of the main language model and the visual projector at the same time. This option can free up several GB of VRAM, which makes it extremely useful. In addition, the visual projector sometimes does not work well on old GPUs.
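To make the trade-off concrete, here is a rough sketch of the VRAM budget with and without the projector on the GPU. Every number in it (layer count, per-layer size, projector size, overhead) is an illustrative assumption, not a measurement of Qwen3 VL or of Ollama's scheduler, and `layers_on_gpu` is a hypothetical helper.

```python
def layers_on_gpu(vram_gb, n_layers, layer_gb, projector_gb,
                  offload_projector, overhead_gb=1.0):
    """Estimate how many language-model layers fit in VRAM.

    All sizes are in GB. overhead_gb is an assumed allowance for
    KV cache, compute buffers, etc. (hypothetical value).
    """
    budget = vram_gb - overhead_gb
    if offload_projector:
        # The visual projector occupies VRAM before any LLM layer is placed.
        budget -= projector_gb
    return max(0, min(n_layers, int(budget // layer_gb)))


# Assumed figures for a ~33 GB VL model on a 32 GB card (not measured).
with_projector = layers_on_gpu(32, 48, 0.65, 2.0, offload_projector=True)
without_projector = layers_on_gpu(32, 48, 0.65, 2.0, offload_projector=False)
print(with_projector, without_projector)
```

Under these assumed sizes, keeping the projector off the GPU leaves room for a few more language-model layers, which is the effect the requested option is meant to provide.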

GiteaMirror added the feature request label 2026-05-04 23:54:19 -05:00
Reference: github-starred/ollama#71066