[GH-ISSUE #10889] Option to not offload vision to GPU #53670

Open
opened 2026-04-29 04:27:03 -05:00 by GiteaMirror · 0 comments
Owner

Originally created by @weikinhuang on GitHub (May 28, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/10889

Hi, I'm wondering if there are any plans for ollama to support skipping GPU offload of the vision projector in both the old and new engines. I'm currently using Mistral Small 3.1. Prior to the version in which ollama added vision support, I was able to fit 95% of the model in VRAM; with the latest updates, the vision projector takes ~8 GB of VRAM and the model has become unusably slow.

I'm currently testing out llama.cpp's `--no-mmproj-offload` option, and it's working pretty well for Mistral; it would be great to add similar functionality to ollama's engine.
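For reference, this is roughly how I'm invoking llama.cpp (a sketch; the model and projector file names are placeholders for whatever GGUF files you have locally):

```shell
# Keep the multimodal projector (mmproj) on the CPU while still
# offloading the language-model layers to the GPU.
llama-mtmd-cli \
  -m mistral-small-3.1.gguf \          # placeholder path to the LLM weights
  --mmproj mmproj-mistral-small.gguf \ # placeholder path to the vision projector
  --no-mmproj-offload \                # projector stays in system RAM
  -ngl 99                              # offload as many LLM layers as fit
```

With this, the ~8 GB the projector would otherwise occupy stays in system RAM, leaving the VRAM for the language model itself.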

GiteaMirror added the feature request label 2026-04-29 04:27:03 -05:00
