[GH-ISSUE #9764] Question: Ollama support for VLLM with A30 GPUs for gemma-3-27b deployment #6382

Closed
opened 2026-04-12 17:53:29 -05:00 by GiteaMirror · 2 comments

Originally created by @ZimaBlueee on GitHub (Mar 14, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/9764

Hello,

First of all, thank you for creating such an excellent project. I'm new to Ollama and have two A30 GPUs with 24 GB of VRAM each. I'd like to deploy the gemma-3-27b model, and after some research I learned that A30 GPUs are better suited to vLLM than to llama.cpp.

I'm wondering whether Ollama supports switching to vLLM as the backend. This would help me make better use of my hardware when running larger models like gemma-3-27b.

Thank you for your help!
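
For concreteness, the kind of deployment the question has in mind might look roughly like the following if vLLM were used directly, outside of Ollama. This is only a sketch: the `google/gemma-3-27b-it` model id, the context length, and the sampling settings are assumptions, and the unquantized 27B weights (~54 GB in bf16) are a tight fit for 2 × 24 GB, so a quantized checkpoint may be needed in practice.

```python
# Sketch: serving gemma-3-27b directly with vLLM across two A30s.
# This is NOT an Ollama feature; model id and parameters are assumptions.
from vllm import LLM, SamplingParams

# Split the 27B weights across both 24 GB A30s via tensor parallelism.
llm = LLM(
    model="google/gemma-3-27b-it",  # assumed Hugging Face model id
    tensor_parallel_size=2,         # two A30 GPUs
    max_model_len=8192,             # keep the KV cache within the remaining VRAM
)

outputs = llm.generate(
    ["Explain tensor parallelism in one sentence."],
    SamplingParams(max_tokens=128, temperature=0.7),
)
print(outputs[0].outputs[0].text)
```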


@wisepmlin commented on GitHub (Mar 14, 2025):

Error: llama runner process has terminated: this model is not supported by your version of Ollama. You may need to upgrade


@jessegross commented on GitHub (Mar 14, 2025):

Ollama with Gemma 3 uses neither llama.cpp nor vLLM; it uses its own internal engine. There are more hardware-specific backends (for the Ollama engine) under development, but you can't switch to a completely different inference engine.
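
In other words, Gemma 3 already runs on Ollama's own engine. A minimal sketch using the official `ollama` Python client, assuming the `gemma3:27b` library tag has been pulled first with `ollama pull gemma3:27b`:

```python
# Sketch: running Gemma 3 through Ollama's own engine instead of swapping in vLLM.
# Assumes the `ollama` Python package is installed and `gemma3:27b` has been pulled.
import ollama

response = ollama.chat(
    model="gemma3:27b",
    messages=[{"role": "user", "content": "Summarize what an A30 GPU is."}],
)
print(response["message"]["content"])
```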


Reference: github-starred/ollama#6382