[GH-ISSUE #3953] Support VLLM as a backend #64488

Open
opened 2026-05-03 17:49:52 -05:00 by GiteaMirror · 5 comments
Owner

Originally created by @kannon92 on GitHub (Apr 26, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/3953

Hi,

I realize that this is a big ask, but I am learning more and more about inferencing, and I've heard that vLLM tends to have better performance for multi-GPU workloads.

Ollama has a great UX, and I love the tight integration with llama.cpp. But it would be nice to start exploring how one could use Ollama models with vLLM.
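
For illustration, here is a rough sketch of what that could look like from the client side, assuming Ollama's OpenAI-compatible endpoint on its default port 11434 and a vLLM server started with `vllm serve` on its default port 8000 (the model names below are placeholders):

```python
# Sketch: the same OpenAI-compatible client code can target either backend.
# Assumes `pip install openai`, an Ollama server on :11434, and a vLLM
# server started with `vllm serve <model>` on :8000.
from openai import OpenAI

BACKENDS = {
    "ollama": ("http://localhost:11434/v1", "llama3"),  # placeholder model name
    "vllm": ("http://localhost:8000/v1", "meta-llama/Meta-Llama-3-8B-Instruct"),
}

def ask(backend: str, prompt: str) -> str:
    base_url, model = BACKENDS[backend]
    # Neither backend checks the API key by default, but the client requires one.
    client = OpenAI(base_url=base_url, api_key="unused")
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

print(ask("ollama", "Why might vLLM be faster for batched requests?"))
```

The point of the sketch is that the client-facing API could stay the same; the open question in this issue is whether Ollama itself could manage and launch a vLLM process the way it currently manages llama.cpp.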

GiteaMirror added the feature request label 2026-05-03 17:49:52 -05:00
Author
Owner

@JJassonn69 commented on GitHub (Apr 26, 2024):

It would be quite an interesting challenge; vLLM supports concurrency natively, so some things would surely be a plus.
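
As a minimal sketch of what that native concurrency buys you, assuming a vLLM server on localhost:8000 and the async `openai` client (the model name is a placeholder): vLLM's continuous batching interleaves in-flight requests rather than queuing them one at a time.

```python
# Sketch: fire several chat requests concurrently at one vLLM server.
# vLLM batches in-flight requests internally (continuous batching), so they
# are processed together instead of strictly one after another.
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI(base_url="http://localhost:8000/v1", api_key="unused")

async def ask(prompt: str) -> str:
    resp = await client.chat.completions.create(
        model="meta-llama/Meta-Llama-3-8B-Instruct",  # placeholder; use whatever was served
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

async def main() -> None:
    prompts = [f"Summarize topic {i} in one sentence." for i in range(8)]
    answers = await asyncio.gather(*(ask(p) for p in prompts))
    for prompt, answer in zip(prompts, answers):
        print(prompt, "->", answer)

asyncio.run(main())
```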

Author
Owner

@matbeedotcom commented on GitHub (May 31, 2024):

I feel like Ollama really needs to prove it can work with more than GGUF models. Its proposed Modelfile method of defining how models are inferenced would then actually be useful.

Author
Owner

@angiopteris commented on GitHub (Sep 1, 2024):

Bumping this issue. I'd love to test other backends such as vLLM to keep improving performance.

Author
Owner

@deiangi commented on GitHub (Dec 23, 2024):

I second this

Author
Owner

@ericcurtin commented on GitHub (Nov 24, 2025):

Implemented in Docker Model Runner:

https://github.com/docker/model-runner
https://www.docker.com/blog/docker-model-runner-integrates-vllm/


Reference: github-starred/ollama#64488