[GH-ISSUE #1476] Add parameters for continuous batching and parallel flags #47308

Closed
opened 2026-04-28 03:34:10 -05:00 by GiteaMirror · 1 comment

Originally created by @YourTechBud on GitHub (Dec 11, 2023).
Original GitHub issue: https://github.com/ollama/ollama/issues/1476

The llama.cpp server supports [continuous batching and running requests in parallel](https://github.com/ggerganov/llama.cpp/tree/master/examples/server), which can significantly improve throughput, especially when running smaller models.

I can contribute this feature if the project maintainers think it's a good addition.

GiteaMirror added the feature request label 2026-04-28 03:34:10 -05:00

@jmorganca commented on GitHub (Dec 22, 2023):

Hi @YourTechBud, thanks so much for the issue! Absolutely 😊. Will merge this with https://github.com/jmorganca/ollama/issues/358


Reference: github-starred/ollama#47308