[GH-ISSUE #7321] Support loading the same model more than once #66708

Closed
opened 2026-05-04 07:54:56 -05:00 by GiteaMirror · 1 comment

Originally created by @jfwreinhardt on GitHub (Oct 22, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/7321

Originally assigned to: @dhiltgen on GitHub.

Are there any plans to support loading the same model more than once?

On a CUDA-based system with multiple GPUs, I have observed that throughput (tokens/s) drops with each additional concurrent prompt sent to the same model. Put another way, we see higher tokens/s when sending one prompt to each of four different models concurrently than when sending four prompts to the same model concurrently.
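A comparison like the one described above could be reproduced with a small script along these lines. This is only a sketch: it assumes an Ollama server on the default local address and relies on the `eval_count` and `eval_duration` (nanoseconds) fields of a non-streaming `/api/generate` response. The helper names (`tokens_per_second`, `generate`, `run_concurrently`) are made up for illustration, and any model names passed in are placeholders.

```python
from __future__ import annotations

import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Assumption: Ollama is listening on its default local address.
OLLAMA_URL = "http://localhost:11434/api/generate"


def tokens_per_second(eval_count: int, eval_duration_ns: int) -> float:
    """Convert a token count and a duration in nanoseconds into tokens/s."""
    if eval_duration_ns <= 0:
        return 0.0
    return eval_count / (eval_duration_ns / 1e9)


def generate(model: str, prompt: str) -> float:
    """Send one non-streaming request and return its generation rate."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return tokens_per_second(data["eval_count"], data["eval_duration"])


def run_concurrently(models: list[str], prompt: str) -> list[float]:
    """Issue one request per entry in `models`, all at the same time."""
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        return list(pool.map(lambda m: generate(m, prompt), models))
```

Calling `run_concurrently` once with the same model name repeated four times and once with four different model names, then comparing the per-request rates, would give a rough version of the measurement above. Results will depend on hardware, model size, and server settings.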

It would help performance if Ollama spawned a new runner once the OLLAMA_NUM_PARALLEL limit is reached, rather than queueing the prompts to wait on a single runner.
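To make the request concrete, here is a toy model of the behaviour being asked for. It is not Ollama's scheduler and says nothing about how the real one is implemented; `Dispatcher`, `Runner`, and `max_runners` are hypothetical names, and `num_parallel` merely stands in for OLLAMA_NUM_PARALLEL.

```python
from dataclasses import dataclass, field


@dataclass
class Runner:
    model: str
    active: int = 0  # requests currently being served by this runner


@dataclass
class Dispatcher:
    num_parallel: int  # stands in for OLLAMA_NUM_PARALLEL
    max_runners: int   # hypothetical cap, e.g. one runner per GPU
    runners: list = field(default_factory=list)
    queue: list = field(default_factory=list)

    def submit(self, model: str, request_id: int) -> str:
        # 1. Reuse an existing runner for this model if it has a free slot.
        for index, runner in enumerate(self.runners):
            if runner.model == model and runner.active < self.num_parallel:
                runner.active += 1
                return f"runner-{index}"
        # 2. Otherwise spawn another runner for the same model, if allowed.
        if len(self.runners) < self.max_runners:
            self.runners.append(Runner(model, active=1))
            return f"runner-{len(self.runners) - 1}"
        # 3. Only queue once every runner is full and the cap is reached.
        self.queue.append((model, request_id))
        return "queued"
```

With `num_parallel=2` and `max_runners=2`, five requests for the same model would fill the first runner, spill onto a second runner, and only the fifth would be queued.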

GiteaMirror added the feature request label 2026-05-04 07:54:56 -05:00

@dhiltgen commented on GitHub (Oct 22, 2024):

More advanced scheduling support for the same model is tracked in #3902


Reference: github-starred/ollama#66708