When two pages use two different models, the two models cannot be started at the same time. #888

Closed
opened 2025-11-11 14:33:03 -06:00 by GiteaMirror · 1 comment
Owner

Originally created by @ALixuhui on GitHub (May 11, 2024).

When two pages use two different models, the two models cannot be started at the same time.

I need to support multiple users calling different models concurrently. However, I've noticed that when several users chat with different models at the same time, Ollama keeps only one model loaded at a time. Whenever a user starts chatting with a different model, the currently loaded model is killed and the new one is started, which slows down responses. How can I configure Ollama to load and keep multiple models running concurrently, without killing the others, to improve response speed?

Author
Owner

@ALixuhui commented on GitHub (May 11, 2024):

Environment="OLLAMA_MAX_LOADED_MODELS=2"

Solved by setting the Ollama environment variable above.
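
For readers wondering where that `Environment=` line goes: it is systemd unit syntax, so it belongs under a `[Service]` section. The sketch below only renders such a fragment as text. It assumes Ollama runs as a systemd service named `ollama.service` (an assumption, not stated in this thread), and `render_dropin` is a hypothetical helper name used purely for illustration.

```python
def render_dropin(max_loaded_models: int) -> str:
    """Return systemd drop-in text that sets OLLAMA_MAX_LOADED_MODELS.

    Hypothetical helper: it only builds the text and does not touch
    the system or restart any service.
    """
    if max_loaded_models < 1:
        raise ValueError("max_loaded_models must be at least 1")
    return (
        "[Service]\n"
        f'Environment="OLLAMA_MAX_LOADED_MODELS={max_loaded_models}"\n'
    )


if __name__ == "__main__":
    # Print the fragment for the value used in this thread (2).
    # Assumption: the service is named ollama.service, so the text would be
    # pasted into the editor opened by `sudo systemctl edit ollama.service`.
    print(render_dropin(2), end="")
```

Under the same assumption, the change would take effect after `sudo systemctl daemon-reload` followed by `sudo systemctl restart ollama`. Each additional loaded model needs its own memory, so the value is worth matching to the available RAM or VRAM.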


Reference: github-starred/open-webui#888