[GH-ISSUE #1672] "api/chat loads the model only when a request is received. Is it possible to add a flag to keep a specific model in memory permanently, to improve response time?" #26703

Closed
opened 2026-04-22 03:08:59 -05:00 by GiteaMirror · 4 comments

Originally created by @goldenquant on GitHub (Dec 22, 2023).
Original GitHub issue: https://github.com/ollama/ollama/issues/1672

"api/chat loads the model only when a request is received. Is it possible to add a flag to keep a specific model in memory permanently, to improve response time?"


@justinh-rahb commented on GitHub (Dec 22, 2023):

This is easily supported by llama.cpp by passing the `mlock` argument. Would make for slow switches if someone requests a different model I would think though, especially if you don't have a massive amount of RAM.


@rgaidot commented on GitHub (Dec 23, 2023):

ah! I hadn't seen that they'd added `mlock` on ollama via `use_mlock` (default value is false). This information could answer several issues.

- https://github.com/jmorganca/ollama/blob/main/llm/ext_server.go#L157C34-L157C42
- https://github.com/jmorganca/ollama/blob/main/docs/api.md

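For reference, `use_mlock` is passed per-request via the `options` object of Ollama's API, as documented in docs/api.md. A minimal sketch of such a request body (the model name and message content below are placeholders):

```python
import json

# Sketch of a request body for Ollama's /api/chat endpoint.
# "use_mlock" (default false) asks the runtime to mlock the model
# weights so the OS keeps them resident in RAM rather than paging
# them out between requests.
payload = {
    "model": "llama2",  # placeholder model name
    "messages": [{"role": "user", "content": "Hello"}],
    "options": {"use_mlock": True},
}

body = json.dumps(payload)
# POST body to http://localhost:11434/api/chat with your HTTP client.
```

Note this pins pages in RAM for the loaded model; it does not by itself prevent the server from unloading the model, and switching to a different model still requires a full load.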

@goldenquant commented on GitHub (Dec 26, 2023):

@rgaidot thxs


@erick1337 commented on GitHub (Dec 27, 2023):

@highjim have you solved the problem? Would you please give more detailed steps on how you solved it?


Reference: github-starred/ollama#26703