[GH-ISSUE #1251] How can I disable automatic model offloading from GPU memory #26400

Closed
opened 2026-04-22 02:40:15 -05:00 by GiteaMirror · 1 comment

Originally created by @erick1337 on GitHub (Nov 23, 2023).
Original GitHub issue: https://github.com/ollama/ollama/issues/1251

First of all, thank you for your great work with ollama!

I found that ollama automatically offloads models from GPU memory very frequently (even after only about 2 minutes of inactivity).

But reloading the model takes a long time. How can I force ollama to keep the model loaded in GPU memory?

Thanks
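
For later readers: more recent Ollama releases expose a `keep_alive` option on generate requests (and an `OLLAMA_KEEP_ALIVE` server environment variable) that controls this idle timeout; the original report predates that option, so verify it exists in your installed version. A minimal sketch of such a request payload, with the model name chosen only for illustration:

```python
import json

# Sketch of a /api/generate request body using the keep_alive option.
# Assumptions: a recent Ollama build that accepts "keep_alive", where
# -1 keeps the model resident indefinitely and a duration string such
# as "30m" extends the idle timeout instead.
payload = {
    "model": "llama2",   # hypothetical model name for illustration
    "prompt": "Hello",
    "keep_alive": -1,    # -1 = never unload; e.g. "30m" = unload after 30 min idle
}

body = json.dumps(payload)
print(body)

# The equivalent request against a local server would be roughly:
#   curl http://localhost:11434/api/generate -d "$BODY"
```

Setting `keep_alive` per request lets different clients choose different residency policies, while the environment variable changes the server-wide default.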


@erick1337 commented on GitHub (Nov 23, 2023):

Closing this issue because there is already an issue tracking this:
https://github.com/jmorganca/ollama/issues/931


Reference: github-starred/ollama#26400