[GH-ISSUE #2210] Keep models in RAM #27025

Closed
opened 2026-04-22 03:55:32 -05:00 by GiteaMirror · 2 comments

Originally created by @LeoPiresDeSouza on GitHub (Jan 26, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/2210

I am testing llama2:7b models, both using Ollama and calling the model directly from a LangChain Python script.
My models are stored on an Ubuntu server with 12 cores and 36 GB of RAM, but no GPU.
When I call the model directly from Python with the memlock parameter set to true, my memory usage goes above 6 GB, but when using Ollama it stays below 3 GB.
It seems that Ollama is not keeping the model entirely in RAM, and it is taking a long time to respond.
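
For reference, the direct call looks roughly like this; a sketch assuming LangChain's LlamaCpp wrapper over the llama-cpp-python backend (the model path is a placeholder):

```python
from langchain_community.llms import LlamaCpp

llm = LlamaCpp(
    model_path="/models/llama-2-7b.Q4_0.gguf",  # placeholder path
    use_mlock=True,  # the "memlock" setting: lock the model weights in RAM
    n_ctx=2048,
)
print(llm.invoke("Why is the sky blue?"))
```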

Is there a parameter like memlock that can be set in Ollama to make it use my RAM more extensively?

I have installed Ollama using curl https://ollama.ai/install.sh | sh.


@easp commented on GitHub (Jan 27, 2024):

Ollama automatically unloads models from memory after 5 minutes of inactivity. That will be user-configurable in the next version, 0.1.23.
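
Once that lands, something along these lines should keep a model resident; a sketch assuming the `keep_alive` field on `/api/generate` (where `-1` means never unload):

```python
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama2:7b",
        "prompt": "Why is the sky blue?",
        "stream": False,
        "keep_alive": -1,  # keep the model loaded indefinitely; a duration like "30m" also works
    },
)
print(resp.json()["response"])
```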

Another thing to be aware of is that models are memory-mapped, so they don't show up in process memory; they are instead accounted for in the file cache.
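
A minimal illustration of that accounting, using a dummy file in place of a model (path and size are hypothetical):

```python
import mmap
import os

path = "/tmp/example.bin"  # hypothetical stand-in for a model file
with open(path, "wb") as f:
    f.write(os.urandom(1 << 20))  # 1 MiB dummy "model"

with open(path, "rb") as f:
    # The mapping reserves virtual address space; the data itself sits in
    # the kernel page cache (the buff/cache column of `free`) rather than
    # on the process heap, so "used" memory can look deceptively small.
    mm = mmap.mmap(f.fileno(), 0, prot=mmap.PROT_READ)
    _ = mm[:4096]  # read the first page, faulting it into the page cache
    mm.close()
```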


@pdevine commented on GitHub (Jan 28, 2024):

Going to close this since #2146 has merged.

