[GH-ISSUE #1867] ollama barely uses any RAM #1067

Closed
opened 2026-04-12 10:49:18 -05:00 by GiteaMirror · 2 comments

Originally created by @neuleo on GitHub (Jan 9, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/1867

Hey Guys,

I run ollama in Docker and mostly use 7b models, but my RAM usage stays under 4 GB, sometimes even below 3 GB.

The recommended minimum, though, is 8 GB of RAM.

My machine has a 4-core CPU and 24 GB of RAM, but generation is very slow. I don't have a video card.

I'm new to this, so can anyone tell me what I might need to do differently?


@easp commented on GitHub (Jan 9, 2024):

Models are loaded using mmap and, as a result, probably show up as file cache memory use rather than as part of the ollama process's memory.
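
To illustrate this behaviour, here is a minimal Python sketch (not ollama's actual code, and `model.gguf` is a hypothetical path) showing how memory-mapped file pages end up in the kernel page cache rather than in the process's own heap, which is why per-process RAM readings can look much smaller than the model size:

```python
# Illustrative sketch (not ollama's code): memory-mapping a large file, as
# llama.cpp-style runtimes do for model weights, does not show up as process
# heap usage. The mapped pages live in the kernel page cache, so tools that
# report only process memory can make RAM usage look smaller than the model.
import mmap
import os

path = "model.gguf"  # hypothetical path to a model file
size = os.path.getsize(path)

with open(path, "rb") as f:
    # Map the whole file read-only; nothing is read into memory yet.
    mm = mmap.mmap(f.fileno(), length=size, access=mmap.ACCESS_READ)

    # Touching pages pulls them into the page cache on demand. On Linux,
    # `free -h` shows this under buff/cache, not under the process itself.
    checksum = 0
    for offset in range(0, size, mmap.PAGESIZE):
        checksum ^= mm[offset]

    mm.close()

print("touched all pages; checksum byte:", checksum)
```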


@pdevine commented on GitHub (Jan 9, 2024):

@neuleo for a 7b, 4-bit quantized model I would expect it to take up around 4 GB. The amount of memory ultimately comes down to the size of the model and the context size you're using, so it's a bit squishy. We're adding some improvements in 0.1.19 to more accurately estimate the amount of memory needed.
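
As a rough back-of-the-envelope check (my own sketch, not ollama's estimator; the bits-per-weight and KV-cache figures below are assumptions typical for a 7B Q4 model), the expected footprint can be approximated like this:

```python
# Rough sketch of why a 7B, 4-bit model lands near 4 GB. The constants are
# assumptions for illustration, not ollama's actual memory estimator.

def estimate_model_ram_gb(n_params_b: float, bits_per_weight: float,
                          n_ctx: int, kv_bytes_per_token: int) -> float:
    """Approximate RAM needed: quantized weights plus the KV cache."""
    weight_bytes = n_params_b * 1e9 * bits_per_weight / 8
    kv_cache_bytes = n_ctx * kv_bytes_per_token
    return (weight_bytes + kv_cache_bytes) / 1e9

# ~4.5 bits/weight is a ballpark for 4-bit quantization including overhead;
# ~0.5 MB per token of fp16 KV cache is typical for a 7B (32-layer, 4096-dim) model.
print(estimate_model_ram_gb(7, 4.5, n_ctx=2048, kv_bytes_per_token=512_000))
# -> roughly 5 GB at a 2048-token context, closer to 4 GB with a smaller context
```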

That said, I don't know what CPU you're using, but generally speaking you'll get far better results from a GPU than from the CPU. We've also got some changes coming to take more advantage of the AVX capabilities of the CPU, so if you have a modern CPU with AVX-512 you may see some performance gains.
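
If you want to check which of those SIMD extensions your CPU exposes, a quick sketch (assuming Linux, where `/proc/cpuinfo` is available; the helper name is mine) is:

```python
# Quick Linux-only check for the SIMD flags mentioned above. Reads the kernel's
# reported CPU flags; on other OSes this file does not exist.
def cpu_simd_flags(path: str = "/proc/cpuinfo") -> set[str]:
    with open(path) as f:
        for line in f:
            if line.startswith("flags"):
                flags = set(line.split(":", 1)[1].split())
                return {f for f in flags if f in {"avx", "avx2", "avx512f"}}
    return set()

print("SIMD support:", cpu_simd_flags() or "none of avx/avx2/avx512f")
```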

I'm going to go ahead and close the issue, but feel free to keep commenting or reach out on the discord.
