[GH-ISSUE #6918] Unreliable free memory resulting in models not running #50889

Open
opened 2026-04-28 17:20:16 -05:00 by GiteaMirror · 30 comments

Originally created by @ddpasa on GitHub (Sep 23, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/6918

Originally assigned to: @dhiltgen on GitHub.

What is the issue?

From what I understand, new versions of ollama compare the expected memory requirements of a model with the amount of free memory seen by ollama, and print an error message if the model's memory requirements are larger. This makes a lot of sense.

However, from what I understand, the free memory reported on Linux is not a very reliable estimate. For the same model on the same machine, I have had cases where ollama ran successfully, and others where it reported insufficient memory.

Is it possible to disable this feature entirely?

OS

Linux

GPU

No response

CPU

No response

Ollama version

latest mainline

GiteaMirror added the feature request and linux labels 2026-04-28 17:20:16 -05:00

@rick-github commented on GitHub (Sep 23, 2024):

Broadly speaking, ollama wants the sum of unallocated RAM and unallocated swap to be more than the memory required for loading the model plus the context space that doesn't fit on the GPU. The [server logs](https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md#how-to-troubleshoot-issues) will show the relevant values. If you are finding that a model loads sometimes and not others, then ollama thinks that your system is close to over-committing RAM and doesn't want to get into a situation where the OOM-killer starts sniping processes. You can check the figures in the logs, and if you find that the data is inconsistent then that should be followed up. You can mitigate the problems with model loading by using a smaller model, setting a smaller context size, or adding swap.
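
For reference, a minimal sketch (in Python, for illustration only; ollama's actual check is in Go and also accounts for GPU offload) of the kind of comparison being described:

```python
#!/usr/bin/env python3
# Illustrative sketch: compare a model's estimated memory requirement
# against MemAvailable + SwapFree from /proc/meminfo, roughly the
# headroom check described above. Not ollama's actual code.

def meminfo_kib(field: str) -> int:
    """Return a /proc/meminfo field (e.g. 'MemAvailable') in KiB."""
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith(field + ":"):
                return int(line.split()[1])
    raise KeyError(field)

def model_would_fit(required_bytes: int) -> bool:
    headroom = (meminfo_kib("MemAvailable") + meminfo_kib("SwapFree")) * 1024
    return required_bytes <= headroom

if __name__ == "__main__":
    print(model_would_fit(8 * 1024**3))  # would an 8 GiB model fit?
```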


@ddpasa commented on GitHub (Sep 23, 2024):

> If you are finding that a model loads sometimes and not others, then ollama thinks that your system is close to over-committing RAM and doesn't want to get into a situation where the OOM-killer starts sniping processes.

I think this is exactly what is happening.

I have 16GB of RAM on my laptop, and the free memory ollama sees fluctuates between 9.5GB and 13GB. This is a huge range.

This is CPU-only inference, no GPU involved.


@ddpasa commented on GitHub (Sep 23, 2024):

I think the current logic is a safe, conservative choice that works most of the time. However, I know my own system very well, and would like to override the available-memory figure with a larger value to avoid the blocking behaviour.


@dhiltgen commented on GitHub (Sep 25, 2024):

We look at available memory, which should be buffer-cache aware, along with swap free space, to establish a threshold so we can block model loads that exceed it. Can you describe your scenario a bit more? Are you loading different models in rapid fire, where we unload one to make room for the next but still think there isn't room due to stale memory information? For GPUs we wait up to 5s for the VRAM reporting to converge, but we don't currently have code in place to do that for system memory. If that's the scenario you're running into, maybe that enhancement would help address the problem.
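
For what it's worth, a rough sketch of what such a convergence wait could look like for system memory (hypothetical; `read_available` stands in for whatever function reports available bytes, and the timeout/tolerance values are illustrative):

```python
import time

def wait_for_stable_memory(read_available, timeout=5.0, interval=0.5,
                           tolerance=256 * 1024**2):
    # Hypothetical sketch of the enhancement suggested above: poll the
    # available-memory reading until two consecutive samples agree within
    # `tolerance` bytes, or give up after `timeout` seconds.
    deadline = time.monotonic() + timeout
    prev = read_available()
    while time.monotonic() < deadline:
        time.sleep(interval)
        cur = read_available()
        if abs(cur - prev) <= tolerance:
            return cur
        prev = cur
    return prev
```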


@ddpasa commented on GitHub (Sep 27, 2024):

> We look at available memory which should be buffer cache aware, along with swap free space to establish a threshold so we can block model loads that exceed that. Can you describe your scenario a bit more? Are you loading different models in rapid fire where we unload one to make room for the next, but still think there isn't room due to stale memory information? For GPUs we wait up to 5s for the VRAM reporting to converge, but we don't currently have code in place to do that for system memory. If that's the scenario you're running into, maybe that enhancement would help address the problem.

I don't think there is a major issue with the current logic, but it does not work well in cases where I really want to push my system and run the largest model I can get away with. It seems a little too conservative.

I have a suspicion that other programs on my laptop using memory are confusing ollama.

A very simple solution is to allow users to override this value with an environment variable. It keeps the default safe behaviour, while allowing us to run large models right around the memory threshold.
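
To make the proposal concrete, a sketch of how such an override could slot in ahead of the existing check (the variable name `OLLAMA_AVAILABLE_MEMORY_OVERRIDE` is made up for illustration; ollama does not implement it):

```python
import os

def effective_available(measured_bytes: int) -> int:
    # Hypothetical: OLLAMA_AVAILABLE_MEMORY_OVERRIDE is not a real ollama
    # variable. If set, trust the user's figure (in bytes) instead of the
    # measured one; otherwise keep the safe default behaviour.
    override = os.environ.get("OLLAMA_AVAILABLE_MEMORY_OVERRIDE")
    return int(override) if override else measured_bytes
```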


@xgdgsc commented on GitHub (Nov 20, 2024):

I also face this issue on an X Elite ARM Windows laptop and an M2 MacBook, both with 32GB RAM. I often have tons of background Edge/VSCode windows that I don't close, which could be moved to swap safely. And I think the Windows swap size is dynamic? So I need an option to manually specify the maximum memory ollama asks from the system, so I could run the models I need more easily. Currently I'm restricted to smaller models by this.


@ddpasa commented on GitHub (Nov 20, 2024):

I think overriding with an environment variable is the cleanest solution.


@rick-github commented on GitHub (Nov 20, 2024):

Does adding swap not solve the problem?


@xgdgsc commented on GitHub (Nov 20, 2024):

https://discussions.apple.com/thread/7417584?answerId=7417584021&sortBy=rank#7417584021 — it seems macOS has no option to add swap manually. So considering that both macOS and Windows manage swap dynamically by default, adding an env variable should work.


@rick-github commented on GitHub (Nov 20, 2024):

```
# Allocate OLLAMA_MEM bytes in perl to force the system to add swap /
# reclaim caches, then exec ollama into the space that frees up.
OLLAMA_MEM=30000000000 perl -e '$x="a" x $ENV{OLLAMA_MEM}; exec ("ollama","run","some-big-model","");'
```

@rick-github commented on GitHub (Nov 20, 2024):

Python is more likely to be installed on a Windows system than perl, so for better cross-platform support:

```python
#!/usr/bin/env python3

# For systems with dynamic swap, alloc a large buffer to force the
# system to add swap, then exec ollama into that free space.
# Install this in a path before the actual ollama binary, and
# adjust `ollama_binary` below to point to the real ollama.
#
# use:  OLLAMA_MEMORY=10000000000 ollama run some-large-model

import os
import platform
import sys

ollama_binary = "ollama"
if platform.system() == "Windows":
  ollama_binary = "ollama.exe"

_ = "a"*int(os.environ.get("OLLAMA_MEMORY",0))
os.execvp(ollama_binary, [ollama_binary]+sys.argv[1:])
```

@xgdgsc commented on GitHub (Nov 21, 2024):

Thanks. Works for me.


@unicorn667 commented on GitHub (Mar 6, 2025):

It did not work for me.


@rick-github commented on GitHub (Mar 6, 2025):

You'll have to be more specific about what's not working.


@thojo0 commented on GitHub (Mar 17, 2026):

I also can't load models even when enough memory is available.

> We look at available memory which should be buffer cache aware, along with swap free space to establish a threshold so we can block model loads that exceed that.

I don't think this is working correctly under Linux (ollama v0.18.0). `free -h`:

```
               total        used        free      shared  buff/cache   available
Mem:            32Gi        37Mi        10Gi        64Ki        21Gi        31Gi
Swap:          512Mi       4.4Mi       507Mi
```

In this case ollama refuses to load models bigger than 10G, but it should be able to load models up to 31G.


@aldem commented on GitHub (Mar 30, 2026):

Just hit a similar issue - it reports that I have "not enough" RAM.

Checking `MemFree` is not reliable (even without containers), because `MemFree` does not reflect available memory, which is reported as `MemAvailable`:

```
MemAvailable %lu (since Linux 3.14)
  An estimate of how much memory is available for starting new applications, without swapping.
```

`MemFree` only shows unused memory; in particular, this means that if most of the RAM is used for caches then it will be low, as in my case:

```
$ free -h
               total        used        free      shared  buff/cache   available
Mem:            64Gi        30Gi       557Mi       4.7Gi        37Gi        33Gi
Swap:             0B          0B          0B
```

I have 33G available, but ollama refuses to run with the message `Error: model requires more system memory (539.6 MiB) than is available (486.9 MiB)`.
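
A quick way to see the gap being described (illustrative snippet; /proc/meminfo values are in kB):

```python
#!/usr/bin/env python3
# Print MemTotal, MemFree and MemAvailable side by side: on a cache-heavy
# box MemFree is tiny while MemAvailable (what new programs can actually
# use) stays large.

with open("/proc/meminfo") as f:
    fields = {line.split(":")[0]: int(line.split()[1]) for line in f}

for key in ("MemTotal", "MemFree", "MemAvailable"):
    print(f"{key:>12}: {fields[key] / 1024**2:6.1f} GiB")
```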


@rick-github commented on GitHub (Mar 30, 2026):

> Checking MemFree is not reliable (even without containers), because MemFree does not reflect available memory, which is reported as MemAvailable:

ollama uses MemAvailable. [Server logs](https://docs.ollama.com/troubleshooting) will aid in debugging.


@aldem commented on GitHub (Mar 30, 2026):

I am not sure it does, because (v0.19.0):

```
time=2026-03-31T01:37:08.450+02:00 level=INFO source=sched.go:484 msg="system memory" total="64.0 GiB" free="2.4 GiB" free_swap="0 B"
```

This shows free as 2.4 GiB, which matches meminfo:

```
MemTotal:       67108864 kB
MemFree:         2283532 kB
MemAvailable:   34857276 kB
```

Besides, as I mentioned above, it refused to run with 33 GiB available.


@rick-github commented on GitHub (Mar 30, 2026):

https://github.com/ollama/ollama/blob/31f968fe1f0f774fe20ee0c64f749e90d54147fd/discover/cpu_linux.go#L43

[Server logs](https://docs.ollama.com/troubleshooting) will aid in debugging.


@aldem commented on GitHub (Mar 31, 2026):

I saw the code, but... debugging, OK. Let's see:

```
time=2026-03-31T02:04:14.212+02:00 level=INFO source=sched.go:484 msg="system memory" total="64.0 GiB" free="131.0 MiB" free_swap="0 B"
...
time=2026-03-31T02:04:14.212+02:00 level=WARN source=server.go:1046 msg="model request too large for system" requested="330.0 MiB" available="131.0 MiB" total="64.0 GiB" free="131.0 MiB" swap="0 B"
time=2026-03-31T02:04:14.212+02:00 level=INFO source=sched.go:511 msg="Load failed" model=/home/ollama/.ollama/models/blobs/sha256-33a8a1b6a1cbba662f292d32bb55f8d109c0e6cb02de2d243a1b70705ea20986 error="model requires more system memory (330.0 MiB) than is available (131.0 MiB)"
```

My cache is full, and meminfo is:

```
$ egrep Mem /proc/meminfo
MemTotal:       67108864 kB
MemFree:          131010 kB
MemAvailable:   35081704 kB
```

Does it look like it actually uses `MemAvailable`?


@rick-github commented on GitHub (Mar 31, 2026):

> Does it look like it actually uses MemAvailable?

Hard to say. If only there were [server logs](https://docs.ollama.com/troubleshooting) to look at.


@aldem commented on GitHub (Mar 31, 2026):

Sorry, which other server logs should I post? Or do you mean that I have to post everything from the log, even parts unrelated to memory?


@rick-github commented on GitHub (Mar 31, 2026):

Server logs contain information about the environment, device selection, model parameters, layer allocation, etc., which may or may not be useful in debugging. Better to provide a full log that may have too much detail than 3 lines of log which may exclude relevant information.


@aldem commented on GitHub (Mar 31, 2026):

OK, the full log: https://gist.github.com/aldem/f3a3313d89fe83f8dbeb05564bcfec87

And a failed attempt:

```
$ ./ollama run ordis/jina-embeddings-v2-base-code "some text here"; egrep Mem /proc/meminfo
Error: model requires more system memory (330.0 MiB) than is available (170.3 MiB)
MemTotal:       67108864 kB
MemFree:          154768 kB
MemAvailable:   35162983 kB
```

While the code should work, it doesn't... the free memory in the log matches `MemFree` (with a slight difference).

I am running it in an LXC container (Proxmox 9, Debian 13) - not sure if this matters.


@rick-github commented on GitHub (Mar 31, 2026):

What does the `egrep` return when it's run inside the LXC container?


@aldem commented on GitHub (Mar 31, 2026):

`ollama serve` and `ollama run ...` with the `egrep` are running in the same container, so it returns the actual container data.


@JordanLoehr commented on GitHub (Apr 7, 2026):

I noticed this too, running ollama on k3s on a Raspberry Pi 5 (8GB) for testing.

I would get:

```
model requires more system memory (7.3 GiB) than is available (2.4 GiB)
```

despite /proc/meminfo showing:

```
MemAvailable: 7434880 kB
```

Looking at https://github.com/ollama/ollama/blob/main/discover/cpu_linux.go though, it's not always just looking at MemAvailable: it also does a pass that checks whether it is in a cgroup (`getCPUMemByCgroups(mem)`) and overrides the MemAvailable value with the cgroup figures if present.

https://github.com/ollama/ollama/blob/8c8f8f3450d39735355fc6cd7f2e436c8aa42ab1/discover/cpu_linux.go#L69-L79

The problem with this is that `/sys/fs/cgroup/memory.current` includes things such as the page cache, which MemAvailable doesn't.

On my system running in k3s this explains the discrepancy:

`/sys/fs/cgroup/memory.max` returns `max`, which causes `getUint64ValueFromFile` to error and keep the total from /proc/meminfo instead (8256640 kB), but `/sys/fs/cgroup/memory.current` returns `5825003520` (bytes, or 5825004 kB).

So 8256640 - 5825004 = 2431636 kB, or about 2.4 GiB, which is what the original error shows as free, even though MemAvailable shows 7.4 GiB free.

To get the cgroup value closer to MemAvailable, you have to subtract all the reclaimable memory from memory.current, most of which you can get from memory.stat.

e.g.:

```
available = total - (memory.current - (memory.stat.anon + memory.stat.kernel + memory.stat.slab_unreclaimable + memory.stat.kernel_stack + memory.stat.pagetables + memory.stat.sec_pagetables + memory.stat.sock + memory.stat.vmalloc))
```

However, this isn't exactly the same as MemAvailable, because that also factors in some of the per-zone low watermark values from /proc/zoneinfo. Or just copy what Kubernetes does (https://kubernetes.io/docs/concepts/scheduling-eviction/node-pressure-eviction/#memory-signals) and use `total - (memory.current - memory.stat.inactive_file)` to get close enough; a sketch of that approach follows below.

tl;dr: if you are running in a cgroup (v2), ollama uses the value from memory.current, not MemAvailable from /proc/meminfo, to calculate the free memory, and the former includes the page cache and other reclaimable values.
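
For illustration, a sketch of that Kubernetes-style "close enough" estimate under cgroup v2 (assumes the paths named above; this is not what ollama currently does):

```python
#!/usr/bin/env python3
# Sketch of the working-set estimate described above for cgroup v2:
#   available = total - (memory.current - inactive_file)
# i.e. treat inactive file-cache pages as reclaimable. Illustration only.

def cgroup_available(total_bytes: int) -> int:
    with open("/sys/fs/cgroup/memory.current") as f:
        current = int(f.read())
    inactive_file = 0
    with open("/sys/fs/cgroup/memory.stat") as f:
        for line in f:
            key, value = line.split()
            if key == "inactive_file":
                inactive_file = int(value)
                break
    working_set = current - inactive_file
    return total_bytes - working_set
```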


@aldem commented on GitHub (Apr 7, 2026):

@JordanLoehr Exactly, you have nailed it! 👍


@mirceanis commented on GitHub (Apr 17, 2026):

I'm facing a similar issue.
Running ollama in a container.
On 32GB RAM + 8GB VRAM I can run gemma4:26b-a4b @ 4-bit quantization with a 192k context window.
BUT, as soon as the model is shut down, it won't restart.


@markasoftware-tc commented on GitHub (Apr 17, 2026):

Yep, I believe my PR #13782 fixes this exact issue.
