[PR #15961] discover: skip cgroup FreeMemory override when memory.max is unlimited #77669

Open
opened 2026-05-05 10:20:41 -05:00 by GiteaMirror

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/15961
Author: @hknet
Created: 5/4/2026
Status: 🔄 Open

Base: main ← Head: fix-15650-cgroup-mem-override-page-cache


📝 Commits (1)

  • 4c8f8ce discover: skip cgroup FreeMemory override when memory.max is unlimited

📊 Changes

1 file changed (+10 additions, -3 deletions)

📝 discover/cpu_linux.go (+10 -3)

📄 Description

When memory.max is "max" (cgroup v2 default for containers without an explicit memory cap), getCPUMemByCgroups still overwrote mem.FreeMemory with TotalMemory - memory.current. Because memory.current counts the cgroup's reclaimable page cache against "used", this clobbers the MemAvailable value already read from /proc/meminfo and produces a small, MemFree-equivalent figure once mmap'd model blobs have aged into pagecache. Subsequent model loads then fail with "model requires more system memory than is available" even though tens of GiB are freely reclaimable.

Guard the FreeMemory override on a successful parse of memory.max so it only takes effect when a real numeric cgroup limit is present (Docker or k8s with explicit --memory caps). Containers without a hard limit fall back to the correct /proc/meminfo MemAvailable value, which already accounts for reclaimable cache.

memory.max parsing also gets its own errMax rather than sharing err, so the guard cannot accidentally trigger when memory.current parses successfully but memory.max did not.

Reproduction in an unprivileged-but-uncapped LXC (memory.max="max") on a 32 GiB host: mmap roughly 28 GiB of model blobs to inflate memory.current. Pre-patch, ollama logs:

sched.go msg="system memory" total="32.0 GiB" free="3.7 GiB"
sched.go msg="Load failed" error="model requires more system memory (1.1 GiB) than is available (3.7 GiB)"

(eventually dropping to <1 GiB free as cache grows). Post-patch with the same pagecache pressure:

sched.go msg="system memory" total="32.0 GiB" free="31.7 GiB"

and the load succeeds, because the kernel's MemAvailable already counts inactive_file as reclaimable.
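
A minimal Go sketch of that cache-inflation step (illustrative only, not code from the PR; the blob path is taken from the command line):

package main

import (
    "os"
    "syscall"
)

func main() {
    // Open any large file, e.g. a GGUF blob (path is an argument here).
    f, err := os.Open(os.Args[1])
    if err != nil {
        panic(err)
    }
    defer f.Close()
    fi, err := f.Stat()
    if err != nil {
        panic(err)
    }
    // mmap read-only and fault every page in. The pages are charged to
    // this cgroup's page cache, so memory.current grows; they stay in
    // the page cache (and stay charged) even after the process exits.
    data, err := syscall.Mmap(int(f.Fd()), 0, int(fi.Size()),
        syscall.PROT_READ, syscall.MAP_SHARED)
    if err != nil {
        panic(err)
    }
    defer syscall.Munmap(data)
    var sum byte
    for i := 0; i < len(data); i += 4096 { // one byte per 4 KiB page
        sum += data[i]
    }
    _ = sum
}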

Fixes #15650.

The bug

MemAvailable in /proc/meminfo is computed by the kernel as roughly MemFree + reclaimable_file + reclaimable_slab, capped at MemTotal. It is the field intended for "how much memory can a new allocation get without going to swap or OOM" and is what getCPUMem() already reads.
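
For reference, that value can be read with a few lines of Go (a sketch of what getCPUMem presumably does, not the actual ollama code):

import (
    "errors"
    "os"
    "strconv"
    "strings"
)

// memAvailableBytes returns MemAvailable from /proc/meminfo in bytes
// (the kernel reports it in kB).
func memAvailableBytes() (uint64, error) {
    data, err := os.ReadFile("/proc/meminfo")
    if err != nil {
        return 0, err
    }
    for _, line := range strings.Split(string(data), "\n") {
        fields := strings.Fields(line) // e.g. ["MemAvailable:", "12345678", "kB"]
        if len(fields) >= 2 && fields[0] == "MemAvailable:" {
            kb, err := strconv.ParseUint(fields[1], 10, 64)
            if err != nil {
                return 0, err
            }
            return kb * 1024, nil
        }
    }
    return 0, errors.New("MemAvailable not found")
}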

getCPUMemByCgroups() is then called and unconditionally rewrites mem.FreeMemory with mem.TotalMemory - memory.current if memory.current is readable. In cgroup v2, memory.current is the cgroup's full charged memory, including:

  • anonymous pages (memory.stat:anon)
  • file cache (memory.stat:file, of which inactive_file is trivially reclaimable)
  • kernel slab, sockets, etc.

So TotalMemory - memory.current is essentially MemFree, not MemAvailable. Whenever the cgroup has accumulated reclaimable cache — which in Ollama's case is exactly the mmap'd GGUF blobs from previous model loads — this number is a wild underestimate.
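
The gap is easy to observe directly (a sketch assuming the standard cgroup v2 mount at /sys/fs/cgroup, as in the paths above):

import (
    "errors"
    "os"
    "strconv"
    "strings"
)

// cgroupCurrentAndInactiveFile reads the cgroup's charged total and its
// trivially reclaimable inactive_file, both in bytes, so the two can be
// compared against MemAvailable.
func cgroupCurrentAndInactiveFile() (current, inactiveFile uint64, err error) {
    raw, err := os.ReadFile("/sys/fs/cgroup/memory.current")
    if err != nil {
        return 0, 0, err
    }
    current, err = strconv.ParseUint(strings.TrimSpace(string(raw)), 10, 64)
    if err != nil {
        return 0, 0, err
    }
    stat, err := os.ReadFile("/sys/fs/cgroup/memory.stat")
    if err != nil {
        return 0, 0, err
    }
    for _, line := range strings.Split(string(stat), "\n") {
        if v, ok := strings.CutPrefix(line, "inactive_file "); ok {
            inactiveFile, err = strconv.ParseUint(v, 10, 64)
            return current, inactiveFile, err
        }
    }
    return current, 0, errors.New("inactive_file not found in memory.stat")
}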

When memory.max is "max" (the default for many container setups including unprivileged-but-uncapped LXC and rootful Podman without --memory), ParseUint("max") errors and mem.TotalMemory is left as the kernel's view, but the second branch still rewrites FreeMemory because memory.current parses fine. The result: the carefully computed MemAvailable from getCPUMem() is silently clobbered.
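
The sentinel fails strconv's parse in the ordinary way:

v, err := strconv.ParseUint("max", 10, 64)
fmt.Println(v, err) // 0 strconv.ParseUint: parsing "max": invalid syntax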

Real-world impact

Observed in production on a 32 GiB unprivileged LXC running Ollama plus several llama.cpp sidecars on a shared GPU. After about 30 GiB of GGUF blobs had been mmap'd by the active sidecar and idle Ollama models, attempting to load gpt-oss:20b failed:

sched.go:484 msg="system memory" total="32.0 GiB" free="997.6 MiB" free_swap="0 B"
device.go:245 msg="model weights" device=CPU size="1.1 GiB"
sched.go:511 msg="Load failed" error="model requires more system memory (1.1 GiB) than is available (997.6 MiB)"

MemAvailable from /proc/meminfo at the same instant: ~33 GiB. inactive_file alone (trivially reclaimable): 21 GiB.

This pattern is hit by anyone running Ollama in:

  • unprivileged-but-uncapped LXC / LXD containers
  • rootful Podman without --memory
  • Docker run without --memory
  • Kubernetes pods without resources.limits.memory
  • bare-metal hosts where /sys/fs/cgroup/memory.max happens to be readable but is "max" (e.g. systemd's user/system slices)

Other reports of the same symptom: #15650 (the tracking issue this PR fixes), #10256, #7942, #7423, #8667.

The fix

func getCPUMemByCgroups(mem memInfo) memInfo {
    // memory.max holds the literal string "max" when the cgroup has no
    // hard limit, so the parse fails and errMax records that.
    total, errMax := getUint64ValueFromFile("/sys/fs/cgroup/memory.max")
    if errMax == nil {
        mem.TotalMemory = total
    }
    used, err := getUint64ValueFromFile("/sys/fs/cgroup/memory.current")
    // Override FreeMemory only when a real numeric limit exists;
    // otherwise keep the MemAvailable value from /proc/meminfo.
    if err == nil && errMax == nil {
        mem.FreeMemory = mem.TotalMemory - used
    }
    return mem
}

Two changes:

  1. memory.max parsing now uses its own errMax so the second branch can inspect it independently.
  2. The FreeMemory rewrite is gated on errMax == nil, i.e. only when memory.max is a real numeric value.

When memory.max == "max", ParseUint errors, errMax != nil, and FreeMemory is left as the kernel's MemAvailable from /proc/meminfo. When memory.max is a real cap (Docker / k8s with explicit --memory), behavior is unchanged: the cgroup is the real ceiling, and total - current is the appropriate "free" inside it.
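
For context, getUint64ValueFromFile is presumably a thin wrapper along these lines (a sketch, not the actual ollama helper), which is why the sentinel surfaces as a parse error:

import (
    "os"
    "strconv"
    "strings"
)

// Assumed shape of the helper: read the file, trim the trailing newline,
// parse as an unsigned integer. The literal "max" fails at the parse.
func getUint64ValueFromFile(path string) (uint64, error) {
    data, err := os.ReadFile(path)
    if err != nil {
        return 0, err
    }
    return strconv.ParseUint(strings.TrimSpace(string(data)), 10, 64)
}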


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.
