[GH-ISSUE #5700] zfs ARC leads to incorrect system memory prediction and refusal to load models that could work #29314

Open
opened 2026-04-22 08:04:27 -05:00 by GiteaMirror · 14 comments
Owner

Originally created by @arthurmelton on GitHub (Jul 15, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/5700

Originally assigned to: @dhiltgen on GitHub.

I would like a flag to ignore this condition: https://github.com/ollama/ollama/blob/e9f7f3602961d2b0beaff27144ec89301c2173ca/llm/server.go#L128-L135

I use TrueNAS SCALE to store and run my models. It uses ZFS as the filesystem, which means the ARC uses a lot of memory. I don't know specifically what TrueNAS does, but it configures the ARC to behave like on BSD, where it naturally tries to use as much memory as possible. It will shrink if something else needs more RAM, though.

Ollama thus freaks out and refuses to load any model it predicts will OOM, even though I actually do have enough memory. In my mind a flag would be the easiest thing to implement, but maybe it could try being smart and remove the ZFS ARC from its calculations?

GiteaMirror added the memory and feature request labels 2026-04-22 08:04:27 -05:00

@dhiltgen commented on GitHub (Jul 15, 2024):

Looking at https://github.com/openzfs/zfs/issues/10255, it seems a workaround until we find an optimal solution may be to set `zfs_arc_sys_free` to set aside however much memory you want for models in system memory.
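For example, a sketch of that workaround (`zfs_arc_sys_free` is a real OpenZFS module parameter; the 8 GiB figure is just an illustrative choice, size it for your models):

```shell
# Tell the ARC to keep ~8 GiB of RAM free for other processes
# (takes effect immediately on a running system).
echo $((8 * 1024 * 1024 * 1024)) | sudo tee /sys/module/zfs/parameters/zfs_arc_sys_free

# To persist across reboots, append a module option instead:
echo "options zfs zfs_arc_sys_free=$((8 * 1024 * 1024 * 1024))" | sudo tee -a /etc/modprobe.d/zfs.conf
```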


@AncientMystic commented on GitHub (Jul 27, 2024):

Another option would be to add/edit `/etc/modprobe.d/zfs.conf` and add:

```
options zfs zfs_arc_max=1073741824
```

(1073741824 bytes = 1 GiB; multiply by the number of GiB you want.)

This limits ARC memory usage (by default it will use up to 50% of system RAM). Then run:

```
update-initramfs -u -k all
```

Reboot & profit.

You can also specify a cache device and set the RAM ARC to metadata only (best to have a UPS if using a SLOG/write cache).


@Samega7Cattac commented on GitHub (Aug 8, 2024):

Another workaround is to enable swap.
Because of a bug, the TrueNAS devs disabled swap.
To manually enable it you need to use:

```
swapon -d=once /dev/mapper/md127
```

This loaded llama3.1 even with the dashboard saying I had 2.9 GB free.
This way there's no need to limit the ZFS cache.


@atropos112 commented on GitHub (Aug 24, 2024):

I am sorry, but as I read the PR and the issue here I am somewhat confused, I suspect there is some context I am unaware of.

Is the decision on this to have a better error message for some while making the software unusable for others? Is the number of people using ollama and ZFS low enough for this to be worth it? If yes that's fair, if no then why not just revert that change until a fix exists.

Using the above swap solution or `zfs_arc_sys_free` is limiting and performance-impacting for the whole system, just to make an error message go away; that feels very wrong, because it is.


@RomeSilvanus commented on GitHub (Sep 30, 2024):

I second this, it’s really annoying.


@palves commented on GitHub (Dec 7, 2024):

Ran into this on my system with 128GB RAM, with plenty of it free.

I'm not really sure why checking the memory upfront is a good idea, rather than just letting the model run and fail if there is indeed not enough memory, but...

Note that several tools take the ZFS ARC cache memory into account when computing "free" memory. For example, the ubiquitous htop does it.

I think that if the memory prediction check is to stay, then making it take the ZFS ARC cache size into account would be ideal.

See here:

https://forum.proxmox.com/threads/zfs-ram-usage-more-than-90-of-the-system.74273/
https://www.reddit.com/r/zfs/comments/mvg0ui/how_do_you_determine_actual_ram_usage_on_zfs/

From first link above:
"ZFS will however always yield ARC memory to other processes if needed, so the ARC should never lead to out-of-memory situations."

E.g., on my system, I get:

```
$ cat /proc/spl/kstat/zfs/arcstats | grep "^size"
size                            4    67526426312
```

The last field is the ARC cache size in bytes. As ZFS transparently releases ARC cache when needed, you can consider it free memory. So the solution would be just to extract the ARC size from `/proc/spl/kstat/zfs/arcstats` and add it to:

```
available := systemFreeMemory + systemSwapFreeMemory
```


@arthurmelton commented on GitHub (Dec 24, 2024):

@dhiltgen, if there is no desire to subtract ARC memory from the system's used memory, would a flag to ignore the OOM check suffice? I know it would not give a clean result, but I feel that, for now at least, it would be the best solution.


@justinkahrs commented on GitHub (Feb 13, 2025):

Any progress/update on this issue? Or other workarounds? I'd rather not modify my ZFS cache limit or enable swap when it's causing TrueNAS issues...


@daemon-byte commented on GitHub (Feb 18, 2025):

I was really hoping for a flag to disable this check but I guess not :(


@k3d3 commented on GitHub (May 4, 2025):

Yeah, I would honestly be happy to just have an "I know what I am doing" flag that lets me override this behaviour. I also have 128GB of ram, and because of ZFS ARC, I get an error saying I only have 10GB available, despite running nothing else.


@KostyalBalint commented on GitHub (May 10, 2025):

What if we added an env variable that would allow us to disable this memory precheck? That could be an easy fix for this issue. In my understanding the model can be loaded properly on a ZFS system even when there is apparently not enough memory, because of the ZFS cache.


@Ejmathewp commented on GitHub (Aug 8, 2025):

It is bonkers to tell people running ZFS with 128GB RAM they are out of luck unless they handicap the core of their system because you want to prevent people from trying to load a 50 GB model onto 16 GB RAM.

In https://github.com/ollama/ollama/issues/10920 the team states "Unfortunately some knowledge on behalf of the user is expected, such that informed decisions are made." - Stick to that!


@arunskurian commented on GitHub (Oct 14, 2025):

We have been using the ZFS mem check fix changes in this [PR](https://github.com/ollama/ollama/pull/12616) for a few months. Pushing this upstream in case these changes are valuable for the broader community. There are a couple of additions in this PR:

  1. Calculates the evictable memory that's allocated to the ZFS ARC cache.
  2. For advanced users and for development, allows environment-variable-based control over the memory check.

cc @dhiltgen


@87fox87 commented on GitHub (Nov 15, 2025):

It seems the problem was fixed somewhere down the road, as on Ollama version 1.1.38 (based on ollama 0.12.10) on TrueNAS 25.10.0 I had zero issues.
With the most recent update, 1.1.39 (based on ollama 0.12.11), it started complaining about the lack of system memory (error 500).
So it seems something in that update patched the problem back in.

#13095

Reference: github-starred/ollama#29314