[GH-ISSUE #13095] error 500 not enough system memory. Truenas 25.10.0 #8667


Originally created by @87fox87 on GitHub (Nov 15, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/13095

What is the issue?

I am running TrueNAS 25.10.0 with 64 GB of RAM, which has worked great so far.

TrueNAS has a habit of building up cache in RAM, though, and after I updated Ollama to 1.1.39 yesterday, it lost the ability to free up RAM by evicting that cache and instead throws an error.

Relevant log output


OS

Linux

GPU

AMD

CPU

AMD

Ollama version

1.1.39

GiteaMirror added the bug label 2026-04-12 21:25:48 -05:00

@rick-github commented on GitHub (Nov 15, 2025):

#5700


@87fox87 commented on GitHub (Nov 15, 2025):

@rick-github

Thank you for your answer.

This might be related, but this time it obviously has to do with the latest 1.1.39 update.

After I rolled back to 1.1.38 the issue was gone.

So I have a feeling just pointing me to issue #5700 won't solve the problem.


@rick-github commented on GitHub (Nov 15, 2025):

Where are you getting the version number from?


@87fox87 commented on GitHub (Nov 15, 2025):

I installed it via the TrueNAS app market; that's the version number it shows there.

Ah, now I see what you're getting at. Upstream it seems to correspond to 0.12.11.

The last working version would be 0.12.10, according to my install.

![Image](https://github.com/user-attachments/assets/0bfd5c16-bce7-4ece-972a-f1b52105a74f)

![Image](https://github.com/user-attachments/assets/81240d8d-bfb0-4dc3-bceb-9fc1288e5864)


@rick-github commented on GitHub (Nov 15, 2025):

Can you retrieve the server log from both versions, including when an error occurs?


@87fox87 commented on GitHub (Nov 15, 2025):

@rick-github
Okay. The problem does persist, and I already deleted the outdated instance; I don't know how to reinstall an older version.

Do you mean Docker logs, i.e. `docker logs ix-ollama-ollama-1`?

By the way, that's the error I get in Open WebUI:

![Image](https://github.com/user-attachments/assets/e972412d-d8a0-4095-ad1c-a1cebc2df1d6)


@rick-github commented on GitHub (Nov 15, 2025):

> Do you mean Docker logs, i.e. `docker logs ix-ollama-ollama-1`?

Seems likely.
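
For reference, one way to capture the relevant window of the container log on a TrueNAS/Docker host (the container name comes from the comment above; the flags are standard docker CLI):

```shell
# Dump the last 200 lines of the ollama container log, including stderr,
# and keep following so the failing /api/chat request is captured live.
docker logs --tail 200 --follow ix-ollama-ollama-1 2>&1 | tee ollama-server.log
```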


@87fox87 commented on GitHub (Nov 15, 2025):

That's the part of the log where the error happens:

```
time=2025-11-15T14:22:49.355Z level=INFO source=server.go:209 msg="enabling flash attention"
time=2025-11-15T14:22:49.355Z level=INFO source=server.go:392 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --model /root/.ollama/models/blobs/sha256-e7b273f9636059a689e3ddcab3716e4f65abe0143ac978e46673ad0e52d09efb --port 38385"
time=2025-11-15T14:22:49.355Z level=INFO source=sched.go:443 msg="system memory" total="62.2 GiB" free="3.7 GiB" free_swap="0 B"
time=2025-11-15T14:22:49.355Z level=INFO source=server.go:702 msg="loading model" "model layers"=25 requested=-1
time=2025-11-15T14:22:49.367Z level=INFO source=runner.go:1398 msg="starting ollama engine"
time=2025-11-15T14:22:49.368Z level=INFO source=runner.go:1433 msg="Server listening on 127.0.0.1:38385"
time=2025-11-15T14:22:49.378Z level=INFO source=runner.go:1271 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:8 GPULayers:[] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2025-11-15T14:22:49.429Z level=INFO source=ggml.go:136 msg="" architecture=gptoss file_type=MXFP4 name="" description="" num_tensors=459 num_key_values=32
load_backend: loaded CPU backend from /usr/lib/ollama/libggml-cpu-haswell.so
time=2025-11-15T14:22:49.434Z level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
time=2025-11-15T14:22:49.456Z level=WARN source=server.go:989 msg="model request too large for system" requested="13.1 GiB" available="3.7 GiB" total="62.2 GiB" free="3.7 GiB" swap="0 B"
time=2025-11-15T14:22:49.457Z level=INFO source=runner.go:1271 msg=load request="{Operation:close LoraPath:[] Parallel:0 BatchSize:0 FlashAttention:false KvSize:0 KvCacheType: NumThreads:0 GPULayers:[] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2025-11-15T14:22:49.457Z level=INFO source=device.go:245 msg="model weights" device=CPU size="12.8 GiB"
time=2025-11-15T14:22:49.457Z level=INFO source=device.go:256 msg="kv cache" device=CPU size="192.0 MiB"
time=2025-11-15T14:22:49.457Z level=INFO source=device.go:267 msg="compute graph" device=CPU size="94.8 MiB"
time=2025-11-15T14:22:49.457Z level=INFO source=device.go:272 msg="total memory" size="13.1 GiB"
time=2025-11-15T14:22:49.457Z level=INFO source=sched.go:470 msg="Load failed" model=/root/.ollama/models/blobs/sha256-e7b273f9636059a689e3ddcab3716e4f65abe0143ac978e46673ad0e52d09efb error="model requires more system memory (13.1 GiB) than is available (3.7 GiB)"
[GIN] 2025/11/15 - 14:22:49 | 500 | 450.397461ms | 172.16.2.3 | POST "/api/chat"
[GIN] 2025/11/15 - 14:23:08 | 200 | 795.758µs | 172.16.2.3 | GET "/api/tags"
[GIN] 2025/11/15 - 14:23:08 | 200 | 17.102µs | 172.16.2.3 | GET "/api/ps"
[GIN] 2025/11/15 - 14:25:13 | 200 | 867.242µs | 172.16.2.3 | GET "/api/tags"
[GIN] 2025/11/15 - 14:25:13 | 200 | 13.746µs | 172.16.2.3 | GET "/api/ps"
[GIN] 2025/11/15 - 14:25:43 | 200 | 667.837µs | 172.16.2.3 | GET "/api/tags"
[GIN] 2025/11/15 - 14:25:43 | 200 | 13.546µs | 172.16.2.3 | GET "/api/ps"
```

So in version 0.12.10, I guess, this is where it would start freeing up space from the ZFS cache. Instead it just stops working with this error.
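
As background for the numbers above: on Linux the ZFS ARC is not counted as reclaimable page cache, so the kernel's MemAvailable (which ollama's "free" figure appears to track) can look far smaller than what ARC could actually give back. A quick way to see both sides, assuming shell access on the TrueNAS host:

```shell
# Compare the kernel's view of available memory with the current ZFS ARC size.
# ARC memory does not show up as reclaimable cache in MemAvailable, so the
# two together usually explain a "free 3.7 GiB of 62.2 GiB" reading.
grep -E 'MemTotal|MemAvailable' /proc/meminfo
awk '/^size / {printf "ZFS ARC size: %.1f GiB\n", $3 / 1024 / 1024 / 1024}' /proc/spl/kstat/zfs/arcstats
```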


@rick-github commented on GitHub (Nov 16, 2025):

ollama doesn't do anything with the zfs cache, that's what #5700 is about. I don't have a Truenas machine to test with, but I'm pretty sure that there's no change in behaviour with respect to determining the amount of free memory between 0.12.10 and 0.12.11. However, if 0.12.10 seems to work, then sticking with that seems the easiest work-around in the short term.
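
If the TrueNAS catalog doesn't offer a rollback, one sketch of pinning the upstream release (this assumes the app can point at a custom image and that the versioned `ollama/ollama` tags on Docker Hub are available):

```shell
# Pull the last known-good upstream release explicitly instead of :latest.
docker pull ollama/ollama:0.12.10
# In a custom app / compose file, reference the pinned tag:
#   image: ollama/ollama:0.12.10
```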


@87fox87 commented on GitHub (Nov 16, 2025):

You are right. After further testing, it seems it was mere coincidence that there was enough free memory on my system before the update and not enough after.

Would it be an option to just comment the memory check out of the code? TrueNAS seems to be programmed to free up its cache automatically to serve the needs of the apps running, so if it weren't for that memory check it would already run perfectly.

Thanks in advance.


@timschmidt commented on GitHub (Nov 16, 2025):

Adding a note that I have also run into this issue. I tracked the error down to this area of the code:

https://github.com/ollama/ollama/blob/dd0ed0ef172cdc270ef062ac764a58780c5c8093/llm/server.go#L989


@rick-github commented on GitHub (Nov 16, 2025):

> Would it be an option to just comment the memory check out of the code?

If you want to build the program yourself, sure. Be aware that the amount of memory that ARC will release for eviction is rate limited with the zfs_arc_shrinker_limit option, and so skipping the memory check and doing a model load may still fail because there is not enough free memory.

A manual workaround (if you have root access on the machine) is to disable rate limiting and then flush the cache before doing a model load.

```shell
echo 0 > /sys/module/zfs/parameters/zfs_arc_shrinker_limit
echo 3 > /proc/sys/vm/drop_caches
```
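
To persist the shrinker setting across reboots, the usual OpenZFS route is a module option; whether TrueNAS preserves /etc/modprobe.d across upgrades is an assumption worth verifying:

```shell
# Persist the ARC shrinker setting across reboots (standard OpenZFS tunable;
# TrueNAS may manage module options itself, so re-check after an upgrade).
echo "options zfs zfs_arc_shrinker_limit=0" > /etc/modprobe.d/zfs-arc.conf
# Confirm the live value:
cat /sys/module/zfs/parameters/zfs_arc_shrinker_limit
```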

@87fox87 commented on GitHub (Nov 16, 2025):

So there are no plans to make a version that will be compatible with the TrueNAS ZFS cache system?


@rick-github commented on GitHub (Nov 16, 2025):

I'm actually in the process of building a server that will run ollama alongside ZFS (not on TrueNAS), so I will be looking at this issue more closely.


@87fox87 commented on GitHub (Nov 16, 2025):

So the question for me would be: does the ZFS file system come with the ZFS cache / ARC automatically, so that whatever you create might adapt to TrueNAS with relative ease, or is that too far-fetched?

Either way, it sounds promising. Thank you for your efforts :)


@rick-github commented on GitHub (Nov 16, 2025):

> whatever you create might adapt to TrueNAS with relative ease

That's the goal. It's the goal of ollama in general: reduce the friction and just make it easy to run models. This does mean there are cases where ollama is not suitable and another product might work better, but that's the current scope of ollama's work.


@87fox87 commented on GitHub (Nov 16, 2025):

Amazing. If I can support you in any way, e.g. testing on TrueNAS, let me know :)


@markasoftware-tc commented on GitHub (Jan 19, 2026):

I believe this is due to a bug in how ollama detects free memory inside a container; see my PR #13782.
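
For anyone reproducing this, the kind of discrepancy a container can see is easy to demonstrate: /proc/meminfo is not namespaced, so it reports host-wide figures even when the cgroup limit differs. The paths below assume cgroup v2 with a v1 fallback; whether this is exactly the bug the PR addresses is an assumption:

```shell
# Inside the ollama container: cgroup memory limit vs. host-wide meminfo.
cat /sys/fs/cgroup/memory.max 2>/dev/null \
  || cat /sys/fs/cgroup/memory/memory.limit_in_bytes   # cgroup v1 fallback
grep -E 'MemTotal|MemAvailable' /proc/meminfo
```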


Reference: github-starred/ollama#8667