[GH-ISSUE #13227] Ollama evicts previously loaded model, although system memory is sufficient #8745

Closed
opened 2026-04-12 21:30:52 -05:00 by GiteaMirror · 4 comments
Owner

Originally created by @ioannis-soukas on GitHub (Nov 24, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/13227

Originally assigned to: @jessegross on GitHub.

What is the issue?

Hello,

I am running Ollama v0.13.0 CPU-only, on a dedicated Ubuntu server with 512 GB of RAM.

Although in previous versions I could keep multiple models loaded in memory (`Environment="OLLAMA_KEEP_ALIVE=-1"`), after the last update it keeps only one model in memory, even when using only small models. The loaded model is evicted as soon as I load another one.

For example, it cannot keep gemma3:1b and gemma3:27b loaded at the same time, although the total RAM they consume is small compared to my free RAM. The server is used only for ollama and open-webui; there is no other software running or consuming RAM.

Thank you!
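For reference, on a systemd install these settings live in a service drop-in; a minimal sketch (the override path is the one created by `systemctl edit ollama`, and the `OLLAMA_MAX_LOADED_MODELS` value here is only an example):

```ini
# /etc/systemd/system/ollama.service.d/override.conf
[Service]
# Keep loaded models in memory indefinitely
Environment="OLLAMA_KEEP_ALIVE=-1"
# Allow several models to be resident at once (example value)
Environment="OLLAMA_MAX_LOADED_MODELS=4"
```

After editing, `sudo systemctl daemon-reload && sudo systemctl restart ollama` applies the change.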

Relevant log output

```shell
Nov 24 13:55:06 llm ollama[7189]: time=2025-11-24T13:55:06.999Z level=INFO source=sched.go:443 msg="system memory" total="503.6 GiB" free="498.1 GiB" free_swap="8.0 GiB"
Nov 24 13:55:06 llm ollama[7189]: time=2025-11-24T13:55:06.999Z level=INFO source=server.go:702 msg="loading model" "model layers"=63 requested=-1
Nov 24 13:55:06 llm ollama[7189]: time=2025-11-24T13:55:06.999Z level=INFO source=server.go:974 msg="model requires more memory than is currently available, evicting a model to make space" "loaded layers"=0
Nov 24 13:55:07 llm ollama[7189]: time=2025-11-24T13:55:07.022Z level=INFO source=runner.go:1398 msg="starting ollama engine"
Nov 24 13:55:07 llm ollama[7189]: time=2025-11-24T13:55:07.022Z level=INFO source=runner.go:1433 msg="Server listening on 127.0.0.1:41401"
Nov 24 13:55:07 llm ollama[7189]: time=2025-11-24T13:55:07.352Z level=INFO source=sched.go:443 msg="system memory" total="503.6 GiB" free="499.3 GiB" free_swap="8.0 GiB"
Nov 24 13:55:07 llm ollama[7189]: time=2025-11-24T13:55:07.352Z level=INFO source=server.go:702 msg="loading model" "model layers"=63 requested=-1
Nov 24 13:55:07 llm ollama[7189]: time=2025-11-24T13:55:07.353Z level=INFO source=runner.go:1271 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:28 GPULayers:[] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
Nov 24 13:55:07 llm ollama[7189]: time=2025-11-24T13:55:07.454Z level=INFO source=ggml.go:136 msg="" architecture=gemma3 file_type=Q4_0 name="" description="" num_tensors=1247 num_key_values=40
```
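Eviction can also be confirmed from the client side by listing loaded models before and after the second load (these commands assume a local server at the default port):

```shell
# List currently loaded models, their size and keep-alive expiry
ollama ps

# The same information via the REST API
curl -s http://localhost:11434/api/ps
```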

OS

Ubuntu 24.04.3 LTS

GPU

None

CPU

2xIntel E5-2680v4

Ollama version

0.13.0

GiteaMirror added the bug label 2026-04-12 21:30:52 -05:00

@Bottlecap202 commented on GitHub (Nov 24, 2025):

Ask GitHub copilot using grok.



@rick-github commented on GitHub (Nov 24, 2025):

A full log with `OLLAMA_DEBUG=1` in the server environment might help in debugging.
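For a systemd install, one way to capture such a log (standard systemd/journalctl commands; `ollama` is the unit name used by the default Linux install):

```shell
# Add Environment="OLLAMA_DEBUG=1" under [Service] in the override:
sudo systemctl edit ollama
sudo systemctl restart ollama

# Reproduce the issue, then dump the service log to a file:
journalctl -u ollama --since "10 minutes ago" > ollama_debug.txt
```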


@ioannis-soukas commented on GitHub (Nov 24, 2025):

[ollama_debug.txt](https://github.com/user-attachments/files/23725448/ollama_debug.txt)

@ioannis-soukas commented on GitHub (Nov 24, 2025):

I tried downgrading a couple of versions to see when this issue first appears.
0.12.6 works fine; the behavior I described above first appears in v0.12.7.
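The official install script accepts an `OLLAMA_VERSION` variable, which makes this kind of version bisection straightforward (re-running the script replaces the installed binary):

```shell
# Install a specific release to test
curl -fsSL https://ollama.com/install.sh | OLLAMA_VERSION=0.12.6 sh

# Then swap to the next release and compare
curl -fsSL https://ollama.com/install.sh | OLLAMA_VERSION=0.12.7 sh
```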


Reference: github-starred/ollama#8745