[GH-ISSUE #10288] Possible memory leak running gemma3-12b #32517

Closed
opened 2026-04-22 13:51:56 -05:00 by GiteaMirror · 2 comments

Originally created by @ivoras on GitHub (Apr 15, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/10288

### What is the issue?

After some time, the `ollama runner` process starts swapping heavily. I'm running a batch of test prompts sequentially over the OpenAI-compatible API, and after roughly 50 prompts the system grinds to a halt because of memory swapping. The process is eventually killed by the OOM killer.

```
Apr 15 16:22:58 ml-2 systemd[1]: ollama.service: A process of this unit has been killed by the OOM killer.
```

See the attached screenshot. This is on an NVIDIA T4 with 16 GB VRAM, on a server with 16 GB of system RAM and 16 GB of swap space.

![Image](https://github.com/user-attachments/assets/c35efd13-e74e-4041-873e-f31f6dde851f)
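For reference, a minimal sketch of the kind of sequential driver loop described above (not the original test harness): it assumes Ollama's OpenAI-compatible endpoint on the default port, and the prompt list is a placeholder.

```python
# Hypothetical reproduction loop: sequential chat completions against
# Ollama's OpenAI-compatible API (default port 11434). The prompt list
# is a placeholder; the original test prompts are not in the report.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

prompts = [f"Test prompt {i}" for i in range(100)]

for i, prompt in enumerate(prompts):
    resp = client.chat.completions.create(
        model="gemma3:12b",
        messages=[{"role": "user", "content": prompt}],
    )
    # Watch the runner's RES/SWAP columns in top while this runs.
    print(i, len(resp.choices[0].message.content or ""))
```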

### Relevant log output

```shell
    PID USER      PR  NI    VIRT    RES   SWAP    SHR S  %CPU  %MEM     TIME+ COMMAND
 182076 ollama    20   0   96.8g   9.6g   9.5g   7.3g S 100.0  65.7  65:06.21 /usr/local/bin/ollama runner --ollama-engine --model /usr/share/ollama/.ollama/models/blobs/sha256-e8ad13eff07a78d89926e9e8b882317d082ef5bf9768ad7b50fcdbbcd63748de --ctx-size 16384 --batch-size 512 --n-gpu-layers 49 --threads 2 --no-mmap --parallel 1 --port 36227

ivoras@ml-2 ~> ollama --version
ollama version is 0.6.5

ii  nvidia-driver-570                 570.124.06-0ubuntu1                     amd64        NVIDIA driver metapackage
```

Here's how it looks in nvitop:

![Image](https://github.com/user-attachments/assets/cbc5fc38-8c96-432c-b559-1f4bb65b4da8)

There's a tantalising visual correlation between GPU utilisation and memory usage: memory usage increases when new requests are handled.
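One way to make that correlation concrete would be to sample the runner's resident and swapped memory while the prompts run. A rough sketch using psutil (not part of the original report; pass the `ollama runner` PID from top, e.g. 182076 above):

```python
# Sample RSS and swap of the runner process every 10 seconds.
# memory_full_info() reads /proc/<pid>/smaps on Linux, so this may
# need to run as root to inspect the ollama user's process.
import sys
import time

import psutil

proc = psutil.Process(int(sys.argv[1]))
while True:
    mem = proc.memory_full_info()  # includes .swap on Linux
    print(f"rss={mem.rss / 2**30:.2f}G swap={mem.swap / 2**30:.2f}G")
    time.sleep(10)
```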

### OS

Linux

### GPU

Nvidia

### CPU

Intel

### Ollama version

0.6.5

GiteaMirror added the bug label 2026-04-22 13:51:56 -05:00

@rick-github commented on GitHub (Apr 15, 2025):

#10040


@ivoras commented on GitHub (Apr 15, 2025):

Agreed, it looks like a duplicate of that issue.


Reference: github-starred/ollama#32517