[GH-ISSUE #9693] ollama 0.6.0 Gemma 3:27B eats whole vram #68384

Closed
opened 2026-05-04 13:43:40 -05:00 by GiteaMirror · 2 comments

Originally created by @sapphirepro on GitHub (Mar 12, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/9693

What is the issue?

On previous versions, with nvidia-uvm enabled and the extra flag set to use it, roughly 1 GB of VRAM remained free, which was enough to run the model and still use the system. The current version consumes 100% of VRAM and leaves nothing, which makes the system unusable: it cannot draw even simple things because all VRAM is busy.

Something is needed to keep at least 500 MB of VRAM free.

OS: OpenSUSE Tumbleweed x86_64
GPU: NVIDIA Quadro P5000 Mobile, 16 GB VRAM
CPU: Mobile Xeon, 6th gen, 4 cores
RAM: 64 GB DDR4 2400 MHz

Relevant log output


OS

OpenSUSE Tumbleweed

GPU

NVidia Quadro P5000 mobile

CPU

Intel® Xeon® CPU E3-1575M v5 @ 3.00GHz

Ollama version

0.6.0

GiteaMirror added the bug label 2026-05-04 13:43:40 -05:00

@sapphirepro commented on GitHub (Mar 12, 2025):

Also, to make models run normally without crashing, I use this custom launcher as a separate ".sh" file:

`#!/bin/bash

# Set the environment variable and run ollama serve

GGML_CUDA_ENABLE_UNIFIED_MEMORY=1 /usr/local/bin/ollama serve`

Another request: updates should not overwrite ollama.service, since I then have to point it at a different executable again every time. The critical problem, though, is still that no VRAM is left free at all.
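On the ollama.service point: one possible way to keep the variable across updates is a systemd drop-in, assuming the updater only replaces the main unit file and leaves drop-in directories alone (not verified here). The sketch below writes such a drop-in. `write_ollama_dropin` is a hypothetical helper name, and the default path is the standard systemd drop-in location for a unit called ollama.service.

```shell
#!/bin/bash
# Sketch only: write a systemd drop-in that sets the unified-memory variable
# for ollama.service, instead of editing the unit file itself.
# write_ollama_dropin is a hypothetical helper name, not part of ollama.
write_ollama_dropin() {
    # The target directory is a parameter so the function can be tried
    # in a scratch directory before touching /etc.
    local dropin_dir="${1:-/etc/systemd/system/ollama.service.d}"
    mkdir -p "$dropin_dir"
    cat > "$dropin_dir/override.conf" <<'EOF'
[Service]
Environment="GGML_CUDA_ENABLE_UNIFIED_MEMORY=1"
EOF
}
```

Called as root with no argument, it writes to the default location; `systemctl daemon-reload` followed by `systemctl restart ollama` should then pick up the change. This only replaces the wrapper script and does not by itself leave any VRAM free.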


@sapphirepro commented on GitHub (Mar 12, 2025):

[ollama_logs.txt](https://github.com/user-attachments/files/19214109/ollama_logs.txt)

Reference: github-starred/ollama#68384