[GH-ISSUE #9695] Gemma3 27B uses 2.5x VRAM of Gemma2 27B #52841

Closed
opened 2026-04-29 01:07:39 -05:00 by GiteaMirror · 3 comments

Originally created by @hg0428 on GitHub (Mar 12, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/9695

What is the issue?

https://discord.com/channels/1128867683291627614/1349413021805580461/1349413021805580461

Gemma3 27B Q4_K_M is using 40 GB of memory, while I only have 36 GB. Gemma2 27B did not do this and used far less.
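A minimal way to see where the memory goes, plus an illustrative retry with a smaller context window (a sketch assuming the extra usage is KV cache rather than weights; 8192 is an example value, not a recommendation):

```shell
# Show loaded models with their size and CPU/GPU split
ollama ps

# Retry with a smaller context window from the interactive prompt
ollama run gemma3:27b
>>> /set parameter num_ctx 8192
```

If usage drops sharply with a smaller num_ctx, the difference is in the KV cache rather than the model weights.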

Relevant log output


OS

macOS

GPU

Apple

CPU

Apple

Ollama version

0.6.0

GiteaMirror added the bug label 2026-04-29 01:07:39 -05:00

@EldanRing commented on GitHub (Mar 12, 2025):

I'm having the same issue; all of the Gemma 3 models are using an absurd amount of memory. Also testing on macOS with an M4 Pro.


@sirajperson commented on GitHub (Mar 12, 2025):

I'm having the same issue on Ubuntu 24.04. The model seems to load onto the GPU, but there are something like 47 processes associated with attempting to run it. Ollama eventually times out:

```shell
ollama run gemma3:27b-it-fp16
Error: timed out waiting for llama runner to start - progress 0.00 -
```

After exiting Ollama, Gemma stays loaded on the GPUs, and the processes that were trying to run it keep running.
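For the stuck-process symptom, a quick way to confirm what is still holding the GPU after Ollama exits (a sketch assuming an NVIDIA GPU; the model tag matches the command above):

```shell
# List processes still holding GPU memory
nvidia-smi

# Look for leftover Ollama runner processes
pgrep -af ollama

# Recent Ollama versions can unload a model explicitly
ollama stop gemma3:27b-it-fp16
```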


@Igorgro commented on GitHub (Mar 12, 2025):

Having the same issue. I have no problem running qwen2.5:32b, which exactly fits into my 24 GB of VRAM, but Gemma3 eats the whole VRAM and then starts offloading to RAM. It then eats the whole 24 GB of RAM and gets killed by the OOM killer.
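A sketch of pinning the GPU offload manually instead of letting the scheduler overcommit (num_gpu sets the number of layers offloaded to the GPU; 40 is an illustrative value, not a tuned one):

```shell
# Verify how much of the model landed on GPU vs CPU
ollama ps

# Illustrative: cap the number of layers offloaded to the GPU
curl http://localhost:11434/api/generate -d '{
  "model": "gemma3:27b",
  "prompt": "hello",
  "options": {"num_gpu": 40}
}'
```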


Reference: github-starred/ollama#52841