[GH-ISSUE #15356] Gemma4 takes up too much space #56337

Closed
opened 2026-04-29 10:40:02 -05:00 by GiteaMirror · 2 comments

Originally created by @chllei on GitHub (Apr 6, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/15356

In Google's AI Edge Gallery app, the E2B model requires only about 2GB of space, whereas the E2B model in Ollama needs 7.2GB. This substantial memory footprint is highly unsuitable for laptops with 16GB of RAM and no dedicated GPU, yet the model is purportedly designed for low-end devices, including Raspberry Pis.

GiteaMirror added the model label 2026-04-29 10:40:03 -05:00

@rick-github commented on GitHub (Apr 6, 2026):

The model as released by Google is [10G](https://huggingface.co/google/gemma-4-E2B-it/tree/main). The version used in the Gallery app is tuned for the [LiteRT-LM framework](https://ai.google.dev/edge/litert-lm/overview#featured_model_gemma-4-e2b) and uses a more heavily [quantized](https://huggingface.co/litert-community/gemma-4-E2B-it-litert-lm#:~:text=It%20uses%20the%20Gemma%20quantization%20scheme%20that%20employs%20a%20mixture%20of%202bit%2C%204bit%20and%208bit%20weights.) version of the model. LiteRT-LM is specifically designed for edge devices and supports [Raspberry Pi](https://github.com/google-ai-edge/LiteRT-LM#:~:text=Windows%20(WSL)%20or-,Raspberry%20Pi,-with%20the%20LiteRT); if the environment is resource-constrained, it may be a better framework than the more general-purpose Ollama.

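The size gap described above is mostly a bits-per-weight question, and can be sanity-checked with back-of-envelope arithmetic. This is a rough sketch only: the `5e9` parameter count and the `3.2` average bits for a mixed 2/4/8-bit scheme are illustrative assumptions, not the actual figures for the Gemma4 E2B checkpoint.

```python
# Rough on-disk size of model weights from parameter count and bits per weight.
# The parameter count below is an assumed figure for illustration only.

def model_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes), ignoring
    metadata, tokenizer files, and per-block quantization scales."""
    return n_params * bits_per_weight / 8 / 1e9

n = 5e9  # assumed parameter count, for illustration

for label, bits in [
    ("bf16 (full release)", 16),
    ("q8", 8),
    ("q4", 4),
    ("mixed 2/4/8-bit, ~3.2 avg", 3.2),  # assumed average for the scheme
]:
    print(f"{label:>28}: ~{model_size_gb(n, bits):.1f} GB")
```

Under these assumed numbers, bf16 lands around 10 GB and a ~3.2-bit average around 2 GB, which is consistent with the sizes discussed in this thread; the point is only that the difference comes from quantization depth, not from Ollama storing anything extra.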

@pdevine commented on GitHub (Apr 7, 2026):

The audio and vision tensors take up a substantial amount of space. We could release a version that doesn't include them, but there are no plans to do that yet.


Reference: github-starred/ollama#56337