[GH-ISSUE #12678] deepseek-coder-v2:16b 8.9GB on disk unpacks to 20GB #8410

Closed
opened 2026-04-12 21:04:36 -05:00 by GiteaMirror · 1 comment
Owner

Originally created by @codeliger on GitHub (Oct 17, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/12678

What is the issue?

Can someone confirm whether it is normal for an 8.9 GB model to unpack to 20 GB when loaded, exceeding the 16 GB of VRAM on a 9070 XT?

```
NAME                         ID              SIZE      MODIFIED       
deepseek-coder-v2:16b        63fb193b3a9b    8.9 GB    15 minutes ago
```

```
NAME                     ID              SIZE     PROCESSOR          CONTEXT    UNTIL   
deepseek-coder-v2:16b    63fb193b3a9b    20 GB    19%/81% CPU/GPU    8164       Forever
```

```
[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"
Environment="OLLAMA_CONTEXT_LENGTH=8164"
Environment="OLLAMA_KEEP_ALIVE=-1"
Environment="OLLAMA_FLASH_ATTENTION=1"
Environment="OLLAMA_MAX_LOADED_MODELS=3"
Environment="OLLAMA_NUM_PARALLEL=4"
Environment="OLLAMA_DEBUG=1"
Environment="OLLAMA_MODELS=/opt/ollama_models/"
```

Is the loaded size directly affected by how many parameters the model has? I was assuming unpacking scaled roughly linearly with file size; normally a 9 GB model grows to ~12–15 GB.
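The loaded size is not just the weights: Ollama also preallocates a KV cache whose size scales with context length and the number of parallel request slots, not with file size. A minimal back-of-envelope sketch of that scaling follows; all model dimensions here (layer count, KV heads, head dimension) are illustrative placeholders, not DeepSeek-Coder-V2's real values.

```python
# Hypothetical KV-cache estimate. Each token stores one K and one V vector
# per layer per KV head; the cache is allocated per parallel slot.
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, ctx_len, n_parallel,
                   bytes_per_elem=2):  # 2 bytes/element assumes fp16
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem  # K + V
    return per_token * ctx_len * n_parallel

# Placeholder dimensions; only the scaling behavior matters here.
one_slot  = kv_cache_bytes(27, 16, 128, 8164, n_parallel=1)
four_slot = kv_cache_bytes(27, 16, 128, 8164, n_parallel=4)
assert four_slot == 4 * one_slot  # four slots quadruple the cache
print(f"1 slot: {one_slot / 1e9:.1f} GB, 4 slots: {four_slot / 1e9:.1f} GB")
```

So a single server setting can add many gigabytes on top of the 8.9 GB file without the parameter count changing at all.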

Relevant log output


OS

Arch Linux

GPU

9070 XT

CPU

AMD 3900X

Ollama version

0.12.6

GiteaMirror added the bug label 2026-04-12 21:04:36 -05:00

@codeliger commented on GitHub (Oct 17, 2025):

It looks like `Environment="OLLAMA_NUM_PARALLEL=4"` was causing the 20 GB VRAM problem.
I guess it preallocates 8164 × 4 tokens of cache or something.

Solved.
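For anyone hitting the same thing, the setting can be reverted with a systemd override; `ollama` as the service name matches the `[Service]` drop-in shown above, though the exact unit name on a given install is an assumption.

```shell
# Open the override file for the ollama unit
# (creates /etc/systemd/system/ollama.service.d/override.conf)
sudo systemctl edit ollama
# In the editor, keep a single parallel slot so the KV cache
# is allocated only once:
#   [Service]
#   Environment="OLLAMA_NUM_PARALLEL=1"

# Apply the change
sudo systemctl daemon-reload
sudo systemctl restart ollama
```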

Reference: github-starred/ollama#8410