[GH-ISSUE #7208] insufficient VRAM to load any model layers #4577

Closed
opened 2026-04-12 15:30:26 -05:00 by GiteaMirror · 1 comment

Originally created by @goactiongo on GitHub (Oct 15, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/7208

What is the issue?

Why does Ollama log "gpu has too little memory to allocate any layers"? I have four GPUs with 23.3 GiB, 23.3 GiB, 16.8 GiB, and 9.7 GiB of available memory, respectively.

10月 14 12:47:30 gpu ollama[24746]: time=2024-10-14T12:47:30.994+08:00 level=DEBUG source=sched.go:224 msg="loading first model" model=/usr/share/ollama/.ollama/models/blobs/sha256-b506a070d1152798d435ec4e7687336567ae653b3106f73b7b4ac7be1cbc4449
10月 14 12:47:30 gpu ollama[24746]: time=2024-10-14T12:47:30.994+08:00 level=DEBUG source=memory.go:103 msg=evaluating library=cuda gpu_count=1 available="[23.3 GiB]"
10月 14 12:47:30 gpu ollama[24746]: time=2024-10-14T12:47:30.995+08:00 level=DEBUG source=memory.go:170 msg="gpu has too little memory to allocate any layers" id=GPU-ac079011-c45b-de29-f2e2-71b2e5d2d7f4 library=cuda variant=v12 compute=8.0 driver=12.2 name="NVIDIA A30" total="23.5 GiB" available="23.3 GiB" minimum_memory=479199232 layer_size="484.5 MiB" gpu_zer_overhead="0 B" partial_offload="24.6 GiB" full_offload="24.2 GiB"
10月 14 12:47:30 gpu ollama[24746]: time=2024-10-14T12:47:30.995+08:00 level=DEBUG source=memory.go:312 msg="insufficient VRAM to load any model layers"
10月 14 12:47:30 gpu ollama[24746]: time=2024-10-14T12:47:30.995+08:00 level=DEBUG source=memory.go:103 msg=evaluating library=cuda gpu_count=1 available="[23.3 GiB]"
10月 14 12:47:30 gpu ollama[24746]: time=2024-10-14T12:47:30.995+08:00 level=DEBUG source=memory.go:170 msg="gpu has too little memory to allocate any layers" id=GPU-1a5993d8-1f60-3ecd-b80f-55ca9f1e95d2 library=cuda variant=v12 compute=8.0 driver=12.2 name="NVIDIA A30" total="23.5 GiB" available="23.3 GiB" minimum_memory=479199232 layer_size="484.5 MiB" gpu_zer_overhead="0 B" partial_offload="24.6 GiB" full_offload="24.2 GiB"
10月 14 12:47:30 gpu ollama[24746]: time=2024-10-14T12:47:30.995+08:00 level=DEBUG source=memory.go:312 msg="insufficient VRAM to load any model layers"
10月 14 12:47:30 gpu ollama[24746]: time=2024-10-14T12:47:30.995+08:00 level=DEBUG source=memory.go:103 msg=evaluating library=cuda gpu_count=1 available="[16.8 GiB]"
10月 14 12:47:30 gpu ollama[24746]: time=2024-10-14T12:47:30.996+08:00 level=DEBUG source=memory.go:170 msg="gpu has too little memory to allocate any layers" id=GPU-6b83f2f6-dc65-7feb-5e02-0cd0087995e8 library=cuda variant=v12 compute=8.0 driver=12.2 name="NVIDIA A30" total="23.5 GiB" available="16.8 GiB" minimum_memory=479199232 layer_size="484.5 MiB" gpu_zer_overhead="0 B" partial_offload="24.6 GiB" full_offload="24.2 GiB"
10月 14 12:47:30 gpu ollama[24746]: time=2024-10-14T12:47:30.996+08:00 level=DEBUG source=memory.go:312 msg="insufficient VRAM to load any model layers"
10月 14 12:47:30 gpu ollama[24746]: time=2024-10-14T12:47:30.996+08:00 level=DEBUG source=memory.go:103 msg=evaluating library=cuda gpu_count=1 available="[9.7 GiB]"
10月 14 12:47:30 gpu ollama[24746]: time=2024-10-14T12:47:30.996+08:00 level=DEBUG source=memory.go:170 msg="gpu has too little memory to allocate any layers" id=GPU-ad4cba93-ee35-2ea2-dba7-7b5772a098ce library=cuda variant=v12 compute=8.0 driver=12.2 name="NVIDIA A30" total="23.5 GiB" available="9.7 GiB" minimum_memory=479199232 layer_size="484.5 MiB" gpu_zer_overhead="0 B" partial_offload="24.6 GiB" full_offload="24.2 GiB"
10月 14 12:47:30 gpu ollama[24746]: time=2024-10-14T12:47:30.996+08:00 level=DEBUG source=memory.go:312 msg="insufficient VRAM to load any model layers"
10月 14 12:47:30 gpu ollama[24746]: time=2024-10-14T12:47:30.996+08:00 level=DEBUG source=memory.go:103 msg=evaluating library=cuda gpu_count=1 available="[23.3 GiB]"
10月 14 12:47:31 gpu ollama[24746]: time=2024-10-14T12:47:30.997+08:00 level=INFO source=sched.go:714 msg="new model will fit in available VRAM in single GPU, loading" model=/usr/share/ollama/.ollama/models/blobs/sha256-b506a070d1152798d435ec4e7687336567ae653b3106f73b7b4ac7be1cbc4449 gpu=GPU-ac079011-c45b-de29-f2e2-71b2e5d2d7f4 parallel=1 available=24986779648 required="15.1 GiB"
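
One way to read the memory.go:170 lines above is to compare each GPU's `available` value with the other logged fields. The sketch below is only an illustration: the function name `fits_any_layer` and the inequality it uses (free memory must cover the larger graph estimate, the minimum reserve, and one layer) are assumptions, not the check Ollama actually performs. The numbers are copied from the log as printed, so they are rounded.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def fits_any_layer(available, graph_partial, graph_full,
                   minimum_memory, layer_size, overhead=0):
    """Assumed check (hypothetical): free memory must cover the larger graph
    estimate, the minimum reserve, any overhead, and at least one layer."""
    required = overhead + max(graph_partial, graph_full) + minimum_memory + layer_size
    return available >= required, required


# Values as printed in the debug lines (partial_offload, full_offload,
# minimum_memory, layer_size); gpu_zer_overhead is logged as 0 B.
graph_partial = 24.6 * GIB
graph_full = 24.2 * GIB
minimum_memory = 479199232
layer_size = 484.5 * MIB

for available_gib in (23.3, 23.3, 16.8, 9.7):
    ok, required = fits_any_layer(available_gib * GIB, graph_partial,
                                  graph_full, minimum_memory, layer_size)
    print(f"available={available_gib} GiB "
          f"required>={required / GIB:.1f} GiB fits={ok}")
```

Under this assumption no GPU passes, because the logged partial_offload estimate (24.6 GiB) is already larger than the 23.5 GiB total of an A30. The final INFO line shows that the model was nevertheless loaded on a single GPU with parallel=1 and required="15.1 GiB".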

OS

CentOS 7.9

GPU

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.06             Driver Version: 535.183.06   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A30                     Off | 00000000:17:00.0 Off |                    0 |
| N/A   28C    P0              30W / 165W |  13896MiB / 24576MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA A30                     Off | 00000000:31:00.0 Off |                    0 |
| N/A   27C    P0              27W / 165W |   6642MiB / 24576MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   2  NVIDIA A30                     Off | 00000000:B1:00.0 Off |                    0 |
| N/A   26C    P0              25W / 165W |      3MiB / 24576MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   3  NVIDIA A30                     Off | 00000000:CA:00.0 Off |                    0 |
| N/A   26C    P0              24W / 165W |      3MiB / 24576MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A     12273      C   /usr/local/bin/python3.8                   5062MiB |
|    0   N/A  N/A     36459      C   /usr/bin/python3.10                        8820MiB |
|    1   N/A  N/A     36258      C   /usr/bin/python3.10                        6634MiB |
+---------------------------------------------------------------------------------------+

CPU

Intel

Ollama version

0.3.11

GiteaMirror added the bug label 2026-04-12 15:30:26 -05:00

@rick-github commented on GitHub (Oct 15, 2024):

https://github.com/ollama/ollama/issues/7146#issuecomment-2415182486

Reference: github-starred/ollama#4577