[GH-ISSUE #9817] After updating to 0.6.1, the model does not seem to be loaded into VRAM #32186

Closed
opened 2026-04-22 13:13:40 -05:00 by GiteaMirror · 1 comment

Originally created by @itsmeaningless on GitHub (Mar 17, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/9817

What is the issue?

Output of the command (ollama ps):
NAME                                        ID              SIZE     PROCESSOR          UNTIL
google_gemma-3-27b-it-IQ4_XS.gguf:latest    b7aea856d9f1    18 GB    10%/90% CPU/GPU    4 minutes from now

output of nvidia-smi
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.124.04             Driver Version: 570.124.04     CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4080 ...    Off |   00000000:01:00.0 Off |                  N/A |
|  0%   37C    P8              4W /  320W |      39MiB /  16376MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A            1805      G   /usr/lib/xorg/Xorg                        9MiB |
|    0   N/A  N/A            1937      G   /usr/bin/gnome-shell                      6MiB |
+-----------------------------------------------------------------------------------------+
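
The mismatch reported here can be checked with a little arithmetic: ollama ps claims 90% of an 18 GB model is on the GPU, while nvidia-smi shows only 39MiB of VRAM in use. The following is a minimal sketch of that cross-check, not part of Ollama. The function names are hypothetical, the PROCESSOR parser only handles the split "N%/M% CPU/GPU" form shown above, and treating "GB" as 10^9 bytes is an assumption.

```python
import re


def gpu_fraction(processor: str) -> float:
    """Return the GPU share from a PROCESSOR value such as '10%/90% CPU/GPU'."""
    match = re.fullmatch(r"(\d+)%/(\d+)% CPU/GPU", processor.strip())
    if match is None:
        # Other forms are not handled in this sketch.
        raise ValueError(f"unsupported PROCESSOR value: {processor!r}")
    return int(match.group(2)) / 100


def used_vram_mib(nvidia_smi_text: str) -> int:
    """Return the 'used' figure from the first 'NMiB / MMiB' pair found."""
    match = re.search(r"(\d+)MiB\s*/\s*(\d+)MiB", nvidia_smi_text)
    if match is None:
        raise ValueError("no Memory-Usage field found")
    return int(match.group(1))


def expected_vram_mib(size_gb: float, processor: str) -> float:
    """Estimate the VRAM the model should occupy (assumes 1 GB = 10**9 bytes)."""
    return size_gb * 1e9 * gpu_fraction(processor) / 2**20


if __name__ == "__main__":
    # Values copied from the ollama ps and nvidia-smi output in this report.
    expected = expected_vram_mib(18, "10%/90% CPU/GPU")
    used = used_vram_mib("| 0% 37C P8 4W / 320W | 39MiB / 16376MiB | 0% Default |")
    print(f"expected roughly {expected:.0f} MiB on the GPU, nvidia-smi reports {used} MiB")
```

A gap of several orders of magnitude between the two figures, as in this report, suggests the model weights never reached the GPU regardless of what ollama ps displays.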

output of log:
3月 17 17:28:06 dy-canvas ollama[73083]: load_tensors: loading model tensors, this can take a while... (mmap = true)
3月 17 17:28:06 dy-canvas ollama[73083]: load_tensors: layer 0 assigned to device CPU
3月 17 17:28:06 dy-canvas ollama[73083]: load_tensors: layer 1 assigned to device CPU
3月 17 17:28:06 dy-canvas ollama[73083]: load_tensors: layer 2 assigned to device CPU
3月 17 17:28:06 dy-canvas ollama[73083]: load_tensors: layer 3 assigned to device CPU
3月 17 17:28:06 dy-canvas ollama[73083]: load_tensors: layer 4 assigned to device CPU
3月 17 17:28:06 dy-canvas ollama[73083]: load_tensors: layer 5 assigned to device CPU
3月 17 17:28:06 dy-canvas ollama[73083]: load_tensors: layer 6 assigned to device CPU
3月 17 17:28:06 dy-canvas ollama[73083]: load_tensors: layer 7 assigned to device CPU
3月 17 17:28:06 dy-canvas ollama[73083]: load_tensors: layer 8 assigned to device CPU
3月 17 17:28:06 dy-canvas ollama[73083]: load_tensors: layer 9 assigned to device CPU
3月 17 17:28:06 dy-canvas ollama[73083]: load_tensors: layer 10 assigned to device CPU
3月 17 17:28:06 dy-canvas ollama[73083]: load_tensors: layer 11 assigned to device CPU
3月 17 17:28:06 dy-canvas ollama[73083]: load_tensors: layer 12 assigned to device CPU
3月 17 17:28:06 dy-canvas ollama[73083]: load_tensors: layer 13 assigned to device CPU
3月 17 17:28:06 dy-canvas ollama[73083]: load_tensors: layer 14 assigned to device CPU
3月 17 17:28:06 dy-canvas ollama[73083]: load_tensors: layer 15 assigned to device CPU
3月 17 17:28:06 dy-canvas ollama[73083]: load_tensors: layer 16 assigned to device CPU
3月 17 17:28:06 dy-canvas ollama[73083]: load_tensors: layer 17 assigned to device CPU
3月 17 17:28:06 dy-canvas ollama[73083]: load_tensors: layer 18 assigned to device CPU
3月 17 17:28:06 dy-canvas ollama[73083]: load_tensors: layer 19 assigned to device CPU
3月 17 17:28:06 dy-canvas ollama[73083]: load_tensors: layer 20 assigned to device CPU
3月 17 17:28:06 dy-canvas ollama[73083]: load_tensors: layer 21 assigned to device CPU
3月 17 17:28:06 dy-canvas ollama[73083]: load_tensors: layer 22 assigned to device CPU
3月 17 17:28:06 dy-canvas ollama[73083]: load_tensors: layer 23 assigned to device CPU
3月 17 17:28:06 dy-canvas ollama[73083]: load_tensors: layer 24 assigned to device CPU
3月 17 17:28:06 dy-canvas ollama[73083]: load_tensors: layer 25 assigned to device CPU
3月 17 17:28:06 dy-canvas ollama[73083]: load_tensors: layer 26 assigned to device CPU
3月 17 17:28:06 dy-canvas ollama[73083]: load_tensors: layer 27 assigned to device CPU
3月 17 17:28:06 dy-canvas ollama[73083]: load_tensors: layer 28 assigned to device CPU
3月 17 17:28:06 dy-canvas ollama[73083]: load_tensors: layer 29 assigned to device CPU
3月 17 17:28:06 dy-canvas ollama[73083]: load_tensors: layer 30 assigned to device CPU
3月 17 17:28:06 dy-canvas ollama[73083]: load_tensors: layer 31 assigned to device CPU
3月 17 17:28:06 dy-canvas ollama[73083]: load_tensors: layer 32 assigned to device CPU
3月 17 17:28:06 dy-canvas ollama[73083]: load_tensors: layer 33 assigned to device CPU
3月 17 17:28:06 dy-canvas ollama[73083]: load_tensors: layer 34 assigned to device CPU
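
For a long log like the one above, it can be easier to tally the device assignments than to read them line by line. Below is a minimal sketch, assuming the log keeps the "load_tensors: layer N assigned to device X" wording shown in this report. The function name is hypothetical, and the sample input is rebuilt from the 35 lines above (layers 0 through 34).

```python
import re
from collections import Counter

# Matches lines such as "... load_tensors: layer 12 assigned to device CPU".
LAYER_RE = re.compile(r"load_tensors: layer\s+(\d+) assigned to device (\S+)")


def count_layers_by_device(log_text: str) -> Counter:
    """Count how many layers the log assigns to each device name."""
    counts = Counter()
    for line in log_text.splitlines():
        match = LAYER_RE.search(line)
        if match:
            counts[match.group(2)] += 1
    return counts


if __name__ == "__main__":
    # Rebuild the 35 log lines from this report (layers 0 to 34, all on CPU).
    sample = "\n".join(
        f"3月 17 17:28:06 dy-canvas ollama[73083]: "
        f"load_tensors: layer {i} assigned to device CPU"
        for i in range(35)
    )
    print(dict(count_layers_by_device(sample)))  # {'CPU': 35}
```

In the excerpt posted here every listed layer is assigned to CPU, which matches the near-empty VRAM reported by nvidia-smi and contradicts the 90% GPU figure from ollama ps.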

Relevant log output


OS

Linux

GPU

Nvidia

CPU

AMD

Ollama version

0.6.1

GiteaMirror added the bug label 2026-04-22 13:13:40 -05:00

@itsmeaningless commented on GitHub (Mar 17, 2025):

After rebooting the machine, it works well.


Reference: github-starred/ollama#32186