[GH-ISSUE #15387] No activity/response from gemma4:31b model #9841

Open
opened 2026-04-12 22:42:17 -05:00 by GiteaMirror · 37 comments

Originally created by @khteh on GitHub (Apr 7, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/15387

What is the issue?

I am running `0.20.3` in k8s and I don't see any activity from the `gemma4:31b` model. There is no output from `kubectl logs`.

Relevant log output


OS

Linux

GPU

Nvidia

CPU

Intel

Ollama version

0.20.3

GiteaMirror added the bug label 2026-04-12 22:42:17 -05:00

@rick-github commented on GitHub (Apr 7, 2026):

If there are no logs at all it means the server is not running, or is logging to someplace that k8s is not intercepting. Find the logs and it will be easier to see why gemma4:31b is not responding.
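A few standard places to start looking (the pod name `ollama-0` is taken from output later in this thread; adjust to your deployment). This sketch is guarded and error-tolerant so it degrades to a no-op where no cluster is reachable:

```shell
# Standard k8s log-hunting commands; pod name "ollama-0" is an assumption
# from this thread. Each call is error-tolerant in case no cluster is up.
if command -v kubectl >/dev/null 2>&1; then
    kubectl logs ollama-0 --all-containers || true   # current stdout/stderr
    kubectl logs ollama-0 --previous       || true   # output before the last restart
    kubectl describe pod ollama-0          || true   # events: OOMKilled, probe failures
    status="checked"
else
    status="kubectl not found"
fi
echo "$status"
```

`kubectl logs --previous` is worth trying even when the current container seems quiet, since a crash-looping server leaves its logs on the previous instance.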


@khteh commented on GitHub (Apr 7, 2026):

https://docs.ollama.com/troubleshooting is not helpful at all or is outdated. Where is the log?

```
root@ollama-0:/# ls -l ~/.ollama/logs/server.log
ls: cannot access '/root/.ollama/logs/server.log': No such file or directory
```

@rick-github commented on GitHub (Apr 7, 2026):

`~/.ollama/logs/server.log` is for Mac. Are you running macOS inside k8s and running ollama inside that?


@khteh commented on GitHub (Apr 7, 2026):

```
root@ollama-0:/var/log# ollama ps
NAME          ID              SIZE     PROCESSOR    CONTEXT    UNTIL
gemma4:31b    6316f0629137    46 GB    100% CPU     262144     4 minutes from now
root@ollama-0:/var/log# ollama list
NAME                     ID              SIZE      MODIFIED
embeddinggemma:latest    85462619ee72    621 MB    26 minutes ago
gemma4:31b               6316f0629137    19 GB     26 minutes ago
```

@khteh commented on GitHub (Apr 7, 2026):

No. All Linux.

```
root@ollama-0:/var/log# uname -a
Linux ollama-0 6.17.0-20-generic #20-Ubuntu SMP PREEMPT_DYNAMIC Fri Mar 13 20:07:29 UTC 2026 x86_64 x86_64 x86_64 GNU/Linux
```

@rick-github commented on GitHub (Apr 7, 2026):

How is ollama started inside the k8s container?


@khteh commented on GitHub (Apr 7, 2026):

https://github.com/khteh/Ollama/blob/master/run.sh


@khteh commented on GitHub (Apr 7, 2026):

```
root@ollama-0:/var/log# ps -ef
UID          PID    PPID  C STIME TTY          TIME CMD
root           1       0  0 11:18 ?        00:00:00 /bin/bash /usr/local/bin/run.sh bash
root          56       1  0 11:18 ?        00:00:02 ollama serve
root         160      56 99 11:19 ?        03:38:37 /usr/bin/ollama runner --ollama-engine --model /models/blobs/sha256-280af6832eca23cb322c4dcc65edfea98a21b8f8ab07dc7553bd6f7e6e7a3313 --port 43523
root         292       0  0 11:41 pts/0    00:00:00 bash
root         338     292  0 11:47 pts/0    00:00:00 ps -ef
```

@rick-github commented on GitHub (Apr 7, 2026):

So logging will be written to stderr of the ollama server. Is `run.sh` executed with I/O redirection?
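A minimal local illustration (not the actual `run.sh`) of why the wrapper's fd 2 matters: a child process inherits the parent shell's stderr, so whatever captures the wrapper's fd 2 (in k8s, the container runtime) also receives the server's log lines.

```shell
# Sketch: redirecting the parent's fd 2 captures the child's stderr too,
# because the child inherits its file descriptors from the parent.
log="$(mktemp)"
sh -c 'echo "server log line" >&2' 2>"$log"   # stand-in for "ollama serve"
cat "$log"    # → server log line
rm -f "$log"
```

Conversely, if anything in the wrapper re-points fd 2 (e.g. `exec 2>/some/file`), the runtime sees nothing and `kubectl logs` comes up empty.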


@khteh commented on GitHub (Apr 7, 2026):

https://github.com/khteh/Ollama/blob/master/Dockerfile


@rick-github commented on GitHub (Apr 7, 2026):

Inside the container, what's the output of:

```
ls -l /proc/56/fd
```

@khteh commented on GitHub (Apr 7, 2026):

```
root@ollama-0:/var/log# ls -l /proc/56/fd
total 0
lr-x------ 1 root root 64 Apr  7 11:53 0 -> /dev/null
l-wx------ 1 root root 64 Apr  7 11:53 1 -> 'pipe:[1464283]'
lr-x------ 1 root root 64 Apr  7 11:53 10 -> 'pipe:[1472117]'
lrwx------ 1 root root 64 Apr  7 11:53 15 -> 'anon_inode:[pidfd]'
l-wx------ 1 root root 64 Apr  7 11:53 2 -> 'pipe:[1464284]'
lrwx------ 1 root root 64 Apr  7 11:53 3 -> 'socket:[1464984]'
lrwx------ 1 root root 64 Apr  7 11:53 4 -> 'anon_inode:[eventpoll]'
lrwx------ 1 root root 64 Apr  7 11:53 5 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr  7 11:53 6 -> 'socket:[1471284]'
lr-x------ 1 root root 64 Apr  7 11:53 8 -> 'pipe:[1472116]'
lrwx------ 1 root root 64 Apr  7 11:53 9 -> 'socket:[1475197]'
```

@rick-github commented on GitHub (Apr 7, 2026):

What's the output of:

```
sudo lsof +E 2>&- | grep 1464284
```

@khteh commented on GitHub (Apr 7, 2026):

Empty


@rick-github commented on GitHub (Apr 7, 2026):

Outside of the container.


@khteh commented on GitHub (Apr 7, 2026):

```
$ sudo lsof +E 2>&- | grep 1464284
container 462233                                 root   15r     FIFO               0,15        0t0    1464284 pipe 478165,run.sh,2w 478227,ollama,2w
container 462233 462235 container                root   15r     FIFO               0,15        0t0    1464284 pipe 478165,run.sh,2w 478227,ollama,2w
container 462233 462236 container                root   15r     FIFO               0,15        0t0    1464284 pipe 478165,run.sh,2w 478227,ollama,2w
container 462233 462237 container                root   15r     FIFO               0,15        0t0    1464284 pipe 478165,run.sh,2w 478227,ollama,2w
container 462233 462238 container                root   15r     FIFO               0,15        0t0    1464284 pipe 478165,run.sh,2w 478227,ollama,2w
container 462233 462239 container                root   15r     FIFO               0,15        0t0    1464284 pipe 478165,run.sh,2w 478227,ollama,2w
container 462233 462240 container                root   15r     FIFO               0,15        0t0    1464284 pipe 478165,run.sh,2w 478227,ollama,2w
container 462233 462241 container                root   15r     FIFO               0,15        0t0    1464284 pipe 478165,run.sh,2w 478227,ollama,2w
container 462233 462242 container                root   15r     FIFO               0,15        0t0    1464284 pipe 478165,run.sh,2w 478227,ollama,2w
container 462233 462243 container                root   15r     FIFO               0,15        0t0    1464284 pipe 478165,run.sh,2w 478227,ollama,2w
container 462233 463563 container                root   15r     FIFO               0,15        0t0    1464284 pipe 478165,run.sh,2w 478227,ollama,2w
container 462233 478365 container                root   15r     FIFO               0,15        0t0    1464284 pipe 478165,run.sh,2w 478227,ollama,2w
container 462233 478572 container                root   15r     FIFO               0,15        0t0    1464284 pipe 478165,run.sh,2w 478227,ollama,2w
container 462233 482266 container                root   15r     FIFO               0,15        0t0    1464284 pipe 478165,run.sh,2w 478227,ollama,2w
run.sh    478165                                 root    2w     FIFO               0,15        0t0    1464284 pipe 462233,container,15r 478227,ollama,2w
ollama    478227                                 root    2w     FIFO               0,15        0t0    1464284 pipe 462233,container,15r 478165,run.sh,2w
ollama    478227 478229 ollama                   root    2w     FIFO               0,15        0t0    1464284 pipe 462233,container,15r 478165,run.sh,2w
ollama    478227 478230 ollama                   root    2w     FIFO               0,15        0t0    1464284 pipe 462233,container,15r 478165,run.sh,2w
ollama    478227 478231 ollama                   root    2w     FIFO               0,15        0t0    1464284 pipe 462233,container,15r 478165,run.sh,2w
ollama    478227 478232 ollama                   root    2w     FIFO               0,15        0t0    1464284 pipe 462233,container,15r 478165,run.sh,2w
ollama    478227 478233 ollama                   root    2w     FIFO               0,15        0t0    1464284 pipe 462233,container,15r 478165,run.sh,2w
ollama    478227 478234 ollama                   root    2w     FIFO               0,15        0t0    1464284 pipe 462233,container,15r 478165,run.sh,2w
ollama    478227 478235 ollama                   root    2w     FIFO               0,15        0t0    1464284 pipe 462233,container,15r 478165,run.sh,2w
ollama    478227 478236 ollama                   root    2w     FIFO               0,15        0t0    1464284 pipe 462233,container,15r 478165,run.sh,2w
ollama    478227 478237 ollama                   root    2w     FIFO               0,15        0t0    1464284 pipe 462233,container,15r 478165,run.sh,2w
ollama    478227 478238 ollama                   root    2w     FIFO               0,15        0t0    1464284 pipe 462233,container,15r 478165,run.sh,2w
ollama    478227 478239 ollama                   root    2w     FIFO               0,15        0t0    1464284 pipe 462233,container,15r 478165,run.sh,2w
ollama    478227 478240 ollama                   root    2w     FIFO               0,15        0t0    1464284 pipe 462233,container,15r 478165,run.sh,2w
ollama    478227 478241 ollama                   root    2w     FIFO               0,15        0t0    1464284 pipe 462233,container,15r 478165,run.sh,2w
ollama    478227 478856 ollama                   root    2w     FIFO               0,15        0t0    1464284 pipe 462233,container,15r 478165,run.sh,2w
ollama    478227 478857 ollama                   root    2w     FIFO               0,15        0t0    1464284 pipe 462233,container,15r 478165,run.sh,2w
ollama    478227 480063 ollama                   root    2w     FIFO               0,15        0t0    1464284 pipe 462233,container,15r 478165,run.sh,2w
ollama    478227 480064 ollama                   root    2w     FIFO               0,15        0t0    1464284 pipe 462233,container,15r 478165,run.sh,2w
ollama    478227 480065 ollama                   root    2w     FIFO               0,15        0t0    1464284 pipe 462233,container,15r 478165,run.sh,2w
ollama    478227 480066 ollama                   root    2w     FIFO               0,15        0t0    1464284 pipe 462233,container,15r 478165,run.sh,2w
ollama    478227 480082 ollama                   root    2w     FIFO               0,15        0t0    1464284 pipe 462233,container,15r 478165,run.sh,2w
ollama    478227 480083 ollama                   root    2w     FIFO               0,15        0t0    1464284 pipe 462233,container,15r 478165,run.sh,2w
```

@rick-github commented on GitHub (Apr 7, 2026):

So ollama is writing its logs to stderr, which is being read by the container manager. Since there's no output from `kubectl logs`, the question is what the container manager is doing with it. Presumably there's some configuration file you use when you start the service; does it contain any logging controls?
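If `kubectl logs` stays empty, the kubelet's on-disk log files on the node are worth checking directly. The paths below are standard kubelet behavior; the `ollama` filter is an assumption about the pod name, and reading them may require root:

```shell
# On the k8s *node* (not inside the pod): the kubelet writes each container's
# stdout/stderr under /var/log/pods, with symlinks in /var/log/containers.
# Error-tolerant so this sketch is a no-op on machines without those paths.
ls /var/log/containers/ 2>/dev/null | grep -i ollama || true
tail -n 50 /var/log/pods/*ollama*/*/*.log 2>/dev/null || true
```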

@khteh commented on GitHub (Apr 7, 2026):

https://github.com/khteh/Kubernetes/blob/master/Ollama/ollama_config.yml
Author
Owner

@rick-github commented on GitHub (Apr 7, 2026):

I don't know where your logs are, but I've seen enough to make an educated guess as to why gemma4:31b seems unresponsive.

```
OLLAMA_FLASH_ATTENTION: "true"
```

Flash attention is enabled.

This is likely https://github.com/ollama/ollama/issues/15237. Disable FA until https://github.com/ollama/ollama/pull/15378 is merged and released.
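In the linked `ollama_config.yml`, the workaround amounts to flipping that env var (a sketch; the key name comes from the quoted config above):

```yaml
# Env excerpt (sketch): disable flash attention until the fix is released.
OLLAMA_FLASH_ATTENTION: "false"
```

After changing it, restart the pod so `ollama serve` picks up the new environment.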


@khteh commented on GitHub (Apr 7, 2026):

I only see the ollama start-up log:

[ollama-0 ollama] time=2026-04-07T12:24:54.376Z level=DEBUG source=runner.go:264 msg="refreshing free memory"
[ollama-0 ollama] time=2026-04-07T12:24:54.376Z level=DEBUG source=runner.go:328 msg="unable to refresh all GPUs with existing runners, performing bootstrap discovery"
[ollama-0 ollama] time=2026-04-07T12:24:54.376Z level=INFO source=server.go:432 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 43877"
[ollama-0 ollama] time=2026-04-07T12:24:54.384Z level=DEBUG source=server.go:433 msg=subprocess OLLAMA_MODELS=/models OLLAMA_SCHED_SPREAD=true OLLAMA_HOST=http://0.0.0.0:11434 OLLAMA_CONTEXT_LENGTH=262144 OLLAMA_DEBUG=true LD_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/cuda_v13:/usr/local/nvidia/lib:/usr/local/nvidia/lib64 PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin OLLAMA_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/cuda_v13
[ollama-0 ollama] time=2026-04-07T12:24:54.633Z level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=257.23591ms OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/cuda_v13]" extra_envs=map[]
[ollama-0 ollama] time=2026-04-07T12:24:54.633Z level=DEBUG source=runner.go:40 msg="overall device VRAM discovery took" duration=257.368031ms
[ollama-0 ollama] time=2026-04-07T12:24:54.634Z level=WARN source=cpu_linux.go:130 msg="failed to parse CPU allowed micro secs" error="strconv.ParseInt: parsing \"max\": invalid syntax"
[ollama-0 ollama] time=2026-04-07T12:24:54.634Z level=DEBUG source=sched.go:220 msg="updating default concurrency" OLLAMA_MAX_LOADED_MODELS=3 gpu_count=1
[ollama-0 ollama] time=2026-04-07T12:24:54.634Z level=DEBUG source=sched.go:229 msg="loading first model" model=/models/blobs/sha256-280af6832eca23cb322c4dcc65edfea98a21b8f8ab07dc7553bd6f7e6e7a3313
[ollama-0 ollama] time=2026-04-07T12:24:54.764Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=general.alignment default=32
[ollama-0 ollama] time=2026-04-07T12:24:54.813Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=general.alignment default=32
[ollama-0 ollama] time=2026-04-07T12:24:54.814Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.pooling_type default=0
[ollama-0 ollama] time=2026-04-07T12:24:54.814Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=tokenizer.ggml.eot_token_id default=106
[ollama-0 ollama] time=2026-04-07T12:24:54.823Z level=INFO source=model.go:97 msg="gemma4: token IDs" image=255999 image_end=258882 audio=256000 audio_end=258883
[ollama-0 ollama] time=2026-04-07T12:24:54.823Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.attention.global_head_count_kv default=0
[ollama-0 ollama] time=2026-04-07T12:24:54.823Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.expert_count default=0
[ollama-0 ollama] time=2026-04-07T12:24:54.823Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.expert_used_count default=0
[ollama-0 ollama] time=2026-04-07T12:24:54.823Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.audio.block_count default=0
[ollama-0 ollama] time=2026-04-07T12:24:54.823Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.audio.embedding_length default=0
[ollama-0 ollama] time=2026-04-07T12:24:54.823Z level=INFO source=server.go:432 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --model /models/blobs/sha256-280af6832eca23cb322c4dcc65edfea98a21b8f8ab07dc7553bd6f7e6e7a3313 --port 34163"
[ollama-0 ollama] time=2026-04-07T12:24:54.823Z level=DEBUG source=server.go:433 msg=subprocess OLLAMA_MODELS=/models OLLAMA_SCHED_SPREAD=true OLLAMA_HOST=http://0.0.0.0:11434 OLLAMA_CONTEXT_LENGTH=262144 OLLAMA_DEBUG=true LD_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/cuda_v13:/usr/local/nvidia/lib:/usr/local/nvidia/lib64 PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin OLLAMA_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/cuda_v13
[ollama-0 ollama] time=2026-04-07T12:24:54.824Z level=INFO source=sched.go:484 msg="system memory" total="68.1 GiB" free="67.8 GiB" free_swap="44.9 MiB"
[ollama-0 ollama] time=2026-04-07T12:24:54.824Z level=INFO source=sched.go:491 msg="gpu memory" id=GPU-9762feba-cea4-7981-7353-533400b79c72 library=CUDA available="3.1 GiB" free="3.5 GiB" minimum="457.0 MiB" overhead="0 B"
[ollama-0 ollama] time=2026-04-07T12:24:54.824Z level=INFO source=server.go:759 msg="loading model" "model layers"=61 requested=-1
[ollama-0 ollama] time=2026-04-07T12:24:54.834Z level=INFO source=runner.go:1417 msg="starting ollama engine"
[ollama-0 ollama] time=2026-04-07T12:24:54.834Z level=INFO source=runner.go:1452 msg="Server listening on 127.0.0.1:34163"
[ollama-0 ollama] time=2026-04-07T12:24:54.836Z level=INFO source=runner.go:1290 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Disabled KvSize:262144 KvCacheType: NumThreads:8 GPULayers:61[ID:GPU-9762feba-cea4-7981-7353-533400b79c72 Layers:61(0..60)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
[ollama-0 ollama] time=2026-04-07T12:24:54.890Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=general.alignment default=32
[ollama-0 ollama] time=2026-04-07T12:24:54.891Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=general.name default=""
[ollama-0 ollama] time=2026-04-07T12:24:54.891Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=general.description default=""
[ollama-0 ollama] time=2026-04-07T12:24:54.891Z level=INFO source=ggml.go:136 msg="" architecture=gemma4 file_type=Q4_K_M name="" description="" num_tensors=1189 num_key_values=49
[ollama-0 ollama] time=2026-04-07T12:24:54.891Z level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/lib/ollama
[ollama-0 ollama] load_backend: loaded CPU backend from /usr/lib/ollama/libggml-cpu-icelake.so
[ollama-0 ollama] time=2026-04-07T12:24:54.896Z level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/lib/ollama/cuda_v13
[ollama-0 ollama] ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
[ollama-0 ollama] ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
[ollama-0 ollama] ggml_cuda_init: found 1 CUDA devices:
[ollama-0 ollama]   Device 0: NVIDIA RTX A2000 Laptop GPU, compute capability 8.6, VMM: yes, ID: GPU-9762feba-cea4-7981-7353-533400b79c72
[ollama-0 ollama] load_backend: loaded CUDA backend from /usr/lib/ollama/cuda_v13/libggml-cuda.so
[ollama-0 ollama] time=2026-04-07T12:24:55.046Z level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.AVX512=1 CPU.0.AVX512_VBMI=1 CPU.0.AVX512_VNNI=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=750,800,860,870,890,900,1000,1030,1100,1200,1210 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc)
[ollama-0 ollama] time=2026-04-07T12:24:55.052Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.pooling_type default=0
[ollama-0 ollama] time=2026-04-07T12:24:55.052Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=tokenizer.ggml.eot_token_id default=106
[ollama-0 ollama] time=2026-04-07T12:24:55.052Z level=INFO source=model.go:97 msg="gemma4: token IDs" image=255999 image_end=258882 audio=256000 audio_end=258883
[ollama-0 ollama] time=2026-04-07T12:24:55.052Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.attention.global_head_count_kv default=0
[ollama-0 ollama] time=2026-04-07T12:24:55.052Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.expert_count default=0
[ollama-0 ollama] time=2026-04-07T12:24:55.052Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.expert_used_count default=0
[ollama-0 ollama] time=2026-04-07T12:24:55.052Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.audio.block_count default=0
[ollama-0 ollama] time=2026-04-07T12:24:55.052Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.audio.embedding_length default=0
[ollama-0 ollama] time=2026-04-07T12:24:55.086Z level=INFO source=model.go:138 msg="vision: decode" elapsed=2.08823ms bounds=(0,0)-(2048,2048)
[ollama-0 ollama] time=2026-04-07T12:24:55.199Z level=INFO source=model.go:145 msg="vision: preprocess" elapsed=112.653606ms size="[768 768]"
[ollama-0 ollama] time=2026-04-07T12:24:55.199Z level=INFO source=model.go:148 msg="vision: pixelValues" shape="[768 768 3]" dim0=768 dim1=768 dim2=3
[ollama-0 ollama] time=2026-04-07T12:24:55.199Z level=INFO source=model.go:152 msg="vision: patches" patchesX=48 patchesY=48 total=2304 patchSize=16
[ollama-0 ollama] time=2026-04-07T12:24:55.200Z level=INFO source=model.go:156 msg="vision: encoded" elapsed=115.757163ms shape="[5376 256]"
[ollama-0 ollama] time=2026-04-07T12:24:58.009Z level=DEBUG source=ggml.go:852 msg="compute graph" nodes=1272 splits=1
[ollama-0 ollama] [GIN] 2026/04/07 - 12:24:59 | 200 |      33.647µs |   192.168.0.141 | GET      "/api/version"
[ollama-0 ollama] [GIN] 2026/04/07 - 12:24:59 | 200 |      24.938µs |   192.168.0.141 | GET      "/api/version"
[ollama-0 ollama] [GIN] 2026/04/07 - 12:25:04 | 200 |      34.107µs |   192.168.0.141 | GET      "/api/version"
[ollama-0 ollama] [GIN] 2026/04/07 - 12:25:04 | 200 |      30.816µs |   192.168.0.141 | GET      "/api/version"
[ollama-0 ollama] [GIN] 2026/04/07 - 12:25:09 | 200 |       39.27µs |   192.168.0.141 | GET      "/api/version"
[ollama-0 ollama] [GIN] 2026/04/07 - 12:25:09 | 200 |      29.365µs |   192.168.0.141 | GET      "/api/version"
[ollama-0 ollama] [GIN] 2026/04/07 - 12:25:14 | 200 |      41.286µs |   192.168.0.141 | GET      "/api/version"
[ollama-0 ollama] [GIN] 2026/04/07 - 12:25:14 | 200 |      34.408µs |   192.168.0.141 | GET      "/api/version"
[ollama-0 ollama] [GIN] 2026/04/07 - 12:25:19 | 200 |      33.858µs |   192.168.0.141 | GET      "/api/version"
[ollama-0 ollama] [GIN] 2026/04/07 - 12:25:19 | 200 |      20.956µs |   192.168.0.141 | GET      "/api/version"
[ollama-0 ollama] time=2026-04-07T12:25:24.204Z level=DEBUG source=ggml.go:852 msg="compute graph" nodes=2752 splits=2
[ollama-0 ollama] [GIN] 2026/04/07 - 12:25:24 | 200 |       56.72µs |   192.168.0.141 | GET      "/api/version"
[ollama-0 ollama] [GIN] 2026/04/07 - 12:25:24 | 200 |      56.806µs |   192.168.0.141 | GET      "/api/version"
[ollama-0 ollama] [GIN] 2026/04/07 - 12:25:29 | 200 |      32.961µs |   192.168.0.141 | GET      "/api/version"
[ollama-0 ollama] [GIN] 2026/04/07 - 12:25:29 | 200 |      20.627µs |   192.168.0.141 | GET      "/api/version"
[ollama-0 ollama] time=2026-04-07T12:25:33.340Z level=DEBUG source=ggml.go:852 msg="compute graph" nodes=2750 splits=2
[ollama-0 ollama] time=2026-04-07T12:25:33.341Z level=DEBUG source=device.go:240 msg="model weights" device=CUDA0 size="18.4 GiB"
[ollama-0 ollama] time=2026-04-07T12:25:33.341Z level=DEBUG source=device.go:245 msg="model weights" device=CPU size="1.2 GiB"
[ollama-0 ollama] time=2026-04-07T12:25:33.341Z level=DEBUG source=device.go:251 msg="kv cache" device=CUDA0 size="23.5 GiB"
[ollama-0 ollama] time=2026-04-07T12:25:33.341Z level=DEBUG source=device.go:262 msg="compute graph" device=CUDA0 size="24.6 GiB"
[ollama-0 ollama] time=2026-04-07T12:25:33.341Z level=DEBUG source=device.go:267 msg="compute graph" device=CPU size="10.5 MiB"
[ollama-0 ollama] time=2026-04-07T12:25:33.341Z level=DEBUG source=device.go:272 msg="total memory" size="67.7 GiB"
[ollama-0 ollama] time=2026-04-07T12:25:33.341Z level=DEBUG source=server.go:784 msg=memory success=true required.InputWeights=1250426880 required.CPU.Graph=11010048 required.CUDA0.ID=GPU-9762feba-cea4-7981-7353-533400b79c72 required.CUDA0.Weights="[304974208 304974208 304974208 304974208 304974208 330264704 304974208 275169664 275169664 304974208 269492608 300460160 299297152 275169664 269492608 299297152 275169664 300460160 299297152 269492608 275169664 299297152 269492608 300460160 304974208 269492608 269492608 304974208 269492608 300460160 299297152 275169664 269492608 299297152 275169664 300460160 299297152 269492608 275169664 299297152 269492608 300460160 304974208 269492608 269492608 304974208 269492608 300460160 299297152 275169664 269492608 299297152 304974208 330264704 299297152 299297152 304974208 299297152 299297152 330264704 2260644352]" required.CUDA0.Cache="[75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 0]" required.CUDA0.Graph=26440636544
[ollama-0 ollama] time=2026-04-07T12:25:33.341Z level=DEBUG source=server.go:978 msg="available gpu" id=GPU-9762feba-cea4-7981-7353-533400b79c72 library=CUDA "available layer vram"="0 B" backoff=0.00 minimum="457.0 MiB" overhead="0 B" graph="24.6 GiB"
[ollama-0 ollama] time=2026-04-07T12:25:33.341Z level=DEBUG source=server.go:1059 msg="insufficient VRAM to load any model layers"
[ollama-0 ollama] time=2026-04-07T12:25:33.341Z level=DEBUG source=server.go:795 msg="new layout created" layers=[]
[ollama-0 ollama] time=2026-04-07T12:25:33.341Z level=INFO source=runner.go:1290 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Disabled KvSize:262144 KvCacheType: NumThreads:8 GPULayers:[] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
[ollama-0 ollama] time=2026-04-07T12:25:33.398Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=general.alignment default=32
[ollama-0 ollama] time=2026-04-07T12:25:33.406Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.pooling_type default=0
[ollama-0 ollama] time=2026-04-07T12:25:33.406Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=tokenizer.ggml.eot_token_id default=106
[ollama-0 ollama] time=2026-04-07T12:25:33.407Z level=INFO source=model.go:97 msg="gemma4: token IDs" image=255999 image_end=258882 audio=256000 audio_end=258883
[ollama-0 ollama] time=2026-04-07T12:25:33.407Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.attention.global_head_count_kv default=0
[ollama-0 ollama] time=2026-04-07T12:25:33.407Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.expert_count default=0
[ollama-0 ollama] time=2026-04-07T12:25:33.407Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.expert_used_count default=0
[ollama-0 ollama] time=2026-04-07T12:25:33.407Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.audio.block_count default=0
[ollama-0 ollama] time=2026-04-07T12:25:33.407Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.audio.embedding_length default=0
[ollama-0 ollama] time=2026-04-07T12:25:33.423Z level=INFO source=model.go:138 msg="vision: decode" elapsed=3.65082ms bounds=(0,0)-(2048,2048)
[ollama-0 ollama] time=2026-04-07T12:25:33.552Z level=INFO source=model.go:145 msg="vision: preprocess" elapsed=129.412007ms size="[768 768]"
[ollama-0 ollama] time=2026-04-07T12:25:33.552Z level=INFO source=model.go:148 msg="vision: pixelValues" shape="[768 768 3]" dim0=768 dim1=768 dim2=3
[ollama-0 ollama] time=2026-04-07T12:25:33.552Z level=INFO source=model.go:152 msg="vision: patches" patchesX=48 patchesY=48 total=2304 patchSize=16
[ollama-0 ollama] time=2026-04-07T12:25:33.553Z level=INFO source=model.go:156 msg="vision: encoded" elapsed=133.987785ms shape="[5376 256]"
[ollama-0 ollama] time=2026-04-07T12:25:33.555Z level=DEBUG source=ggml.go:852 msg="compute graph" nodes=1272 splits=1
[ollama-0 ollama] time=2026-04-07T12:25:34.203Z level=DEBUG source=ggml.go:852 msg="compute graph" nodes=2752 splits=1
[ollama-0 ollama] time=2026-04-07T12:25:34.207Z level=DEBUG source=ggml.go:852 msg="compute graph" nodes=2750 splits=1
[ollama-0 ollama] time=2026-04-07T12:25:34.208Z level=DEBUG source=device.go:245 msg="model weights" device=CPU size="19.6 GiB"
[ollama-0 ollama] time=2026-04-07T12:25:34.208Z level=DEBUG source=device.go:256 msg="kv cache" device=CPU size="23.5 GiB"
[ollama-0 ollama] time=2026-04-07T12:25:34.208Z level=DEBUG source=device.go:267 msg="compute graph" device=CPU size="16.1 GiB"
[ollama-0 ollama] time=2026-04-07T12:25:34.208Z level=DEBUG source=device.go:272 msg="total memory" size="59.2 GiB"
[ollama-0 ollama] time=2026-04-07T12:25:34.208Z level=DEBUG source=server.go:784 msg=memory success=true required.InputWeights=1250426880 required.CPU.Weights="[304972832 304972832 304972832 304972832 304972832 330263584 304972832 275168288 275168288 304972832 269491232 300459040 299295776 275168288 269491232 299295776 275168288 300459040 299295776 269491232 275168288 299295776 269491232 300459040 304972832 269491232 269491232 304972832 269491232 300459040 299295776 275168288 269491232 299295776 275168288 300459040 299295776 269491232 275168288 299295776 269491232 300459040 304972832 269491232 269491232 304972832 269491232 300459040 299295776 275168288 269491232 299295776 304972832 330263584 299295776 299295776 304972832 299295776 299295776 330263584 2260638912]" required.CPU.Cache="[75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 0]" required.CPU.Graph=17280008192
[ollama-0 ollama] time=2026-04-07T12:25:34.208Z level=DEBUG source=server.go:978 msg="available gpu" id=GPU-9762feba-cea4-7981-7353-533400b79c72 library=CUDA "available layer vram"="3.1 GiB" backoff=0.00 minimum="457.0 MiB" overhead="0 B" graph="0 B"
[ollama-0 ollama] time=2026-04-07T12:25:34.208Z level=DEBUG source=server.go:795 msg="new layout created" layers="3[ID:GPU-9762feba-cea4-7981-7353-533400b79c72 Layers:3(57..59)]"
[ollama-0 ollama] time=2026-04-07T12:25:34.208Z level=DEBUG source=server.go:820 msg="exploring intermediate layers" layer=2
[ollama-0 ollama] time=2026-04-07T12:25:34.208Z level=DEBUG source=server.go:978 msg="available gpu" id=GPU-9762feba-cea4-7981-7353-533400b79c72 library=CUDA "available layer vram"="3.1 GiB" backoff=0.00 minimum="457.0 MiB" overhead="0 B" graph="0 B"
[ollama-0 ollama] time=2026-04-07T12:25:34.208Z level=DEBUG source=server.go:828 msg="new layout created" layers="2[ID:GPU-9762feba-cea4-7981-7353-533400b79c72 Layers:2(58..59)]"
[ollama-0 ollama] time=2026-04-07T12:25:34.208Z level=INFO source=runner.go:1290 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Disabled KvSize:262144 KvCacheType: NumThreads:8 GPULayers:2[ID:GPU-9762feba-cea4-7981-7353-533400b79c72 Layers:2(58..59)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
[ollama-0 ollama] time=2026-04-07T12:25:34.264Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=general.alignment default=32
[ollama-0 ollama] time=2026-04-07T12:25:34.271Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.pooling_type default=0
[ollama-0 ollama] time=2026-04-07T12:25:34.271Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=tokenizer.ggml.eot_token_id default=106
[ollama-0 ollama] time=2026-04-07T12:25:34.272Z level=INFO source=model.go:97 msg="gemma4: token IDs" image=255999 image_end=258882 audio=256000 audio_end=258883
[ollama-0 ollama] time=2026-04-07T12:25:34.272Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.attention.global_head_count_kv default=0
[ollama-0 ollama] time=2026-04-07T12:25:34.272Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.expert_count default=0
[ollama-0 ollama] time=2026-04-07T12:25:34.272Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.expert_used_count default=0
[ollama-0 ollama] time=2026-04-07T12:25:34.272Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.audio.block_count default=0
[ollama-0 ollama] time=2026-04-07T12:25:34.272Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.audio.embedding_length default=0
[ollama-0 ollama] [GIN] 2026/04/07 - 12:25:34 | 200 |      34.019µs |   192.168.0.141 | GET      "/api/version"
[ollama-0 ollama] [GIN] 2026/04/07 - 12:25:34 | 200 |      48.054µs |   192.168.0.141 | GET      "/api/version"
[ollama-0 ollama] time=2026-04-07T12:25:34.286Z level=INFO source=model.go:138 msg="vision: decode" elapsed=833.65µs bounds=(0,0)-(2048,2048)
[ollama-0 ollama] time=2026-04-07T12:25:34.411Z level=INFO source=model.go:145 msg="vision: preprocess" elapsed=124.942791ms size="[768 768]"
[ollama-0 ollama] time=2026-04-07T12:25:34.411Z level=INFO source=model.go:148 msg="vision: pixelValues" shape="[768 768 3]" dim0=768 dim1=768 dim2=3
[ollama-0 ollama] time=2026-04-07T12:25:34.411Z level=INFO source=model.go:152 msg="vision: patches" patchesX=48 patchesY=48 total=2304 patchSize=16
[ollama-0 ollama] time=2026-04-07T12:25:34.412Z level=INFO source=model.go:156 msg="vision: encoded" elapsed=126.806024ms shape="[5376 256]"
[ollama-0 ollama] time=2026-04-07T12:25:34.417Z level=DEBUG source=ggml.go:852 msg="compute graph" nodes=1272 splits=355
[ollama-0 ollama] time=2026-04-07T12:25:35.188Z level=DEBUG source=ggml.go:852 msg="compute graph" nodes=2752 splits=940
[ollama-0 ollama] time=2026-04-07T12:25:35.193Z level=DEBUG source=ggml.go:852 msg="compute graph" nodes=2750 splits=3
[ollama-0 ollama] time=2026-04-07T12:25:35.194Z level=DEBUG source=device.go:240 msg="model weights" device=CUDA0 size="600.4 MiB"
[ollama-0 ollama] time=2026-04-07T12:25:35.194Z level=DEBUG source=device.go:245 msg="model weights" device=CPU size="19.0 GiB"
[ollama-0 ollama] time=2026-04-07T12:25:35.194Z level=DEBUG source=device.go:251 msg="kv cache" device=CUDA0 size="2.1 GiB"
[ollama-0 ollama] time=2026-04-07T12:25:35.194Z level=DEBUG source=device.go:256 msg="kv cache" device=CPU size="21.4 GiB"
[ollama-0 ollama] time=2026-04-07T12:25:35.194Z level=DEBUG source=device.go:262 msg="compute graph" device=CUDA0 size="26.7 GiB"
[ollama-0 ollama] time=2026-04-07T12:25:35.194Z level=DEBUG source=device.go:267 msg="compute graph" device=CPU size="32.2 MiB"
[ollama-0 ollama] time=2026-04-07T12:25:35.194Z level=DEBUG source=device.go:272 msg="total memory" size="69.8 GiB"
[ollama-0 ollama] time=2026-04-07T12:25:35.194Z level=DEBUG source=server.go:837 msg=memory success=true required.InputWeights=1250426880 required.CPU.Weights="[304972832 304972832 304972832 304972832 304972832 330263584 304972832 275168288 275168288 304972832 269491232 300459040 299295776 275168288 269491232 299295776 275168288 300459040 299295776 269491232 275168288 299295776 269491232 300459040 304972832 269491232 269491232 304972832 269491232 300459040 299295776 275168288 269491232 299295776 275168288 300459040 299295776 269491232 275168288 299295776 269491232 300459040 304972832 269491232 269491232 304972832 269491232 300459040 299295776 275168288 269491232 299295776 304972832 330263584 299295776 299295776 304972832 299295776 0 0 2260638912]" required.CPU.Cache="[75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 0 0 0]" required.CPU.Graph=33728512 required.CUDA0.ID=GPU-9762feba-cea4-7981-7353-533400b79c72 required.CUDA0.Weights="[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 299297152 330264704 0]" required.CUDA0.Cache="[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 75497472 2147483648 0]" required.CUDA0.Graph=28661715968
[ollama-0 ollama] time=2026-04-07T12:25:35.194Z level=DEBUG source=server.go:978 msg="available gpu" id=GPU-9762feba-cea4-7981-7353-533400b79c72 library=CUDA "available layer vram"="0 B" backoff=0.00 minimum="457.0 MiB" overhead="0 B" graph="26.7 GiB"
[ollama-0 ollama] time=2026-04-07T12:25:35.194Z level=DEBUG source=server.go:1059 msg="insufficient VRAM to load any model layers"
[ollama-0 ollama] time=2026-04-07T12:25:35.194Z level=DEBUG source=server.go:845 msg="verifying layout" layers=[]
[ollama-0 ollama] time=2026-04-07T12:25:35.194Z level=DEBUG source=server.go:820 msg="exploring intermediate layers" layer=1
[ollama-0 ollama] time=2026-04-07T12:25:35.194Z level=DEBUG source=server.go:978 msg="available gpu" id=GPU-9762feba-cea4-7981-7353-533400b79c72 library=CUDA "available layer vram"="3.1 GiB" backoff=0.00 minimum="457.0 MiB" overhead="0 B" graph="0 B"
[ollama-0 ollama] time=2026-04-07T12:25:35.194Z level=DEBUG source=server.go:828 msg="new layout created" layers="1[ID:GPU-9762feba-cea4-7981-7353-533400b79c72 Layers:1(59..59)]"
[ollama-0 ollama] time=2026-04-07T12:25:35.195Z level=INFO source=runner.go:1290 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Disabled KvSize:262144 KvCacheType: NumThreads:8 GPULayers:1[ID:GPU-9762feba-cea4-7981-7353-533400b79c72 Layers:1(59..59)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
[ollama-0 ollama] time=2026-04-07T12:25:35.253Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=general.alignment default=32
[ollama-0 ollama] time=2026-04-07T12:25:35.260Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.pooling_type default=0
[ollama-0 ollama] time=2026-04-07T12:25:35.260Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=tokenizer.ggml.eot_token_id default=106
[ollama-0 ollama] time=2026-04-07T12:25:35.261Z level=INFO source=model.go:97 msg="gemma4: token IDs" image=255999 image_end=258882 audio=256000 audio_end=258883
[ollama-0 ollama] time=2026-04-07T12:25:35.261Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.attention.global_head_count_kv default=0
[ollama-0 ollama] time=2026-04-07T12:25:35.261Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.expert_count default=0
[ollama-0 ollama] time=2026-04-07T12:25:35.261Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.expert_used_count default=0
[ollama-0 ollama] time=2026-04-07T12:25:35.261Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.audio.block_count default=0
[ollama-0 ollama] time=2026-04-07T12:25:35.261Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.audio.embedding_length default=0
[ollama-0 ollama] time=2026-04-07T12:25:35.282Z level=INFO source=model.go:138 msg="vision: decode" elapsed=770.123µs bounds=(0,0)-(2048,2048)
[ollama-0 ollama] time=2026-04-07T12:25:35.446Z level=INFO source=model.go:145 msg="vision: preprocess" elapsed=163.773076ms size="[768 768]"
[ollama-0 ollama] time=2026-04-07T12:25:35.446Z level=INFO source=model.go:148 msg="vision: pixelValues" shape="[768 768 3]" dim0=768 dim1=768 dim2=3
[ollama-0 ollama] time=2026-04-07T12:25:35.446Z level=INFO source=model.go:152 msg="vision: patches" patchesX=48 patchesY=48 total=2304 patchSize=16
[ollama-0 ollama] time=2026-04-07T12:25:35.447Z level=INFO source=model.go:156 msg="vision: encoded" elapsed=166.162382ms shape="[5376 256]"
[ollama-0 ollama] time=2026-04-07T12:25:35.453Z level=DEBUG source=ggml.go:852 msg="compute graph" nodes=1272 splits=355
[ollama-0 ollama] time=2026-04-07T12:25:36.276Z level=DEBUG source=ggml.go:852 msg="compute graph" nodes=2752 splits=956
[ollama-0 ollama] time=2026-04-07T12:25:36.280Z level=DEBUG source=ggml.go:852 msg="compute graph" nodes=2750 splits=3
[ollama-0 ollama] time=2026-04-07T12:25:36.280Z level=DEBUG source=device.go:240 msg="model weights" device=CUDA0 size="315.0 MiB"
[ollama-0 ollama] time=2026-04-07T12:25:36.280Z level=DEBUG source=device.go:245 msg="model weights" device=CPU size="19.3 GiB"
[ollama-0 ollama] time=2026-04-07T12:25:36.280Z level=DEBUG source=device.go:251 msg="kv cache" device=CUDA0 size="2.0 GiB"
[ollama-0 ollama] time=2026-04-07T12:25:36.280Z level=DEBUG source=device.go:256 msg="kv cache" device=CPU size="21.5 GiB"
[ollama-0 ollama] time=2026-04-07T12:25:36.280Z level=DEBUG source=device.go:262 msg="compute graph" device=CUDA0 size="26.7 GiB"
[ollama-0 ollama] time=2026-04-07T12:25:36.280Z level=DEBUG source=device.go:267 msg="compute graph" device=CPU size="32.2 MiB"
[ollama-0 ollama] time=2026-04-07T12:25:36.280Z level=DEBUG source=device.go:272 msg="total memory" size="69.8 GiB"
[ollama-0 ollama] time=2026-04-07T12:25:36.280Z level=DEBUG source=server.go:837 msg=memory success=true required.InputWeights=1250426880 required.CPU.Weights="[304972832 304972832 304972832 304972832 304972832 330263584 304972832 275168288 275168288 304972832 269491232 300459040 299295776 275168288 269491232 299295776 275168288 300459040 299295776 269491232 275168288 299295776 269491232 300459040 304972832 269491232 269491232 304972832 269491232 300459040 299295776 275168288 269491232 299295776 275168288 300459040 299295776 269491232 275168288 299295776 269491232 300459040 304972832 269491232 269491232 304972832 269491232 300459040 299295776 275168288 269491232 299295776 304972832 330263584 299295776 299295776 304972832 299295776 299295776 0 2260638912]" required.CPU.Cache="[75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 0 0]" required.CPU.Graph=33728512 required.CUDA0.ID=GPU-9762feba-cea4-7981-7353-533400b79c72 required.CUDA0.Weights="[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 330264704 0]" required.CUDA0.Cache="[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2147483648 0]" required.CUDA0.Graph=28661715968
[ollama-0 ollama] time=2026-04-07T12:25:36.280Z level=DEBUG source=server.go:978 msg="available gpu" id=GPU-9762feba-cea4-7981-7353-533400b79c72 library=CUDA "available layer vram"="0 B" backoff=0.00 minimum="457.0 MiB" overhead="0 B" graph="26.7 GiB"
[ollama-0 ollama] time=2026-04-07T12:25:36.280Z level=DEBUG source=server.go:1059 msg="insufficient VRAM to load any model layers"
[ollama-0 ollama] time=2026-04-07T12:25:36.280Z level=DEBUG source=server.go:845 msg="verifying layout" layers=[]
[ollama-0 ollama] time=2026-04-07T12:25:36.280Z level=DEBUG source=server.go:820 msg="exploring intermediate layers" layer=0
[ollama-0 ollama] time=2026-04-07T12:25:36.280Z level=DEBUG source=server.go:978 msg="available gpu" id=GPU-9762feba-cea4-7981-7353-533400b79c72 library=CUDA "available layer vram"="3.1 GiB" backoff=0.00 minimum="457.0 MiB" overhead="0 B" graph="0 B"
[ollama-0 ollama] time=2026-04-07T12:25:36.280Z level=DEBUG source=server.go:1059 msg="insufficient VRAM to load any model layers"
[ollama-0 ollama] time=2026-04-07T12:25:36.280Z level=DEBUG source=server.go:828 msg="new layout created" layers=[]
[ollama-0 ollama] time=2026-04-07T12:25:36.280Z level=INFO source=runner.go:1290 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Disabled KvSize:262144 KvCacheType: NumThreads:8 GPULayers:[] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
[ollama-0 ollama] time=2026-04-07T12:25:36.336Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=general.alignment default=32
[ollama-0 ollama] time=2026-04-07T12:25:36.344Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.pooling_type default=0
[ollama-0 ollama] time=2026-04-07T12:25:36.344Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=tokenizer.ggml.eot_token_id default=106
[ollama-0 ollama] time=2026-04-07T12:25:36.345Z level=INFO source=model.go:97 msg="gemma4: token IDs" image=255999 image_end=258882 audio=256000 audio_end=258883
[ollama-0 ollama] time=2026-04-07T12:25:36.345Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.attention.global_head_count_kv default=0
[ollama-0 ollama] time=2026-04-07T12:25:36.345Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.expert_count default=0
[ollama-0 ollama] time=2026-04-07T12:25:36.345Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.expert_used_count default=0
[ollama-0 ollama] time=2026-04-07T12:25:36.345Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.audio.block_count default=0
[ollama-0 ollama] time=2026-04-07T12:25:36.345Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.audio.embedding_length default=0
[ollama-0 ollama] time=2026-04-07T12:25:36.362Z level=INFO source=model.go:138 msg="vision: decode" elapsed=1.145509ms bounds=(0,0)-(2048,2048)
[ollama-0 ollama] time=2026-04-07T12:25:36.503Z level=INFO source=model.go:145 msg="vision: preprocess" elapsed=140.876134ms size="[768 768]"
[ollama-0 ollama] time=2026-04-07T12:25:36.503Z level=INFO source=model.go:148 msg="vision: pixelValues" shape="[768 768 3]" dim0=768 dim1=768 dim2=3
[ollama-0 ollama] time=2026-04-07T12:25:36.503Z level=INFO source=model.go:152 msg="vision: patches" patchesX=48 patchesY=48 total=2304 patchSize=16
[ollama-0 ollama] time=2026-04-07T12:25:36.504Z level=INFO source=model.go:156 msg="vision: encoded" elapsed=142.98229ms shape="[5376 256]"
[ollama-0 ollama] time=2026-04-07T12:25:36.505Z level=DEBUG source=ggml.go:852 msg="compute graph" nodes=1272 splits=1
[ollama-0 ollama] time=2026-04-07T12:25:37.338Z level=DEBUG source=ggml.go:852 msg="compute graph" nodes=2752 splits=1
[ollama-0 ollama] time=2026-04-07T12:25:37.342Z level=DEBUG source=ggml.go:852 msg="compute graph" nodes=2750 splits=1
[ollama-0 ollama] time=2026-04-07T12:25:37.342Z level=DEBUG source=device.go:245 msg="model weights" device=CPU size="19.6 GiB"
[ollama-0 ollama] time=2026-04-07T12:25:37.342Z level=DEBUG source=device.go:256 msg="kv cache" device=CPU size="23.5 GiB"
[ollama-0 ollama] time=2026-04-07T12:25:37.342Z level=DEBUG source=device.go:267 msg="compute graph" device=CPU size="16.1 GiB"
[ollama-0 ollama] time=2026-04-07T12:25:37.342Z level=DEBUG source=device.go:272 msg="total memory" size="59.2 GiB"
[ollama-0 ollama] time=2026-04-07T12:25:37.342Z level=DEBUG source=server.go:837 msg=memory success=true required.InputWeights=1250426880 required.CPU.Weights="[304972832 304972832 304972832 304972832 304972832 330263584 304972832 275168288 275168288 304972832 269491232 300459040 299295776 275168288 269491232 299295776 275168288 300459040 299295776 269491232 275168288 299295776 269491232 300459040 304972832 269491232 269491232 304972832 269491232 300459040 299295776 275168288 269491232 299295776 275168288 300459040 299295776 269491232 275168288 299295776 269491232 300459040 304972832 269491232 269491232 304972832 269491232 300459040 299295776 275168288 269491232 299295776 304972832 330263584 299295776 299295776 304972832 299295776 299295776 330263584 2260638912]" required.CPU.Cache="[75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 0]" required.CPU.Graph=17280008192
[ollama-0 ollama] time=2026-04-07T12:25:37.342Z level=DEBUG source=server.go:978 msg="available gpu" id=GPU-9762feba-cea4-7981-7353-533400b79c72 library=CUDA "available layer vram"="3.1 GiB" backoff=0.00 minimum="457.0 MiB" overhead="0 B" graph="0 B"
[ollama-0 ollama] time=2026-04-07T12:25:37.342Z level=DEBUG source=server.go:845 msg="verifying layout" layers="3[ID:GPU-9762feba-cea4-7981-7353-533400b79c72 Layers:3(57..59)]"
[ollama-0 ollama] time=2026-04-07T12:25:37.343Z level=INFO source=runner.go:1290 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Disabled KvSize:262144 KvCacheType: NumThreads:8 GPULayers:[] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
[ollama-0 ollama] time=2026-04-07T12:25:37.404Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=general.alignment default=32
[ollama-0 ollama] time=2026-04-07T12:25:37.424Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.pooling_type default=0
[ollama-0 ollama] time=2026-04-07T12:25:37.424Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=tokenizer.ggml.eot_token_id default=106
[ollama-0 ollama] time=2026-04-07T12:25:37.425Z level=INFO source=model.go:97 msg="gemma4: token IDs" image=255999 image_end=258882 audio=256000 audio_end=258883
[ollama-0 ollama] time=2026-04-07T12:25:37.425Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.attention.global_head_count_kv default=0
[ollama-0 ollama] time=2026-04-07T12:25:37.425Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.expert_count default=0
[ollama-0 ollama] time=2026-04-07T12:25:37.425Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.expert_used_count default=0
[ollama-0 ollama] time=2026-04-07T12:25:37.425Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.audio.block_count default=0
[ollama-0 ollama] time=2026-04-07T12:25:37.425Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.audio.embedding_length default=0
[ollama-0 ollama] time=2026-04-07T12:25:37.451Z level=INFO source=model.go:138 msg="vision: decode" elapsed=1.711867ms bounds=(0,0)-(2048,2048)
[ollama-0 ollama] time=2026-04-07T12:25:37.601Z level=INFO source=model.go:145 msg="vision: preprocess" elapsed=150.209742ms size="[768 768]"
[ollama-0 ollama] time=2026-04-07T12:25:37.604Z level=INFO source=model.go:148 msg="vision: pixelValues" shape="[768 768 3]" dim0=768 dim1=768 dim2=3
[ollama-0 ollama] time=2026-04-07T12:25:37.604Z level=INFO source=model.go:152 msg="vision: patches" patchesX=48 patchesY=48 total=2304 patchSize=16
[ollama-0 ollama] time=2026-04-07T12:25:37.605Z level=INFO source=model.go:156 msg="vision: encoded" elapsed=156.28355ms shape="[5376 256]"
[ollama-0 ollama] time=2026-04-07T12:25:37.607Z level=DEBUG source=ggml.go:852 msg="compute graph" nodes=1272 splits=1
[ollama-0 ollama] [GIN] 2026/04/07 - 12:25:39 | 200 |       39.85µs |   192.168.0.141 | GET      "/api/version"
[ollama-0 ollama] [GIN] 2026/04/07 - 12:25:39 | 200 |      43.372µs |   192.168.0.141 | GET      "/api/version"
[ollama-0 ollama] [GIN] 2026/04/07 - 12:25:44 | 200 |      40.613µs |   192.168.0.141 | GET      "/api/version"
[ollama-0 ollama] [GIN] 2026/04/07 - 12:25:44 | 200 |      20.983µs |   192.168.0.141 | GET      "/api/version"
[ollama-0 ollama] time=2026-04-07T12:25:46.509Z level=DEBUG source=ggml.go:852 msg="compute graph" nodes=2752 splits=1
[ollama-0 ollama] time=2026-04-07T12:25:46.557Z level=DEBUG source=ggml.go:852 msg="compute graph" nodes=2750 splits=1
[ollama-0 ollama] time=2026-04-07T12:25:46.557Z level=DEBUG source=device.go:245 msg="model weights" device=CPU size="19.6 GiB"
[ollama-0 ollama] time=2026-04-07T12:25:46.557Z level=DEBUG source=device.go:256 msg="kv cache" device=CPU size="23.5 GiB"
[ollama-0 ollama] time=2026-04-07T12:25:46.557Z level=DEBUG source=device.go:267 msg="compute graph" device=CPU size="16.1 GiB"
[ollama-0 ollama] time=2026-04-07T12:25:46.557Z level=DEBUG source=device.go:272 msg="total memory" size="59.2 GiB"
[ollama-0 ollama] time=2026-04-07T12:25:46.557Z level=DEBUG source=server.go:784 msg=memory success=true required.InputWeights=1250426880 required.CPU.Weights="[304972832 304972832 304972832 304972832 304972832 330263584 304972832 275168288 275168288 304972832 269491232 300459040 299295776 275168288 269491232 299295776 275168288 300459040 299295776 269491232 275168288 299295776 269491232 300459040 304972832 269491232 269491232 304972832 269491232 300459040 299295776 275168288 269491232 299295776 275168288 300459040 299295776 269491232 275168288 299295776 269491232 300459040 304972832 269491232 269491232 304972832 269491232 300459040 299295776 275168288 269491232 299295776 304972832 330263584 299295776 299295776 304972832 299295776 299295776 330263584 2260638912]" required.CPU.Cache="[75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 0]" required.CPU.Graph=17280008192
[ollama-0 ollama] time=2026-04-07T12:25:46.557Z level=DEBUG source=server.go:978 msg="available gpu" id=GPU-9762feba-cea4-7981-7353-533400b79c72 library=CUDA "available layer vram"="3.1 GiB" backoff=0.00 minimum="457.0 MiB" overhead="0 B" graph="0 B"
[ollama-0 ollama] time=2026-04-07T12:25:46.557Z level=DEBUG source=server.go:795 msg="new layout created" layers="3[ID:GPU-9762feba-cea4-7981-7353-533400b79c72 Layers:3(57..59)]"
[ollama-0 ollama] time=2026-04-07T12:25:46.557Z level=DEBUG source=server.go:820 msg="exploring intermediate layers" layer=2
[ollama-0 ollama] time=2026-04-07T12:25:46.557Z level=DEBUG source=server.go:978 msg="available gpu" id=GPU-9762feba-cea4-7981-7353-533400b79c72 library=CUDA "available layer vram"="3.1 GiB" backoff=0.00 minimum="457.0 MiB" overhead="0 B" graph="0 B"
[ollama-0 ollama] time=2026-04-07T12:25:46.557Z level=DEBUG source=server.go:828 msg="new layout created" layers="2[ID:GPU-9762feba-cea4-7981-7353-533400b79c72 Layers:2(58..59)]"
[ollama-0 ollama] time=2026-04-07T12:25:46.558Z level=INFO source=runner.go:1290 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Disabled KvSize:262144 KvCacheType: NumThreads:8 GPULayers:2[ID:GPU-9762feba-cea4-7981-7353-533400b79c72 Layers:2(58..59)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
[ollama-0 ollama] time=2026-04-07T12:25:47.491Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=general.alignment default=32
[ollama-0 ollama] time=2026-04-07T12:25:47.510Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.pooling_type default=0
[ollama-0 ollama] time=2026-04-07T12:25:47.510Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=tokenizer.ggml.eot_token_id default=106
[ollama-0 ollama] time=2026-04-07T12:25:47.511Z level=INFO source=model.go:97 msg="gemma4: token IDs" image=255999 image_end=258882 audio=256000 audio_end=258883
[ollama-0 ollama] time=2026-04-07T12:25:47.512Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.attention.global_head_count_kv default=0
[ollama-0 ollama] time=2026-04-07T12:25:47.512Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.expert_count default=0
[ollama-0 ollama] time=2026-04-07T12:25:47.512Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.expert_used_count default=0
[ollama-0 ollama] time=2026-04-07T12:25:47.512Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.audio.block_count default=0
[ollama-0 ollama] time=2026-04-07T12:25:47.512Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.audio.embedding_length default=0
[ollama-0 ollama] time=2026-04-07T12:25:47.533Z level=INFO source=model.go:138 msg="vision: decode" elapsed=1.775305ms bounds=(0,0)-(2048,2048)
[ollama-0 ollama] time=2026-04-07T12:25:47.710Z level=INFO source=model.go:145 msg="vision: preprocess" elapsed=177.372135ms size="[768 768]"
[ollama-0 ollama] time=2026-04-07T12:25:47.714Z level=INFO source=model.go:148 msg="vision: pixelValues" shape="[768 768 3]" dim0=768 dim1=768 dim2=3
[ollama-0 ollama] time=2026-04-07T12:25:47.714Z level=INFO source=model.go:152 msg="vision: patches" patchesX=48 patchesY=48 total=2304 patchSize=16
[ollama-0 ollama] time=2026-04-07T12:25:47.714Z level=INFO source=model.go:156 msg="vision: encoded" elapsed=183.203596ms shape="[5376 256]"
[ollama-0 ollama] time=2026-04-07T12:25:47.729Z level=DEBUG source=ggml.go:852 msg="compute graph" nodes=1272 splits=355
[ollama-0 ollama] [GIN] 2026/04/07 - 12:25:49 | 200 |      38.499µs |   192.168.0.141 | GET      "/api/version"
[ollama-0 ollama] [GIN] 2026/04/07 - 12:25:49 | 200 |      60.868µs |   192.168.0.141 | GET      "/api/version"
[ollama-0 ollama] [GIN] 2026/04/07 - 12:25:54 | 200 |      53.539µs |   192.168.0.141 | GET      "/api/version"
[ollama-0 ollama] [GIN] 2026/04/07 - 12:25:54 | 200 |      47.436µs |   192.168.0.141 | GET      "/api/version"
[ollama-0 ollama] ggml_backend_cuda_buffer_type_alloc_buffer: allocating 19123.94 MiB on device 0: cudaMalloc failed: out of memory
[ollama-0 ollama] ggml_gallocr_reserve_n_impl: failed to allocate CUDA0 buffer of size 20052907008
[ollama-0 ollama] time=2026-04-07T12:25:55.955Z level=DEBUG source=ggml.go:852 msg="compute graph" nodes=2752 splits=940
[ollama-0 ollama] time=2026-04-07T12:25:56.924Z level=DEBUG source=device.go:240 msg="model weights" device=CUDA0 size="600.4 MiB"
[ollama-0 ollama] time=2026-04-07T12:25:56.924Z level=DEBUG source=device.go:245 msg="model weights" device=CPU size="19.0 GiB"
[ollama-0 ollama] time=2026-04-07T12:25:56.924Z level=DEBUG source=device.go:251 msg="kv cache" device=CUDA0 size="2.1 GiB"
[ollama-0 ollama] time=2026-04-07T12:25:56.924Z level=DEBUG source=device.go:256 msg="kv cache" device=CPU size="21.4 GiB"
[ollama-0 ollama] time=2026-04-07T12:25:56.924Z level=DEBUG source=device.go:262 msg="compute graph" device=CUDA0 size="18.7 GiB"
[ollama-0 ollama] time=2026-04-07T12:25:56.924Z level=DEBUG source=device.go:267 msg="compute graph" device=CPU size="16.0 MiB"
[ollama-0 ollama] time=2026-04-07T12:25:56.924Z level=DEBUG source=device.go:272 msg="total memory" size="61.8 GiB"
[ollama-0 ollama] time=2026-04-07T12:25:56.924Z level=DEBUG source=server.go:837 msg=memory success=false required.InputWeights=1250426880 required.CPU.Weights="[304972832 304972832 304972832 304972832 304972832 330263584 304972832 275168288 275168288 304972832 269491232 300459040 299295776 275168288 269491232 299295776 275168288 300459040 299295776 269491232 275168288 299295776 269491232 300459040 304972832 269491232 269491232 304972832 269491232 300459040 299295776 275168288 269491232 299295776 275168288 300459040 299295776 269491232 275168288 299295776 269491232 300459040 304972832 269491232 269491232 304972832 269491232 300459040 299295776 275168288 269491232 299295776 304972832 330263584 299295776 299295776 304972832 299295776 0 0 2260638912]" required.CPU.Cache="[75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 0 0 0]" required.CPU.Graph=16777216 required.CUDA0.ID=GPU-9762feba-cea4-7981-7353-533400b79c72 required.CUDA0.Weights="[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 299297152 330264704 0]" required.CUDA0.Cache="[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 75497472 2147483648 0]" required.CUDA0.Graph=20078072832
[ollama-0 ollama] time=2026-04-07T12:25:56.924Z level=DEBUG source=server.go:820 msg="exploring intermediate layers" layer=1
[ollama-0 ollama] time=2026-04-07T12:25:56.924Z level=DEBUG source=server.go:978 msg="available gpu" id=GPU-9762feba-cea4-7981-7353-533400b79c72 library=CUDA "available layer vram"="3.1 GiB" backoff=0.00 minimum="457.0 MiB" overhead="0 B" graph="0 B"
[ollama-0 ollama] time=2026-04-07T12:25:56.924Z level=DEBUG source=server.go:828 msg="new layout created" layers="1[ID:GPU-9762feba-cea4-7981-7353-533400b79c72 Layers:1(59..59)]"
[ollama-0 ollama] time=2026-04-07T12:25:56.924Z level=INFO source=runner.go:1290 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Disabled KvSize:262144 KvCacheType: NumThreads:8 GPULayers:1[ID:GPU-9762feba-cea4-7981-7353-533400b79c72 Layers:1(59..59)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
[ollama-0 ollama] time=2026-04-07T12:25:56.978Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=general.alignment default=32
[ollama-0 ollama] time=2026-04-07T12:25:56.987Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.pooling_type default=0
[ollama-0 ollama] time=2026-04-07T12:25:56.987Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=tokenizer.ggml.eot_token_id default=106
[ollama-0 ollama] time=2026-04-07T12:25:56.988Z level=INFO source=model.go:97 msg="gemma4: token IDs" image=255999 image_end=258882 audio=256000 audio_end=258883
[ollama-0 ollama] time=2026-04-07T12:25:56.988Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.attention.global_head_count_kv default=0
[ollama-0 ollama] time=2026-04-07T12:25:56.988Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.expert_count default=0
[ollama-0 ollama] time=2026-04-07T12:25:56.988Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.expert_used_count default=0
[ollama-0 ollama] time=2026-04-07T12:25:56.988Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.audio.block_count default=0
[ollama-0 ollama] time=2026-04-07T12:25:56.988Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.audio.embedding_length default=0
[ollama-0 ollama] time=2026-04-07T12:25:57.006Z level=INFO source=model.go:138 msg="vision: decode" elapsed=2.345549ms bounds=(0,0)-(2048,2048)
[ollama-0 ollama] time=2026-04-07T12:25:57.150Z level=INFO source=model.go:145 msg="vision: preprocess" elapsed=143.368125ms size="[768 768]"
[ollama-0 ollama] time=2026-04-07T12:25:57.153Z level=INFO source=model.go:148 msg="vision: pixelValues" shape="[768 768 3]" dim0=768 dim1=768 dim2=3
[ollama-0 ollama] time=2026-04-07T12:25:57.153Z level=INFO source=model.go:152 msg="vision: patches" patchesX=48 patchesY=48 total=2304 patchSize=16
[ollama-0 ollama] time=2026-04-07T12:25:57.154Z level=INFO source=model.go:156 msg="vision: encoded" elapsed=150.467504ms shape="[5376 256]"
[ollama-0 ollama] time=2026-04-07T12:25:57.161Z level=DEBUG source=ggml.go:852 msg="compute graph" nodes=1272 splits=355
[ollama-0 ollama] [GIN] 2026/04/07 - 12:25:59 | 200 |      39.271µs |   192.168.0.141 | GET      "/api/version"
[ollama-0 ollama] [GIN] 2026/04/07 - 12:25:59 | 200 |      28.811µs |   192.168.0.141 | GET      "/api/version"
[ollama-0 ollama] [GIN] 2026/04/07 - 12:26:04 | 200 |      58.757µs |   192.168.0.141 | GET      "/api/version"
[ollama-0 ollama] [GIN] 2026/04/07 - 12:26:04 | 200 |      21.524µs |   192.168.0.141 | GET      "/api/version"
[ollama-0 ollama] ggml_backend_cuda_buffer_type_alloc_buffer: allocating 19123.94 MiB on device 0: cudaMalloc failed: out of memory
[ollama-0 ollama] ggml_gallocr_reserve_n_impl: failed to allocate CUDA0 buffer of size 20052907008
[ollama-0 ollama] time=2026-04-07T12:26:05.321Z level=DEBUG source=ggml.go:852 msg="compute graph" nodes=2752 splits=956
[ollama-0 ollama] time=2026-04-07T12:26:06.236Z level=DEBUG source=device.go:240 msg="model weights" device=CUDA0 size="315.0 MiB"
[ollama-0 ollama] time=2026-04-07T12:26:06.236Z level=DEBUG source=device.go:245 msg="model weights" device=CPU size="19.3 GiB"
[ollama-0 ollama] time=2026-04-07T12:26:06.236Z level=DEBUG source=device.go:251 msg="kv cache" device=CUDA0 size="2.0 GiB"
[ollama-0 ollama] time=2026-04-07T12:26:06.236Z level=DEBUG source=device.go:256 msg="kv cache" device=CPU size="21.5 GiB"
[ollama-0 ollama] time=2026-04-07T12:26:06.236Z level=DEBUG source=device.go:262 msg="compute graph" device=CUDA0 size="18.7 GiB"
[ollama-0 ollama] time=2026-04-07T12:26:06.236Z level=DEBUG source=device.go:267 msg="compute graph" device=CPU size="16.0 MiB"
[ollama-0 ollama] time=2026-04-07T12:26:06.236Z level=DEBUG source=device.go:272 msg="total memory" size="61.8 GiB"
[ollama-0 ollama] time=2026-04-07T12:26:06.236Z level=DEBUG source=server.go:837 msg=memory success=false required.InputWeights=1250426880 required.CPU.Weights="[304972832 304972832 304972832 304972832 304972832 330263584 304972832 275168288 275168288 304972832 269491232 300459040 299295776 275168288 269491232 299295776 275168288 300459040 299295776 269491232 275168288 299295776 269491232 300459040 304972832 269491232 269491232 304972832 269491232 300459040 299295776 275168288 269491232 299295776 275168288 300459040 299295776 269491232 275168288 299295776 269491232 300459040 304972832 269491232 269491232 304972832 269491232 300459040 299295776 275168288 269491232 299295776 304972832 330263584 299295776 299295776 304972832 299295776 299295776 0 2260638912]" required.CPU.Cache="[75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 0 0]" required.CPU.Graph=16777216 required.CUDA0.ID=GPU-9762feba-cea4-7981-7353-533400b79c72 required.CUDA0.Weights="[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 330264704 0]" required.CUDA0.Cache="[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2147483648 0]" required.CUDA0.Graph=20078072832
[ollama-0 ollama] time=2026-04-07T12:26:06.236Z level=DEBUG source=server.go:820 msg="exploring intermediate layers" layer=0
[ollama-0 ollama] time=2026-04-07T12:26:06.236Z level=DEBUG source=server.go:978 msg="available gpu" id=GPU-9762feba-cea4-7981-7353-533400b79c72 library=CUDA "available layer vram"="3.1 GiB" backoff=0.00 minimum="457.0 MiB" overhead="0 B" graph="0 B"
[ollama-0 ollama] time=2026-04-07T12:26:06.236Z level=DEBUG source=server.go:1059 msg="insufficient VRAM to load any model layers"
[ollama-0 ollama] time=2026-04-07T12:26:06.236Z level=DEBUG source=server.go:828 msg="new layout created" layers=[]
[ollama-0 ollama] time=2026-04-07T12:26:06.236Z level=INFO source=runner.go:1290 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Disabled KvSize:262144 KvCacheType: NumThreads:8 GPULayers:[] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
[ollama-0 ollama] time=2026-04-07T12:26:06.291Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=general.alignment default=32
[ollama-0 ollama] time=2026-04-07T12:26:06.298Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.pooling_type default=0
[ollama-0 ollama] time=2026-04-07T12:26:06.298Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=tokenizer.ggml.eot_token_id default=106
[ollama-0 ollama] time=2026-04-07T12:26:06.299Z level=INFO source=model.go:97 msg="gemma4: token IDs" image=255999 image_end=258882 audio=256000 audio_end=258883
[ollama-0 ollama] time=2026-04-07T12:26:06.300Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.attention.global_head_count_kv default=0
[ollama-0 ollama] time=2026-04-07T12:26:06.300Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.expert_count default=0
[ollama-0 ollama] time=2026-04-07T12:26:06.300Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.expert_used_count default=0
[ollama-0 ollama] time=2026-04-07T12:26:06.300Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.audio.block_count default=0
[ollama-0 ollama] time=2026-04-07T12:26:06.300Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.audio.embedding_length default=0
[ollama-0 ollama] time=2026-04-07T12:26:06.321Z level=INFO source=model.go:138 msg="vision: decode" elapsed=1.773695ms bounds=(0,0)-(2048,2048)
[ollama-0 ollama] time=2026-04-07T12:26:06.470Z level=INFO source=model.go:145 msg="vision: preprocess" elapsed=149.050579ms size="[768 768]"
[ollama-0 ollama] time=2026-04-07T12:26:06.472Z level=INFO source=model.go:148 msg="vision: pixelValues" shape="[768 768 3]" dim0=768 dim1=768 dim2=3
[ollama-0 ollama] time=2026-04-07T12:26:06.472Z level=INFO source=model.go:152 msg="vision: patches" patchesX=48 patchesY=48 total=2304 patchSize=16
[ollama-0 ollama] time=2026-04-07T12:26:06.473Z level=INFO source=model.go:156 msg="vision: encoded" elapsed=153.434664ms shape="[5376 256]"
[ollama-0 ollama] time=2026-04-07T12:26:06.474Z level=DEBUG source=ggml.go:852 msg="compute graph" nodes=1272 splits=1
[ollama-0 ollama] [GIN] 2026/04/07 - 12:26:09 | 200 |      32.004µs |   192.168.0.141 | GET      "/api/version"
[ollama-0 ollama] [GIN] 2026/04/07 - 12:26:09 | 200 |      32.395µs |   192.168.0.141 | GET      "/api/version"
[ollama-0 ollama] [GIN] 2026/04/07 - 12:26:14 | 200 |      36.028µs |   192.168.0.141 | GET      "/api/version"
[ollama-0 ollama] [GIN] 2026/04/07 - 12:26:14 | 200 |      47.483µs |   192.168.0.141 | GET      "/api/version"
[ollama-0 ollama] time=2026-04-07T12:26:15.272Z level=DEBUG source=ggml.go:852 msg="compute graph" nodes=2752 splits=1
[ollama-0 ollama] time=2026-04-07T12:26:15.326Z level=DEBUG source=ggml.go:852 msg="compute graph" nodes=2750 splits=1
[ollama-0 ollama] time=2026-04-07T12:26:15.327Z level=DEBUG source=device.go:245 msg="model weights" device=CPU size="19.6 GiB"
[ollama-0 ollama] time=2026-04-07T12:26:15.327Z level=DEBUG source=device.go:256 msg="kv cache" device=CPU size="23.5 GiB"
[ollama-0 ollama] time=2026-04-07T12:26:15.327Z level=DEBUG source=device.go:267 msg="compute graph" device=CPU size="16.1 GiB"
[ollama-0 ollama] time=2026-04-07T12:26:15.327Z level=DEBUG source=device.go:272 msg="total memory" size="59.2 GiB"
[ollama-0 ollama] time=2026-04-07T12:26:15.327Z level=DEBUG source=server.go:837 msg=memory success=true required.InputWeights=1250426880 required.CPU.Weights="[304972832 304972832 304972832 304972832 304972832 330263584 304972832 275168288 275168288 304972832 269491232 300459040 299295776 275168288 269491232 299295776 275168288 300459040 299295776 269491232 275168288 299295776 269491232 300459040 304972832 269491232 269491232 304972832 269491232 300459040 299295776 275168288 269491232 299295776 275168288 300459040 299295776 269491232 275168288 299295776 269491232 300459040 304972832 269491232 269491232 304972832 269491232 300459040 299295776 275168288 269491232 299295776 304972832 330263584 299295776 299295776 304972832 299295776 299295776 330263584 2260638912]" required.CPU.Cache="[75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 0]" required.CPU.Graph=17280008192
[ollama-0 ollama] time=2026-04-07T12:26:15.327Z level=DEBUG source=server.go:978 msg="available gpu" id=GPU-9762feba-cea4-7981-7353-533400b79c72 library=CUDA "available layer vram"="3.1 GiB" backoff=0.00 minimum="457.0 MiB" overhead="0 B" graph="0 B"
[ollama-0 ollama] time=2026-04-07T12:26:15.327Z level=DEBUG source=server.go:845 msg="verifying layout" layers="3[ID:GPU-9762feba-cea4-7981-7353-533400b79c72 Layers:3(57..59)]"
[ollama-0 ollama] time=2026-04-07T12:26:15.327Z level=INFO source=runner.go:1290 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Disabled KvSize:262144 KvCacheType: NumThreads:8 GPULayers:[] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
[ollama-0 ollama] time=2026-04-07T12:26:15.327Z level=INFO source=ggml.go:482 msg="offloading 0 repeating layers to GPU"
[ollama-0 ollama] time=2026-04-07T12:26:15.327Z level=INFO source=ggml.go:486 msg="offloading output layer to CPU"
[ollama-0 ollama] time=2026-04-07T12:26:15.327Z level=INFO source=ggml.go:494 msg="offloaded 0/61 layers to GPU"
[ollama-0 ollama] time=2026-04-07T12:26:15.327Z level=INFO source=device.go:245 msg="model weights" device=CPU size="19.6 GiB"
[ollama-0 ollama] time=2026-04-07T12:26:15.327Z level=INFO source=device.go:256 msg="kv cache" device=CPU size="23.5 GiB"
[ollama-0 ollama] time=2026-04-07T12:26:15.327Z level=INFO source=device.go:267 msg="compute graph" device=CPU size="16.1 GiB"
[ollama-0 ollama] time=2026-04-07T12:26:15.327Z level=INFO source=device.go:272 msg="total memory" size="59.2 GiB"
[ollama-0 ollama] time=2026-04-07T12:26:15.327Z level=INFO source=sched.go:561 msg="loaded runners" count=1
[ollama-0 ollama] time=2026-04-07T12:26:15.327Z level=INFO source=server.go:1352 msg="waiting for llama runner to start responding"
[ollama-0 ollama] time=2026-04-07T12:26:15.328Z level=INFO source=server.go:1386 msg="waiting for server to become available" status="llm server loading model"
[ollama-0 ollama] time=2026-04-07T12:26:15.328Z level=DEBUG source=server.go:1396 msg="model load progress 0.00"
[ollama-0 ollama] time=2026-04-07T12:26:15.579Z level=DEBUG source=server.go:1396 msg="model load progress 0.03"
[ollama-0 ollama] time=2026-04-07T12:26:15.831Z level=DEBUG source=server.go:1396 msg="model load progress 0.07"
[ollama-0 ollama] time=2026-04-07T12:26:16.082Z level=DEBUG source=server.go:1396 msg="model load progress 0.11"
[ollama-0 ollama] time=2026-04-07T12:26:16.333Z level=DEBUG source=server.go:1396 msg="model load progress 0.14"
[ollama-0 ollama] time=2026-04-07T12:26:16.584Z level=DEBUG source=server.go:1396 msg="model load progress 0.17"
[ollama-0 ollama] time=2026-04-07T12:26:16.834Z level=DEBUG source=server.go:1396 msg="model load progress 0.21"
[ollama-0 ollama] time=2026-04-07T12:26:17.088Z level=DEBUG source=server.go:1396 msg="model load progress 0.25"
[ollama-0 ollama] time=2026-04-07T12:26:17.339Z level=DEBUG source=server.go:1396 msg="model load progress 0.28"
[ollama-0 ollama] time=2026-04-07T12:26:17.591Z level=DEBUG source=server.go:1396 msg="model load progress 0.31"
[ollama-0 ollama] time=2026-04-07T12:26:17.843Z level=DEBUG source=server.go:1396 msg="model load progress 0.35"
[ollama-0 ollama] time=2026-04-07T12:26:18.095Z level=DEBUG source=server.go:1396 msg="model load progress 0.39"
[ollama-0 ollama] time=2026-04-07T12:26:18.346Z level=DEBUG source=server.go:1396 msg="model load progress 0.42"
[ollama-0 ollama] time=2026-04-07T12:26:18.600Z level=DEBUG source=server.go:1396 msg="model load progress 0.44"
[ollama-0 ollama] time=2026-04-07T12:26:18.851Z level=DEBUG source=server.go:1396 msg="model load progress 0.48"
[ollama-0 ollama] time=2026-04-07T12:26:19.103Z level=DEBUG source=server.go:1396 msg="model load progress 0.51"
[ollama-0 ollama] [GIN] 2026/04/07 - 12:26:19 | 200 |     137.068µs |   192.168.0.141 | GET      "/api/version"
[ollama-0 ollama] [GIN] 2026/04/07 - 12:26:19 | 200 |     104.876µs |   192.168.0.141 | GET      "/api/version"
[ollama-0 ollama] time=2026-04-07T12:26:19.355Z level=DEBUG source=server.go:1396 msg="model load progress 0.54"
[ollama-0 ollama] time=2026-04-07T12:26:19.608Z level=DEBUG source=server.go:1396 msg="model load progress 0.57"
[ollama-0 ollama] time=2026-04-07T12:26:19.860Z level=DEBUG source=server.go:1396 msg="model load progress 0.60"
[ollama-0 ollama] time=2026-04-07T12:26:20.112Z level=DEBUG source=server.go:1396 msg="model load progress 0.63"
[ollama-0 ollama] time=2026-04-07T12:26:20.364Z level=DEBUG source=server.go:1396 msg="model load progress 0.67"
[ollama-0 ollama] time=2026-04-07T12:26:20.615Z level=DEBUG source=server.go:1396 msg="model load progress 0.70"
[ollama-0 ollama] time=2026-04-07T12:26:20.866Z level=DEBUG source=server.go:1396 msg="model load progress 0.73"
[ollama-0 ollama] time=2026-04-07T12:26:21.118Z level=DEBUG source=server.go:1396 msg="model load progress 0.76"
[ollama-0 ollama] time=2026-04-07T12:26:21.369Z level=DEBUG source=server.go:1396 msg="model load progress 0.79"
[ollama-0 ollama] time=2026-04-07T12:26:21.621Z level=DEBUG source=server.go:1396 msg="model load progress 0.83"
[ollama-0 ollama] time=2026-04-07T12:26:21.873Z level=DEBUG source=server.go:1396 msg="model load progress 0.86"
[ollama-0 ollama] time=2026-04-07T12:26:22.124Z level=DEBUG source=server.go:1396 msg="model load progress 0.89"
[ollama-0 ollama] time=2026-04-07T12:26:22.376Z level=DEBUG source=server.go:1396 msg="model load progress 0.92"
[ollama-0 ollama] time=2026-04-07T12:26:22.627Z level=DEBUG source=server.go:1396 msg="model load progress 0.94"
[ollama-0 ollama] time=2026-04-07T12:26:23.129Z level=DEBUG source=server.go:1396 msg="model load progress 0.95"
[ollama-0 ollama] time=2026-04-07T12:26:23.380Z level=DEBUG source=server.go:1396 msg="model load progress 0.96"
[ollama-0 ollama] time=2026-04-07T12:26:23.631Z level=DEBUG source=server.go:1396 msg="model load progress 0.97"
[ollama-0 ollama] time=2026-04-07T12:26:23.882Z level=DEBUG source=server.go:1396 msg="model load progress 0.98"
[ollama-0 ollama] time=2026-04-07T12:26:24.132Z level=DEBUG source=server.go:1396 msg="model load progress 0.99"
[ollama-0 ollama] [GIN] 2026/04/07 - 12:26:24 | 200 |     692.611µs |   192.168.0.141 | GET      "/api/version"
[ollama-0 ollama] [GIN] 2026/04/07 - 12:26:24 | 200 |      49.769µs |   192.168.0.141 | GET      "/api/version"
[ollama-0 ollama] time=2026-04-07T12:26:24.383Z level=DEBUG source=server.go:1396 msg="model load progress 0.99"
[ollama-0 ollama] time=2026-04-07T12:26:24.634Z level=DEBUG source=server.go:1396 msg="model load progress 0.99"
[ollama-0 ollama] time=2026-04-07T12:26:24.786Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.pooling_type default=0
[ollama-0 ollama] time=2026-04-07T12:26:24.884Z level=INFO source=server.go:1390 msg="llama runner started in 90.06 seconds"
[ollama-0 ollama] time=2026-04-07T12:26:24.884Z level=DEBUG source=sched.go:573 msg="finished setting up" runner.name=registry.ollama.ai/library/gemma4:31b runner.size="59.2 GiB" runner.vram="0 B" runner.parallel=1 runner.pid=161 runner.model=/models/blobs/sha256-280af6832eca23cb322c4dcc65edfea98a21b8f8ab07dc7553bd6f7e6e7a3313 runner.num_ctx=262144
[ollama-0 ollama] time=2026-04-07T12:26:25.012Z level=DEBUG source=server.go:1538 msg="completion request" images=0 prompt=33899 format=""
[ollama-0 ollama] time=2026-04-07T12:26:25.155Z level=DEBUG source=cache.go:151 msg="loading cache slot" id=0 cache=0 prompt=7904 used=0 remaining=7904
<!-- gh-comment-id:4198924740 --> @khteh commented on GitHub (Apr 7, 2026): I only see ollama start-up log:
```
[ollama-0 ollama] time=2026-04-07T12:24:54.376Z level=DEBUG source=runner.go:264 msg="refreshing free memory"
[ollama-0 ollama] time=2026-04-07T12:24:54.376Z level=DEBUG source=runner.go:328 msg="unable to refresh all GPUs with existing runners, performing bootstrap discovery"
[ollama-0 ollama] time=2026-04-07T12:24:54.376Z level=INFO source=server.go:432 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 43877"
[ollama-0 ollama] time=2026-04-07T12:24:54.384Z level=DEBUG source=server.go:433 msg=subprocess OLLAMA_MODELS=/models OLLAMA_SCHED_SPREAD=true OLLAMA_HOST=http://0.0.0.0:11434 OLLAMA_CONTEXT_LENGTH=262144 OLLAMA_DEBUG=true LD_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/cuda_v13:/usr/local/nvidia/lib:/usr/local/nvidia/lib64 PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin OLLAMA_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/cuda_v13
[ollama-0 ollama] time=2026-04-07T12:24:54.633Z level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=257.23591ms OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/cuda_v13]" extra_envs=map[]
[ollama-0 ollama] time=2026-04-07T12:24:54.633Z level=DEBUG source=runner.go:40 msg="overall device VRAM discovery took" duration=257.368031ms
[ollama-0 ollama] time=2026-04-07T12:24:54.634Z level=WARN source=cpu_linux.go:130 msg="failed to parse CPU allowed micro secs" error="strconv.ParseInt: parsing \"max\": invalid syntax"
[ollama-0 ollama] time=2026-04-07T12:24:54.634Z level=DEBUG source=sched.go:220 msg="updating default concurrency" OLLAMA_MAX_LOADED_MODELS=3 gpu_count=1
[ollama-0 ollama] time=2026-04-07T12:24:54.634Z level=DEBUG source=sched.go:229 msg="loading first model" model=/models/blobs/sha256-280af6832eca23cb322c4dcc65edfea98a21b8f8ab07dc7553bd6f7e6e7a3313
[ollama-0 ollama] time=2026-04-07T12:24:54.764Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=general.alignment default=32
[ollama-0 ollama] time=2026-04-07T12:24:54.813Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=general.alignment default=32
[ollama-0 ollama] time=2026-04-07T12:24:54.814Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.pooling_type default=0
[ollama-0 ollama] time=2026-04-07T12:24:54.814Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=tokenizer.ggml.eot_token_id default=106
[ollama-0 ollama] time=2026-04-07T12:24:54.823Z level=INFO source=model.go:97 msg="gemma4: token IDs" image=255999 image_end=258882 audio=256000 audio_end=258883
[ollama-0 ollama] time=2026-04-07T12:24:54.823Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.attention.global_head_count_kv default=0
[ollama-0 ollama] time=2026-04-07T12:24:54.823Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.expert_count default=0
[ollama-0 ollama] time=2026-04-07T12:24:54.823Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.expert_used_count default=0
[ollama-0 ollama] time=2026-04-07T12:24:54.823Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.audio.block_count default=0
[ollama-0 ollama] time=2026-04-07T12:24:54.823Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.audio.embedding_length default=0
[ollama-0 ollama] time=2026-04-07T12:24:54.823Z level=INFO source=server.go:432 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --model /models/blobs/sha256-280af6832eca23cb322c4dcc65edfea98a21b8f8ab07dc7553bd6f7e6e7a3313 --port 34163"
[ollama-0 ollama] time=2026-04-07T12:24:54.823Z level=DEBUG source=server.go:433 msg=subprocess OLLAMA_MODELS=/models OLLAMA_SCHED_SPREAD=true OLLAMA_HOST=http://0.0.0.0:11434 OLLAMA_CONTEXT_LENGTH=262144 OLLAMA_DEBUG=true LD_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/cuda_v13:/usr/local/nvidia/lib:/usr/local/nvidia/lib64 PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin OLLAMA_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/cuda_v13
[ollama-0 ollama] time=2026-04-07T12:24:54.824Z level=INFO source=sched.go:484 msg="system memory" total="68.1 GiB" free="67.8 GiB" free_swap="44.9 MiB"
[ollama-0 ollama] time=2026-04-07T12:24:54.824Z level=INFO source=sched.go:491 msg="gpu memory" id=GPU-9762feba-cea4-7981-7353-533400b79c72 library=CUDA available="3.1 GiB" free="3.5 GiB" minimum="457.0 MiB" overhead="0 B"
[ollama-0 ollama] time=2026-04-07T12:24:54.824Z level=INFO source=server.go:759 msg="loading model" "model layers"=61 requested=-1
[ollama-0 ollama] time=2026-04-07T12:24:54.834Z level=INFO source=runner.go:1417 msg="starting ollama engine"
[ollama-0 ollama] time=2026-04-07T12:24:54.834Z level=INFO source=runner.go:1452 msg="Server listening on 127.0.0.1:34163"
[ollama-0 ollama] time=2026-04-07T12:24:54.836Z level=INFO source=runner.go:1290 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Disabled KvSize:262144 KvCacheType: NumThreads:8 GPULayers:61[ID:GPU-9762feba-cea4-7981-7353-533400b79c72 Layers:61(0..60)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
[ollama-0 ollama] time=2026-04-07T12:24:54.890Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=general.alignment default=32
[ollama-0 ollama] time=2026-04-07T12:24:54.891Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=general.name default=""
[ollama-0 ollama] time=2026-04-07T12:24:54.891Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=general.description default=""
[ollama-0 ollama] time=2026-04-07T12:24:54.891Z level=INFO source=ggml.go:136 msg="" architecture=gemma4 file_type=Q4_K_M name="" description="" num_tensors=1189 num_key_values=49
[ollama-0 ollama] time=2026-04-07T12:24:54.891Z level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/lib/ollama
[ollama-0 ollama] load_backend: loaded CPU backend from /usr/lib/ollama/libggml-cpu-icelake.so
[ollama-0 ollama] time=2026-04-07T12:24:54.896Z level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/lib/ollama/cuda_v13
[ollama-0 ollama] ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
[ollama-0 ollama] ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
[ollama-0 ollama] ggml_cuda_init: found 1 CUDA devices:
[ollama-0 ollama] Device 0: NVIDIA RTX A2000 Laptop GPU, compute capability 8.6, VMM: yes, ID: GPU-9762feba-cea4-7981-7353-533400b79c72
[ollama-0 ollama] load_backend: loaded CUDA backend from /usr/lib/ollama/cuda_v13/libggml-cuda.so
[ollama-0 ollama] time=2026-04-07T12:24:55.046Z level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.AVX512=1 CPU.0.AVX512_VBMI=1 CPU.0.AVX512_VNNI=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=750,800,860,870,890,900,1000,1030,1100,1200,1210 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc)
[ollama-0 ollama] time=2026-04-07T12:24:55.052Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.pooling_type default=0
[ollama-0 ollama] time=2026-04-07T12:24:55.052Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=tokenizer.ggml.eot_token_id default=106
[ollama-0 ollama] time=2026-04-07T12:24:55.052Z level=INFO source=model.go:97 msg="gemma4: token IDs" image=255999 image_end=258882 audio=256000 audio_end=258883
[ollama-0 ollama] time=2026-04-07T12:24:55.052Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.attention.global_head_count_kv default=0
[ollama-0 ollama] time=2026-04-07T12:24:55.052Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.expert_count default=0
[ollama-0 ollama] time=2026-04-07T12:24:55.052Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.expert_used_count default=0
[ollama-0 ollama] time=2026-04-07T12:24:55.052Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.audio.block_count default=0
[ollama-0 ollama] time=2026-04-07T12:24:55.052Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.audio.embedding_length default=0
[ollama-0 ollama] time=2026-04-07T12:24:55.086Z level=INFO source=model.go:138 msg="vision: decode" elapsed=2.08823ms bounds=(0,0)-(2048,2048)
[ollama-0 ollama] time=2026-04-07T12:24:55.199Z level=INFO source=model.go:145 msg="vision: preprocess" elapsed=112.653606ms size="[768 768]"
[ollama-0 ollama] time=2026-04-07T12:24:55.199Z level=INFO source=model.go:148 msg="vision: pixelValues" shape="[768 768 3]" dim0=768 dim1=768 dim2=3
[ollama-0 ollama] time=2026-04-07T12:24:55.199Z level=INFO source=model.go:152 msg="vision: patches" patchesX=48 patchesY=48 total=2304 patchSize=16
[ollama-0 ollama] time=2026-04-07T12:24:55.200Z level=INFO source=model.go:156 msg="vision: encoded" elapsed=115.757163ms shape="[5376 256]"
[ollama-0 ollama] time=2026-04-07T12:24:58.009Z level=DEBUG source=ggml.go:852 msg="compute graph" nodes=1272 splits=1
[ollama-0 ollama] [GIN] 2026/04/07 - 12:24:59 | 200 | 33.647µs | 192.168.0.141 | GET "/api/version"
[ollama-0 ollama] [GIN] 2026/04/07 - 12:24:59 | 200 | 24.938µs | 192.168.0.141 | GET "/api/version"
[ollama-0 ollama] [GIN] 2026/04/07 - 12:25:04 | 200 | 34.107µs | 192.168.0.141 | GET "/api/version"
[ollama-0 ollama] [GIN] 2026/04/07 - 12:25:04 | 200 | 30.816µs | 192.168.0.141 | GET "/api/version"
[ollama-0 ollama] [GIN] 2026/04/07 - 12:25:09 | 200 | 39.27µs | 192.168.0.141 | GET "/api/version"
[ollama-0 ollama] [GIN] 2026/04/07 - 12:25:09 | 200 | 29.365µs | 192.168.0.141 | GET "/api/version"
[ollama-0 ollama] [GIN] 2026/04/07 - 12:25:14 | 200 | 41.286µs | 192.168.0.141 | GET "/api/version"
[ollama-0 ollama] [GIN] 2026/04/07 - 12:25:14 | 200 | 34.408µs | 192.168.0.141 | GET "/api/version"
[ollama-0 ollama] [GIN] 2026/04/07 - 12:25:19 | 200 | 33.858µs | 192.168.0.141 | GET "/api/version"
[ollama-0 ollama] [GIN] 2026/04/07 - 12:25:19 | 200 | 20.956µs | 192.168.0.141 | GET "/api/version"
[ollama-0 ollama] time=2026-04-07T12:25:24.204Z level=DEBUG source=ggml.go:852 msg="compute graph" nodes=2752 splits=2
[ollama-0 ollama] [GIN] 2026/04/07 - 12:25:24 | 200 | 56.72µs | 192.168.0.141 | GET "/api/version"
[ollama-0 ollama] [GIN] 2026/04/07 - 12:25:24 | 200 | 56.806µs | 192.168.0.141 | GET "/api/version"
[ollama-0 ollama] [GIN] 2026/04/07 - 12:25:29 | 200 | 32.961µs | 192.168.0.141 | GET "/api/version"
[ollama-0 ollama] [GIN] 2026/04/07 - 12:25:29 | 200 | 20.627µs | 192.168.0.141 | GET "/api/version"
[ollama-0 ollama] time=2026-04-07T12:25:33.340Z level=DEBUG source=ggml.go:852 msg="compute graph" nodes=2750 splits=2
[ollama-0 ollama] time=2026-04-07T12:25:33.341Z level=DEBUG source=device.go:240 msg="model weights" device=CUDA0 size="18.4 GiB"
[ollama-0 ollama] time=2026-04-07T12:25:33.341Z level=DEBUG source=device.go:245 msg="model weights" device=CPU size="1.2 GiB"
[ollama-0 ollama] time=2026-04-07T12:25:33.341Z level=DEBUG source=device.go:251 msg="kv cache" device=CUDA0 size="23.5 GiB"
[ollama-0 ollama] time=2026-04-07T12:25:33.341Z level=DEBUG source=device.go:262 msg="compute graph" device=CUDA0 size="24.6 GiB"
[ollama-0 ollama] time=2026-04-07T12:25:33.341Z level=DEBUG source=device.go:267 msg="compute graph" device=CPU size="10.5 MiB"
[ollama-0 ollama] time=2026-04-07T12:25:33.341Z level=DEBUG source=device.go:272 msg="total memory" size="67.7 GiB"
[ollama-0 ollama] time=2026-04-07T12:25:33.341Z level=DEBUG source=server.go:784 msg=memory success=true required.InputWeights=1250426880 required.CPU.Graph=11010048 required.CUDA0.ID=GPU-9762feba-cea4-7981-7353-533400b79c72 required.CUDA0.Weights="[304974208 304974208 304974208 304974208 304974208 330264704 304974208 275169664 275169664 304974208 269492608 300460160 299297152 275169664 269492608 299297152 275169664 300460160 299297152 269492608 275169664 299297152 269492608 300460160 304974208 269492608
269492608 304974208 269492608 300460160 299297152 275169664 269492608 299297152 275169664 300460160 299297152 269492608 275169664 299297152 269492608 300460160 304974208 269492608 269492608 304974208 269492608 300460160 299297152 275169664 269492608 299297152 304974208 330264704 299297152 299297152 304974208 299297152 299297152 330264704 2260644352]" required.CUDA0.Cache="[75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 0]" required.CUDA0.Graph=26440636544 [ollama-0 ollama] time=2026-04-07T12:25:33.341Z level=DEBUG source=server.go:978 msg="available gpu" id=GPU-9762feba-cea4-7981-7353-533400b79c72 library=CUDA "available layer vram"="0 B" backoff=0.00 minimum="457.0 MiB" overhead="0 B" graph="24.6 GiB" [ollama-0 ollama] time=2026-04-07T12:25:33.341Z level=DEBUG source=server.go:1059 msg="insufficient VRAM to load any model layers" [ollama-0 ollama] time=2026-04-07T12:25:33.341Z level=DEBUG source=server.go:795 msg="new layout created" layers=[] [ollama-0 ollama] time=2026-04-07T12:25:33.341Z level=INFO source=runner.go:1290 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Disabled KvSize:262144 KvCacheType: NumThreads:8 GPULayers:[] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}" [ollama-0 ollama] time=2026-04-07T12:25:33.398Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=general.alignment default=32 [ollama-0 ollama] time=2026-04-07T12:25:33.406Z level=DEBUG source=ggml.go:325 msg="key with 
type not found" key=gemma4.pooling_type default=0 [ollama-0 ollama] time=2026-04-07T12:25:33.406Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=tokenizer.ggml.eot_token_id default=106 [ollama-0 ollama] time=2026-04-07T12:25:33.407Z level=INFO source=model.go:97 msg="gemma4: token IDs" image=255999 image_end=258882 audio=256000 audio_end=258883 [ollama-0 ollama] time=2026-04-07T12:25:33.407Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.attention.global_head_count_kv default=0 [ollama-0 ollama] time=2026-04-07T12:25:33.407Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.expert_count default=0 [ollama-0 ollama] time=2026-04-07T12:25:33.407Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.expert_used_count default=0 [ollama-0 ollama] time=2026-04-07T12:25:33.407Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.audio.block_count default=0 [ollama-0 ollama] time=2026-04-07T12:25:33.407Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.audio.embedding_length default=0 [ollama-0 ollama] time=2026-04-07T12:25:33.423Z level=INFO source=model.go:138 msg="vision: decode" elapsed=3.65082ms bounds=(0,0)-(2048,2048) [ollama-0 ollama] time=2026-04-07T12:25:33.552Z level=INFO source=model.go:145 msg="vision: preprocess" elapsed=129.412007ms size="[768 768]" [ollama-0 ollama] time=2026-04-07T12:25:33.552Z level=INFO source=model.go:148 msg="vision: pixelValues" shape="[768 768 3]" dim0=768 dim1=768 dim2=3 [ollama-0 ollama] time=2026-04-07T12:25:33.552Z level=INFO source=model.go:152 msg="vision: patches" patchesX=48 patchesY=48 total=2304 patchSize=16 [ollama-0 ollama] time=2026-04-07T12:25:33.553Z level=INFO source=model.go:156 msg="vision: encoded" elapsed=133.987785ms shape="[5376 256]" [ollama-0 ollama] time=2026-04-07T12:25:33.555Z level=DEBUG source=ggml.go:852 msg="compute graph" nodes=1272 splits=1 [ollama-0 ollama] 
time=2026-04-07T12:25:34.203Z level=DEBUG source=ggml.go:852 msg="compute graph" nodes=2752 splits=1 [ollama-0 ollama] time=2026-04-07T12:25:34.207Z level=DEBUG source=ggml.go:852 msg="compute graph" nodes=2750 splits=1 [ollama-0 ollama] time=2026-04-07T12:25:34.208Z level=DEBUG source=device.go:245 msg="model weights" device=CPU size="19.6 GiB" [ollama-0 ollama] time=2026-04-07T12:25:34.208Z level=DEBUG source=device.go:256 msg="kv cache" device=CPU size="23.5 GiB" [ollama-0 ollama] time=2026-04-07T12:25:34.208Z level=DEBUG source=device.go:267 msg="compute graph" device=CPU size="16.1 GiB" [ollama-0 ollama] time=2026-04-07T12:25:34.208Z level=DEBUG source=device.go:272 msg="total memory" size="59.2 GiB" [ollama-0 ollama] time=2026-04-07T12:25:34.208Z level=DEBUG source=server.go:784 msg=memory success=true required.InputWeights=1250426880 required.CPU.Weights="[304972832 304972832 304972832 304972832 304972832 330263584 304972832 275168288 275168288 304972832 269491232 300459040 299295776 275168288 269491232 299295776 275168288 300459040 299295776 269491232 275168288 299295776 269491232 300459040 304972832 269491232 269491232 304972832 269491232 300459040 299295776 275168288 269491232 299295776 275168288 300459040 299295776 269491232 275168288 299295776 269491232 300459040 304972832 269491232 269491232 304972832 269491232 300459040 299295776 275168288 269491232 299295776 304972832 330263584 299295776 299295776 304972832 299295776 299295776 330263584 2260638912]" required.CPU.Cache="[75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 
75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 0]" required.CPU.Graph=17280008192 [ollama-0 ollama] time=2026-04-07T12:25:34.208Z level=DEBUG source=server.go:978 msg="available gpu" id=GPU-9762feba-cea4-7981-7353-533400b79c72 library=CUDA "available layer vram"="3.1 GiB" backoff=0.00 minimum="457.0 MiB" overhead="0 B" graph="0 B" [ollama-0 ollama] time=2026-04-07T12:25:34.208Z level=DEBUG source=server.go:795 msg="new layout created" layers="3[ID:GPU-9762feba-cea4-7981-7353-533400b79c72 Layers:3(57..59)]" [ollama-0 ollama] time=2026-04-07T12:25:34.208Z level=DEBUG source=server.go:820 msg="exploring intermediate layers" layer=2 [ollama-0 ollama] time=2026-04-07T12:25:34.208Z level=DEBUG source=server.go:978 msg="available gpu" id=GPU-9762feba-cea4-7981-7353-533400b79c72 library=CUDA "available layer vram"="3.1 GiB" backoff=0.00 minimum="457.0 MiB" overhead="0 B" graph="0 B" [ollama-0 ollama] time=2026-04-07T12:25:34.208Z level=DEBUG source=server.go:828 msg="new layout created" layers="2[ID:GPU-9762feba-cea4-7981-7353-533400b79c72 Layers:2(58..59)]" [ollama-0 ollama] time=2026-04-07T12:25:34.208Z level=INFO source=runner.go:1290 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Disabled KvSize:262144 KvCacheType: NumThreads:8 GPULayers:2[ID:GPU-9762feba-cea4-7981-7353-533400b79c72 Layers:2(58..59)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}" [ollama-0 ollama] time=2026-04-07T12:25:34.264Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=general.alignment default=32 [ollama-0 ollama] time=2026-04-07T12:25:34.271Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.pooling_type default=0 [ollama-0 ollama] time=2026-04-07T12:25:34.271Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=tokenizer.ggml.eot_token_id default=106 [ollama-0 ollama] time=2026-04-07T12:25:34.272Z level=INFO source=model.go:97 msg="gemma4: token IDs" image=255999 
image_end=258882 audio=256000 audio_end=258883 [ollama-0 ollama] time=2026-04-07T12:25:34.272Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.attention.global_head_count_kv default=0 [ollama-0 ollama] time=2026-04-07T12:25:34.272Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.expert_count default=0 [ollama-0 ollama] time=2026-04-07T12:25:34.272Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.expert_used_count default=0 [ollama-0 ollama] time=2026-04-07T12:25:34.272Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.audio.block_count default=0 [ollama-0 ollama] time=2026-04-07T12:25:34.272Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.audio.embedding_length default=0 [ollama-0 ollama] [GIN] 2026/04/07 - 12:25:34 | 200 | 34.019µs | 192.168.0.141 | GET "/api/version" [ollama-0 ollama] [GIN] 2026/04/07 - 12:25:34 | 200 | 48.054µs | 192.168.0.141 | GET "/api/version" [ollama-0 ollama] time=2026-04-07T12:25:34.286Z level=INFO source=model.go:138 msg="vision: decode" elapsed=833.65µs bounds=(0,0)-(2048,2048) [ollama-0 ollama] time=2026-04-07T12:25:34.411Z level=INFO source=model.go:145 msg="vision: preprocess" elapsed=124.942791ms size="[768 768]" [ollama-0 ollama] time=2026-04-07T12:25:34.411Z level=INFO source=model.go:148 msg="vision: pixelValues" shape="[768 768 3]" dim0=768 dim1=768 dim2=3 [ollama-0 ollama] time=2026-04-07T12:25:34.411Z level=INFO source=model.go:152 msg="vision: patches" patchesX=48 patchesY=48 total=2304 patchSize=16 [ollama-0 ollama] time=2026-04-07T12:25:34.412Z level=INFO source=model.go:156 msg="vision: encoded" elapsed=126.806024ms shape="[5376 256]" [ollama-0 ollama] time=2026-04-07T12:25:34.417Z level=DEBUG source=ggml.go:852 msg="compute graph" nodes=1272 splits=355 [ollama-0 ollama] time=2026-04-07T12:25:35.188Z level=DEBUG source=ggml.go:852 msg="compute graph" nodes=2752 splits=940 [ollama-0 ollama] 
time=2026-04-07T12:25:35.193Z level=DEBUG source=ggml.go:852 msg="compute graph" nodes=2750 splits=3 [ollama-0 ollama] time=2026-04-07T12:25:35.194Z level=DEBUG source=device.go:240 msg="model weights" device=CUDA0 size="600.4 MiB" [ollama-0 ollama] time=2026-04-07T12:25:35.194Z level=DEBUG source=device.go:245 msg="model weights" device=CPU size="19.0 GiB" [ollama-0 ollama] time=2026-04-07T12:25:35.194Z level=DEBUG source=device.go:251 msg="kv cache" device=CUDA0 size="2.1 GiB" [ollama-0 ollama] time=2026-04-07T12:25:35.194Z level=DEBUG source=device.go:256 msg="kv cache" device=CPU size="21.4 GiB" [ollama-0 ollama] time=2026-04-07T12:25:35.194Z level=DEBUG source=device.go:262 msg="compute graph" device=CUDA0 size="26.7 GiB" [ollama-0 ollama] time=2026-04-07T12:25:35.194Z level=DEBUG source=device.go:267 msg="compute graph" device=CPU size="32.2 MiB" [ollama-0 ollama] time=2026-04-07T12:25:35.194Z level=DEBUG source=device.go:272 msg="total memory" size="69.8 GiB" [ollama-0 ollama] time=2026-04-07T12:25:35.194Z level=DEBUG source=server.go:837 msg=memory success=true required.InputWeights=1250426880 required.CPU.Weights="[304972832 304972832 304972832 304972832 304972832 330263584 304972832 275168288 275168288 304972832 269491232 300459040 299295776 275168288 269491232 299295776 275168288 300459040 299295776 269491232 275168288 299295776 269491232 300459040 304972832 269491232 269491232 304972832 269491232 300459040 299295776 275168288 269491232 299295776 275168288 300459040 299295776 269491232 275168288 299295776 269491232 300459040 304972832 269491232 269491232 304972832 269491232 300459040 299295776 275168288 269491232 299295776 304972832 330263584 299295776 299295776 304972832 299295776 0 0 2260638912]" required.CPU.Cache="[75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 
75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 0 0 0]" required.CPU.Graph=33728512 required.CUDA0.ID=GPU-9762feba-cea4-7981-7353-533400b79c72 required.CUDA0.Weights="[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 299297152 330264704 0]" required.CUDA0.Cache="[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 75497472 2147483648 0]" required.CUDA0.Graph=28661715968 [ollama-0 ollama] time=2026-04-07T12:25:35.194Z level=DEBUG source=server.go:978 msg="available gpu" id=GPU-9762feba-cea4-7981-7353-533400b79c72 library=CUDA "available layer vram"="0 B" backoff=0.00 minimum="457.0 MiB" overhead="0 B" graph="26.7 GiB" [ollama-0 ollama] time=2026-04-07T12:25:35.194Z level=DEBUG source=server.go:1059 msg="insufficient VRAM to load any model layers" [ollama-0 ollama] time=2026-04-07T12:25:35.194Z level=DEBUG source=server.go:845 msg="verifying layout" layers=[] [ollama-0 ollama] time=2026-04-07T12:25:35.194Z level=DEBUG source=server.go:820 msg="exploring intermediate layers" layer=1 [ollama-0 ollama] time=2026-04-07T12:25:35.194Z level=DEBUG source=server.go:978 msg="available gpu" id=GPU-9762feba-cea4-7981-7353-533400b79c72 library=CUDA "available layer vram"="3.1 GiB" backoff=0.00 minimum="457.0 MiB" overhead="0 B" graph="0 B" [ollama-0 ollama] time=2026-04-07T12:25:35.194Z level=DEBUG source=server.go:828 msg="new layout created" layers="1[ID:GPU-9762feba-cea4-7981-7353-533400b79c72 Layers:1(59..59)]" [ollama-0 ollama] time=2026-04-07T12:25:35.195Z level=INFO source=runner.go:1290 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Disabled 
KvSize:262144 KvCacheType: NumThreads:8 GPULayers:1[ID:GPU-9762feba-cea4-7981-7353-533400b79c72 Layers:1(59..59)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}" [ollama-0 ollama] time=2026-04-07T12:25:35.253Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=general.alignment default=32 [ollama-0 ollama] time=2026-04-07T12:25:35.260Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.pooling_type default=0 [ollama-0 ollama] time=2026-04-07T12:25:35.260Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=tokenizer.ggml.eot_token_id default=106 [ollama-0 ollama] time=2026-04-07T12:25:35.261Z level=INFO source=model.go:97 msg="gemma4: token IDs" image=255999 image_end=258882 audio=256000 audio_end=258883 [ollama-0 ollama] time=2026-04-07T12:25:35.261Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.attention.global_head_count_kv default=0 [ollama-0 ollama] time=2026-04-07T12:25:35.261Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.expert_count default=0 [ollama-0 ollama] time=2026-04-07T12:25:35.261Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.expert_used_count default=0 [ollama-0 ollama] time=2026-04-07T12:25:35.261Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.audio.block_count default=0 [ollama-0 ollama] time=2026-04-07T12:25:35.261Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.audio.embedding_length default=0 [ollama-0 ollama] time=2026-04-07T12:25:35.282Z level=INFO source=model.go:138 msg="vision: decode" elapsed=770.123µs bounds=(0,0)-(2048,2048) [ollama-0 ollama] time=2026-04-07T12:25:35.446Z level=INFO source=model.go:145 msg="vision: preprocess" elapsed=163.773076ms size="[768 768]" [ollama-0 ollama] time=2026-04-07T12:25:35.446Z level=INFO source=model.go:148 msg="vision: pixelValues" shape="[768 768 3]" dim0=768 dim1=768 dim2=3 [ollama-0 ollama] 
time=2026-04-07T12:25:35.446Z level=INFO source=model.go:152 msg="vision: patches" patchesX=48 patchesY=48 total=2304 patchSize=16 [ollama-0 ollama] time=2026-04-07T12:25:35.447Z level=INFO source=model.go:156 msg="vision: encoded" elapsed=166.162382ms shape="[5376 256]" [ollama-0 ollama] time=2026-04-07T12:25:35.453Z level=DEBUG source=ggml.go:852 msg="compute graph" nodes=1272 splits=355 [ollama-0 ollama] time=2026-04-07T12:25:36.276Z level=DEBUG source=ggml.go:852 msg="compute graph" nodes=2752 splits=956 [ollama-0 ollama] time=2026-04-07T12:25:36.280Z level=DEBUG source=ggml.go:852 msg="compute graph" nodes=2750 splits=3 [ollama-0 ollama] time=2026-04-07T12:25:36.280Z level=DEBUG source=device.go:240 msg="model weights" device=CUDA0 size="315.0 MiB" [ollama-0 ollama] time=2026-04-07T12:25:36.280Z level=DEBUG source=device.go:245 msg="model weights" device=CPU size="19.3 GiB" [ollama-0 ollama] time=2026-04-07T12:25:36.280Z level=DEBUG source=device.go:251 msg="kv cache" device=CUDA0 size="2.0 GiB" [ollama-0 ollama] time=2026-04-07T12:25:36.280Z level=DEBUG source=device.go:256 msg="kv cache" device=CPU size="21.5 GiB" [ollama-0 ollama] time=2026-04-07T12:25:36.280Z level=DEBUG source=device.go:262 msg="compute graph" device=CUDA0 size="26.7 GiB" [ollama-0 ollama] time=2026-04-07T12:25:36.280Z level=DEBUG source=device.go:267 msg="compute graph" device=CPU size="32.2 MiB" [ollama-0 ollama] time=2026-04-07T12:25:36.280Z level=DEBUG source=device.go:272 msg="total memory" size="69.8 GiB" [ollama-0 ollama] time=2026-04-07T12:25:36.280Z level=DEBUG source=server.go:837 msg=memory success=true required.InputWeights=1250426880 required.CPU.Weights="[304972832 304972832 304972832 304972832 304972832 330263584 304972832 275168288 275168288 304972832 269491232 300459040 299295776 275168288 269491232 299295776 275168288 300459040 299295776 269491232 275168288 299295776 269491232 300459040 304972832 269491232 269491232 304972832 269491232 300459040 299295776 275168288 
269491232 299295776 275168288 300459040 299295776 269491232 275168288 299295776 269491232 300459040 304972832 269491232 269491232 304972832 269491232 300459040 299295776 275168288 269491232 299295776 304972832 330263584 299295776 299295776 304972832 299295776 299295776 0 2260638912]" required.CPU.Cache="[75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 0 0]" required.CPU.Graph=33728512 required.CUDA0.ID=GPU-9762feba-cea4-7981-7353-533400b79c72 required.CUDA0.Weights="[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 330264704 0]" required.CUDA0.Cache="[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2147483648 0]" required.CUDA0.Graph=28661715968 [ollama-0 ollama] time=2026-04-07T12:25:36.280Z level=DEBUG source=server.go:978 msg="available gpu" id=GPU-9762feba-cea4-7981-7353-533400b79c72 library=CUDA "available layer vram"="0 B" backoff=0.00 minimum="457.0 MiB" overhead="0 B" graph="26.7 GiB" [ollama-0 ollama] time=2026-04-07T12:25:36.280Z level=DEBUG source=server.go:1059 msg="insufficient VRAM to load any model layers" [ollama-0 ollama] time=2026-04-07T12:25:36.280Z level=DEBUG source=server.go:845 msg="verifying layout" layers=[] [ollama-0 ollama] time=2026-04-07T12:25:36.280Z level=DEBUG source=server.go:820 msg="exploring intermediate layers" layer=0 [ollama-0 ollama] time=2026-04-07T12:25:36.280Z level=DEBUG source=server.go:978 
msg="available gpu" id=GPU-9762feba-cea4-7981-7353-533400b79c72 library=CUDA "available layer vram"="3.1 GiB" backoff=0.00 minimum="457.0 MiB" overhead="0 B" graph="0 B" [ollama-0 ollama] time=2026-04-07T12:25:36.280Z level=DEBUG source=server.go:1059 msg="insufficient VRAM to load any model layers" [ollama-0 ollama] time=2026-04-07T12:25:36.280Z level=DEBUG source=server.go:828 msg="new layout created" layers=[] [ollama-0 ollama] time=2026-04-07T12:25:36.280Z level=INFO source=runner.go:1290 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Disabled KvSize:262144 KvCacheType: NumThreads:8 GPULayers:[] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}" [ollama-0 ollama] time=2026-04-07T12:25:36.336Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=general.alignment default=32 [ollama-0 ollama] time=2026-04-07T12:25:36.344Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.pooling_type default=0 [ollama-0 ollama] time=2026-04-07T12:25:36.344Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=tokenizer.ggml.eot_token_id default=106 [ollama-0 ollama] time=2026-04-07T12:25:36.345Z level=INFO source=model.go:97 msg="gemma4: token IDs" image=255999 image_end=258882 audio=256000 audio_end=258883 [ollama-0 ollama] time=2026-04-07T12:25:36.345Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.attention.global_head_count_kv default=0 [ollama-0 ollama] time=2026-04-07T12:25:36.345Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.expert_count default=0 [ollama-0 ollama] time=2026-04-07T12:25:36.345Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.expert_used_count default=0 [ollama-0 ollama] time=2026-04-07T12:25:36.345Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.audio.block_count default=0 [ollama-0 ollama] time=2026-04-07T12:25:36.345Z level=DEBUG source=ggml.go:325 msg="key with 
type not found" key=gemma4.audio.embedding_length default=0 [ollama-0 ollama] time=2026-04-07T12:25:36.362Z level=INFO source=model.go:138 msg="vision: decode" elapsed=1.145509ms bounds=(0,0)-(2048,2048) [ollama-0 ollama] time=2026-04-07T12:25:36.503Z level=INFO source=model.go:145 msg="vision: preprocess" elapsed=140.876134ms size="[768 768]" [ollama-0 ollama] time=2026-04-07T12:25:36.503Z level=INFO source=model.go:148 msg="vision: pixelValues" shape="[768 768 3]" dim0=768 dim1=768 dim2=3 [ollama-0 ollama] time=2026-04-07T12:25:36.503Z level=INFO source=model.go:152 msg="vision: patches" patchesX=48 patchesY=48 total=2304 patchSize=16 [ollama-0 ollama] time=2026-04-07T12:25:36.504Z level=INFO source=model.go:156 msg="vision: encoded" elapsed=142.98229ms shape="[5376 256]" [ollama-0 ollama] time=2026-04-07T12:25:36.505Z level=DEBUG source=ggml.go:852 msg="compute graph" nodes=1272 splits=1 [ollama-0 ollama] time=2026-04-07T12:25:37.338Z level=DEBUG source=ggml.go:852 msg="compute graph" nodes=2752 splits=1 [ollama-0 ollama] time=2026-04-07T12:25:37.342Z level=DEBUG source=ggml.go:852 msg="compute graph" nodes=2750 splits=1 [ollama-0 ollama] time=2026-04-07T12:25:37.342Z level=DEBUG source=device.go:245 msg="model weights" device=CPU size="19.6 GiB" [ollama-0 ollama] time=2026-04-07T12:25:37.342Z level=DEBUG source=device.go:256 msg="kv cache" device=CPU size="23.5 GiB" [ollama-0 ollama] time=2026-04-07T12:25:37.342Z level=DEBUG source=device.go:267 msg="compute graph" device=CPU size="16.1 GiB" [ollama-0 ollama] time=2026-04-07T12:25:37.342Z level=DEBUG source=device.go:272 msg="total memory" size="59.2 GiB" [ollama-0 ollama] time=2026-04-07T12:25:37.342Z level=DEBUG source=server.go:837 msg=memory success=true required.InputWeights=1250426880 required.CPU.Weights="[304972832 304972832 304972832 304972832 304972832 330263584 304972832 275168288 275168288 304972832 269491232 300459040 299295776 275168288 269491232 299295776 275168288 300459040 299295776 269491232 
275168288 299295776 269491232 300459040 304972832 269491232 269491232 304972832 269491232 300459040 299295776 275168288 269491232 299295776 275168288 300459040 299295776 269491232 275168288 299295776 269491232 300459040 304972832 269491232 269491232 304972832 269491232 300459040 299295776 275168288 269491232 299295776 304972832 330263584 299295776 299295776 304972832 299295776 299295776 330263584 2260638912]" required.CPU.Cache="[75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 0]" required.CPU.Graph=17280008192 [ollama-0 ollama] time=2026-04-07T12:25:37.342Z level=DEBUG source=server.go:978 msg="available gpu" id=GPU-9762feba-cea4-7981-7353-533400b79c72 library=CUDA "available layer vram"="3.1 GiB" backoff=0.00 minimum="457.0 MiB" overhead="0 B" graph="0 B" [ollama-0 ollama] time=2026-04-07T12:25:37.342Z level=DEBUG source=server.go:845 msg="verifying layout" layers="3[ID:GPU-9762feba-cea4-7981-7353-533400b79c72 Layers:3(57..59)]" [ollama-0 ollama] time=2026-04-07T12:25:37.343Z level=INFO source=runner.go:1290 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Disabled KvSize:262144 KvCacheType: NumThreads:8 GPULayers:[] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}" [ollama-0 ollama] time=2026-04-07T12:25:37.404Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=general.alignment default=32 [ollama-0 ollama] time=2026-04-07T12:25:37.424Z level=DEBUG source=ggml.go:325 msg="key with type not found" 
key=gemma4.pooling_type default=0 [ollama-0 ollama] time=2026-04-07T12:25:37.424Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=tokenizer.ggml.eot_token_id default=106 [ollama-0 ollama] time=2026-04-07T12:25:37.425Z level=INFO source=model.go:97 msg="gemma4: token IDs" image=255999 image_end=258882 audio=256000 audio_end=258883 [ollama-0 ollama] time=2026-04-07T12:25:37.425Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.attention.global_head_count_kv default=0 [ollama-0 ollama] time=2026-04-07T12:25:37.425Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.expert_count default=0 [ollama-0 ollama] time=2026-04-07T12:25:37.425Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.expert_used_count default=0 [ollama-0 ollama] time=2026-04-07T12:25:37.425Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.audio.block_count default=0 [ollama-0 ollama] time=2026-04-07T12:25:37.425Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.audio.embedding_length default=0 [ollama-0 ollama] time=2026-04-07T12:25:37.451Z level=INFO source=model.go:138 msg="vision: decode" elapsed=1.711867ms bounds=(0,0)-(2048,2048) [ollama-0 ollama] time=2026-04-07T12:25:37.601Z level=INFO source=model.go:145 msg="vision: preprocess" elapsed=150.209742ms size="[768 768]" [ollama-0 ollama] time=2026-04-07T12:25:37.604Z level=INFO source=model.go:148 msg="vision: pixelValues" shape="[768 768 3]" dim0=768 dim1=768 dim2=3 [ollama-0 ollama] time=2026-04-07T12:25:37.604Z level=INFO source=model.go:152 msg="vision: patches" patchesX=48 patchesY=48 total=2304 patchSize=16 [ollama-0 ollama] time=2026-04-07T12:25:37.605Z level=INFO source=model.go:156 msg="vision: encoded" elapsed=156.28355ms shape="[5376 256]" [ollama-0 ollama] time=2026-04-07T12:25:37.607Z level=DEBUG source=ggml.go:852 msg="compute graph" nodes=1272 splits=1 [ollama-0 ollama] [GIN] 2026/04/07 - 12:25:39 | 200 | 
39.85µs | 192.168.0.141 | GET "/api/version" [ollama-0 ollama] [GIN] 2026/04/07 - 12:25:39 | 200 | 43.372µs | 192.168.0.141 | GET "/api/version" [ollama-0 ollama] [GIN] 2026/04/07 - 12:25:44 | 200 | 40.613µs | 192.168.0.141 | GET "/api/version" [ollama-0 ollama] [GIN] 2026/04/07 - 12:25:44 | 200 | 20.983µs | 192.168.0.141 | GET "/api/version" [ollama-0 ollama] time=2026-04-07T12:25:46.509Z level=DEBUG source=ggml.go:852 msg="compute graph" nodes=2752 splits=1 [ollama-0 ollama] time=2026-04-07T12:25:46.557Z level=DEBUG source=ggml.go:852 msg="compute graph" nodes=2750 splits=1 [ollama-0 ollama] time=2026-04-07T12:25:46.557Z level=DEBUG source=device.go:245 msg="model weights" device=CPU size="19.6 GiB" [ollama-0 ollama] time=2026-04-07T12:25:46.557Z level=DEBUG source=device.go:256 msg="kv cache" device=CPU size="23.5 GiB" [ollama-0 ollama] time=2026-04-07T12:25:46.557Z level=DEBUG source=device.go:267 msg="compute graph" device=CPU size="16.1 GiB" [ollama-0 ollama] time=2026-04-07T12:25:46.557Z level=DEBUG source=device.go:272 msg="total memory" size="59.2 GiB" [ollama-0 ollama] time=2026-04-07T12:25:46.557Z level=DEBUG source=server.go:784 msg=memory success=true required.InputWeights=1250426880 required.CPU.Weights="[304972832 304972832 304972832 304972832 304972832 330263584 304972832 275168288 275168288 304972832 269491232 300459040 299295776 275168288 269491232 299295776 275168288 300459040 299295776 269491232 275168288 299295776 269491232 300459040 304972832 269491232 269491232 304972832 269491232 300459040 299295776 275168288 269491232 299295776 275168288 300459040 299295776 269491232 275168288 299295776 269491232 300459040 304972832 269491232 269491232 304972832 269491232 300459040 299295776 275168288 269491232 299295776 304972832 330263584 299295776 299295776 304972832 299295776 299295776 330263584 2260638912]" required.CPU.Cache="[75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 
75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 0]" required.CPU.Graph=17280008192 [ollama-0 ollama] time=2026-04-07T12:25:46.557Z level=DEBUG source=server.go:978 msg="available gpu" id=GPU-9762feba-cea4-7981-7353-533400b79c72 library=CUDA "available layer vram"="3.1 GiB" backoff=0.00 minimum="457.0 MiB" overhead="0 B" graph="0 B" [ollama-0 ollama] time=2026-04-07T12:25:46.557Z level=DEBUG source=server.go:795 msg="new layout created" layers="3[ID:GPU-9762feba-cea4-7981-7353-533400b79c72 Layers:3(57..59)]" [ollama-0 ollama] time=2026-04-07T12:25:46.557Z level=DEBUG source=server.go:820 msg="exploring intermediate layers" layer=2 [ollama-0 ollama] time=2026-04-07T12:25:46.557Z level=DEBUG source=server.go:978 msg="available gpu" id=GPU-9762feba-cea4-7981-7353-533400b79c72 library=CUDA "available layer vram"="3.1 GiB" backoff=0.00 minimum="457.0 MiB" overhead="0 B" graph="0 B" [ollama-0 ollama] time=2026-04-07T12:25:46.557Z level=DEBUG source=server.go:828 msg="new layout created" layers="2[ID:GPU-9762feba-cea4-7981-7353-533400b79c72 Layers:2(58..59)]" [ollama-0 ollama] time=2026-04-07T12:25:46.558Z level=INFO source=runner.go:1290 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Disabled KvSize:262144 KvCacheType: NumThreads:8 GPULayers:2[ID:GPU-9762feba-cea4-7981-7353-533400b79c72 Layers:2(58..59)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}" [ollama-0 ollama] time=2026-04-07T12:25:47.491Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=general.alignment default=32 [ollama-0 ollama] 
time=2026-04-07T12:25:47.510Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.pooling_type default=0 [ollama-0 ollama] time=2026-04-07T12:25:47.510Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=tokenizer.ggml.eot_token_id default=106 [ollama-0 ollama] time=2026-04-07T12:25:47.511Z level=INFO source=model.go:97 msg="gemma4: token IDs" image=255999 image_end=258882 audio=256000 audio_end=258883 [ollama-0 ollama] time=2026-04-07T12:25:47.512Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.attention.global_head_count_kv default=0 [ollama-0 ollama] time=2026-04-07T12:25:47.512Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.expert_count default=0 [ollama-0 ollama] time=2026-04-07T12:25:47.512Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.expert_used_count default=0 [ollama-0 ollama] time=2026-04-07T12:25:47.512Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.audio.block_count default=0 [ollama-0 ollama] time=2026-04-07T12:25:47.512Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.audio.embedding_length default=0 [ollama-0 ollama] time=2026-04-07T12:25:47.533Z level=INFO source=model.go:138 msg="vision: decode" elapsed=1.775305ms bounds=(0,0)-(2048,2048) [ollama-0 ollama] time=2026-04-07T12:25:47.710Z level=INFO source=model.go:145 msg="vision: preprocess" elapsed=177.372135ms size="[768 768]" [ollama-0 ollama] time=2026-04-07T12:25:47.714Z level=INFO source=model.go:148 msg="vision: pixelValues" shape="[768 768 3]" dim0=768 dim1=768 dim2=3 [ollama-0 ollama] time=2026-04-07T12:25:47.714Z level=INFO source=model.go:152 msg="vision: patches" patchesX=48 patchesY=48 total=2304 patchSize=16 [ollama-0 ollama] time=2026-04-07T12:25:47.714Z level=INFO source=model.go:156 msg="vision: encoded" elapsed=183.203596ms shape="[5376 256]" [ollama-0 ollama] time=2026-04-07T12:25:47.729Z level=DEBUG source=ggml.go:852 
msg="compute graph" nodes=1272 splits=355 [ollama-0 ollama] [GIN] 2026/04/07 - 12:25:49 | 200 | 38.499µs | 192.168.0.141 | GET "/api/version" [ollama-0 ollama] [GIN] 2026/04/07 - 12:25:49 | 200 | 60.868µs | 192.168.0.141 | GET "/api/version" [ollama-0 ollama] [GIN] 2026/04/07 - 12:25:54 | 200 | 53.539µs | 192.168.0.141 | GET "/api/version" [ollama-0 ollama] [GIN] 2026/04/07 - 12:25:54 | 200 | 47.436µs | 192.168.0.141 | GET "/api/version" [ollama-0 ollama] ggml_backend_cuda_buffer_type_alloc_buffer: allocating 19123.94 MiB on device 0: cudaMalloc failed: out of memory [ollama-0 ollama] ggml_gallocr_reserve_n_impl: failed to allocate CUDA0 buffer of size 20052907008 [ollama-0 ollama] time=2026-04-07T12:25:55.955Z level=DEBUG source=ggml.go:852 msg="compute graph" nodes=2752 splits=940 [ollama-0 ollama] time=2026-04-07T12:25:56.924Z level=DEBUG source=device.go:240 msg="model weights" device=CUDA0 size="600.4 MiB" [ollama-0 ollama] time=2026-04-07T12:25:56.924Z level=DEBUG source=device.go:245 msg="model weights" device=CPU size="19.0 GiB" [ollama-0 ollama] time=2026-04-07T12:25:56.924Z level=DEBUG source=device.go:251 msg="kv cache" device=CUDA0 size="2.1 GiB" [ollama-0 ollama] time=2026-04-07T12:25:56.924Z level=DEBUG source=device.go:256 msg="kv cache" device=CPU size="21.4 GiB" [ollama-0 ollama] time=2026-04-07T12:25:56.924Z level=DEBUG source=device.go:262 msg="compute graph" device=CUDA0 size="18.7 GiB" [ollama-0 ollama] time=2026-04-07T12:25:56.924Z level=DEBUG source=device.go:267 msg="compute graph" device=CPU size="16.0 MiB" [ollama-0 ollama] time=2026-04-07T12:25:56.924Z level=DEBUG source=device.go:272 msg="total memory" size="61.8 GiB" [ollama-0 ollama] time=2026-04-07T12:25:56.924Z level=DEBUG source=server.go:837 msg=memory success=false required.InputWeights=1250426880 required.CPU.Weights="[304972832 304972832 304972832 304972832 304972832 330263584 304972832 275168288 275168288 304972832 269491232 300459040 299295776 275168288 269491232 299295776 
275168288 300459040 299295776 269491232 275168288 299295776 269491232 300459040 304972832 269491232 269491232 304972832 269491232 300459040 299295776 275168288 269491232 299295776 275168288 300459040 299295776 269491232 275168288 299295776 269491232 300459040 304972832 269491232 269491232 304972832 269491232 300459040 299295776 275168288 269491232 299295776 304972832 330263584 299295776 299295776 304972832 299295776 0 0 2260638912]" required.CPU.Cache="[75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 0 0 0]" required.CPU.Graph=16777216 required.CUDA0.ID=GPU-9762feba-cea4-7981-7353-533400b79c72 required.CUDA0.Weights="[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 299297152 330264704 0]" required.CUDA0.Cache="[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 75497472 2147483648 0]" required.CUDA0.Graph=20078072832 [ollama-0 ollama] time=2026-04-07T12:25:56.924Z level=DEBUG source=server.go:820 msg="exploring intermediate layers" layer=1 [ollama-0 ollama] time=2026-04-07T12:25:56.924Z level=DEBUG source=server.go:978 msg="available gpu" id=GPU-9762feba-cea4-7981-7353-533400b79c72 library=CUDA "available layer vram"="3.1 GiB" backoff=0.00 minimum="457.0 MiB" overhead="0 B" graph="0 B" [ollama-0 ollama] time=2026-04-07T12:25:56.924Z level=DEBUG source=server.go:828 msg="new layout created" layers="1[ID:GPU-9762feba-cea4-7981-7353-533400b79c72 
Layers:1(59..59)]" [ollama-0 ollama] time=2026-04-07T12:25:56.924Z level=INFO source=runner.go:1290 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Disabled KvSize:262144 KvCacheType: NumThreads:8 GPULayers:1[ID:GPU-9762feba-cea4-7981-7353-533400b79c72 Layers:1(59..59)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}" [ollama-0 ollama] time=2026-04-07T12:25:56.978Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=general.alignment default=32 [ollama-0 ollama] time=2026-04-07T12:25:56.987Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.pooling_type default=0 [ollama-0 ollama] time=2026-04-07T12:25:56.987Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=tokenizer.ggml.eot_token_id default=106 [ollama-0 ollama] time=2026-04-07T12:25:56.988Z level=INFO source=model.go:97 msg="gemma4: token IDs" image=255999 image_end=258882 audio=256000 audio_end=258883 [ollama-0 ollama] time=2026-04-07T12:25:56.988Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.attention.global_head_count_kv default=0 [ollama-0 ollama] time=2026-04-07T12:25:56.988Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.expert_count default=0 [ollama-0 ollama] time=2026-04-07T12:25:56.988Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.expert_used_count default=0 [ollama-0 ollama] time=2026-04-07T12:25:56.988Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.audio.block_count default=0 [ollama-0 ollama] time=2026-04-07T12:25:56.988Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.audio.embedding_length default=0 [ollama-0 ollama] time=2026-04-07T12:25:57.006Z level=INFO source=model.go:138 msg="vision: decode" elapsed=2.345549ms bounds=(0,0)-(2048,2048) [ollama-0 ollama] time=2026-04-07T12:25:57.150Z level=INFO source=model.go:145 msg="vision: preprocess" elapsed=143.368125ms 
size="[768 768]" [ollama-0 ollama] time=2026-04-07T12:25:57.153Z level=INFO source=model.go:148 msg="vision: pixelValues" shape="[768 768 3]" dim0=768 dim1=768 dim2=3 [ollama-0 ollama] time=2026-04-07T12:25:57.153Z level=INFO source=model.go:152 msg="vision: patches" patchesX=48 patchesY=48 total=2304 patchSize=16 [ollama-0 ollama] time=2026-04-07T12:25:57.154Z level=INFO source=model.go:156 msg="vision: encoded" elapsed=150.467504ms shape="[5376 256]" [ollama-0 ollama] time=2026-04-07T12:25:57.161Z level=DEBUG source=ggml.go:852 msg="compute graph" nodes=1272 splits=355 [ollama-0 ollama] [GIN] 2026/04/07 - 12:25:59 | 200 | 39.271µs | 192.168.0.141 | GET "/api/version" [ollama-0 ollama] [GIN] 2026/04/07 - 12:25:59 | 200 | 28.811µs | 192.168.0.141 | GET "/api/version" [ollama-0 ollama] [GIN] 2026/04/07 - 12:26:04 | 200 | 58.757µs | 192.168.0.141 | GET "/api/version" [ollama-0 ollama] [GIN] 2026/04/07 - 12:26:04 | 200 | 21.524µs | 192.168.0.141 | GET "/api/version" [ollama-0 ollama] ggml_backend_cuda_buffer_type_alloc_buffer: allocating 19123.94 MiB on device 0: cudaMalloc failed: out of memory [ollama-0 ollama] ggml_gallocr_reserve_n_impl: failed to allocate CUDA0 buffer of size 20052907008 [ollama-0 ollama] time=2026-04-07T12:26:05.321Z level=DEBUG source=ggml.go:852 msg="compute graph" nodes=2752 splits=956 [ollama-0 ollama] time=2026-04-07T12:26:06.236Z level=DEBUG source=device.go:240 msg="model weights" device=CUDA0 size="315.0 MiB" [ollama-0 ollama] time=2026-04-07T12:26:06.236Z level=DEBUG source=device.go:245 msg="model weights" device=CPU size="19.3 GiB" [ollama-0 ollama] time=2026-04-07T12:26:06.236Z level=DEBUG source=device.go:251 msg="kv cache" device=CUDA0 size="2.0 GiB" [ollama-0 ollama] time=2026-04-07T12:26:06.236Z level=DEBUG source=device.go:256 msg="kv cache" device=CPU size="21.5 GiB" [ollama-0 ollama] time=2026-04-07T12:26:06.236Z level=DEBUG source=device.go:262 msg="compute graph" device=CUDA0 size="18.7 GiB" [ollama-0 ollama] 
time=2026-04-07T12:26:06.236Z level=DEBUG source=device.go:267 msg="compute graph" device=CPU size="16.0 MiB" [ollama-0 ollama] time=2026-04-07T12:26:06.236Z level=DEBUG source=device.go:272 msg="total memory" size="61.8 GiB" [ollama-0 ollama] time=2026-04-07T12:26:06.236Z level=DEBUG source=server.go:837 msg=memory success=false required.InputWeights=1250426880 required.CPU.Weights="[304972832 304972832 304972832 304972832 304972832 330263584 304972832 275168288 275168288 304972832 269491232 300459040 299295776 275168288 269491232 299295776 275168288 300459040 299295776 269491232 275168288 299295776 269491232 300459040 304972832 269491232 269491232 304972832 269491232 300459040 299295776 275168288 269491232 299295776 275168288 300459040 299295776 269491232 275168288 299295776 269491232 300459040 304972832 269491232 269491232 304972832 269491232 300459040 299295776 275168288 269491232 299295776 304972832 330263584 299295776 299295776 304972832 299295776 299295776 0 2260638912]" required.CPU.Cache="[75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 0 0]" required.CPU.Graph=16777216 required.CUDA0.ID=GPU-9762feba-cea4-7981-7353-533400b79c72 required.CUDA0.Weights="[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 330264704 0]" required.CUDA0.Cache="[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2147483648 0]" 
required.CUDA0.Graph=20078072832 [ollama-0 ollama] time=2026-04-07T12:26:06.236Z level=DEBUG source=server.go:820 msg="exploring intermediate layers" layer=0 [ollama-0 ollama] time=2026-04-07T12:26:06.236Z level=DEBUG source=server.go:978 msg="available gpu" id=GPU-9762feba-cea4-7981-7353-533400b79c72 library=CUDA "available layer vram"="3.1 GiB" backoff=0.00 minimum="457.0 MiB" overhead="0 B" graph="0 B" [ollama-0 ollama] time=2026-04-07T12:26:06.236Z level=DEBUG source=server.go:1059 msg="insufficient VRAM to load any model layers" [ollama-0 ollama] time=2026-04-07T12:26:06.236Z level=DEBUG source=server.go:828 msg="new layout created" layers=[] [ollama-0 ollama] time=2026-04-07T12:26:06.236Z level=INFO source=runner.go:1290 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Disabled KvSize:262144 KvCacheType: NumThreads:8 GPULayers:[] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}" [ollama-0 ollama] time=2026-04-07T12:26:06.291Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=general.alignment default=32 [ollama-0 ollama] time=2026-04-07T12:26:06.298Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.pooling_type default=0 [ollama-0 ollama] time=2026-04-07T12:26:06.298Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=tokenizer.ggml.eot_token_id default=106 [ollama-0 ollama] time=2026-04-07T12:26:06.299Z level=INFO source=model.go:97 msg="gemma4: token IDs" image=255999 image_end=258882 audio=256000 audio_end=258883 [ollama-0 ollama] time=2026-04-07T12:26:06.300Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.attention.global_head_count_kv default=0 [ollama-0 ollama] time=2026-04-07T12:26:06.300Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.expert_count default=0 [ollama-0 ollama] time=2026-04-07T12:26:06.300Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.expert_used_count default=0 
[ollama-0 ollama] time=2026-04-07T12:26:06.300Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.audio.block_count default=0 [ollama-0 ollama] time=2026-04-07T12:26:06.300Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.audio.embedding_length default=0 [ollama-0 ollama] time=2026-04-07T12:26:06.321Z level=INFO source=model.go:138 msg="vision: decode" elapsed=1.773695ms bounds=(0,0)-(2048,2048) [ollama-0 ollama] time=2026-04-07T12:26:06.470Z level=INFO source=model.go:145 msg="vision: preprocess" elapsed=149.050579ms size="[768 768]" [ollama-0 ollama] time=2026-04-07T12:26:06.472Z level=INFO source=model.go:148 msg="vision: pixelValues" shape="[768 768 3]" dim0=768 dim1=768 dim2=3 [ollama-0 ollama] time=2026-04-07T12:26:06.472Z level=INFO source=model.go:152 msg="vision: patches" patchesX=48 patchesY=48 total=2304 patchSize=16 [ollama-0 ollama] time=2026-04-07T12:26:06.473Z level=INFO source=model.go:156 msg="vision: encoded" elapsed=153.434664ms shape="[5376 256]" [ollama-0 ollama] time=2026-04-07T12:26:06.474Z level=DEBUG source=ggml.go:852 msg="compute graph" nodes=1272 splits=1 [ollama-0 ollama] [GIN] 2026/04/07 - 12:26:09 | 200 | 32.004µs | 192.168.0.141 | GET "/api/version" [ollama-0 ollama] [GIN] 2026/04/07 - 12:26:09 | 200 | 32.395µs | 192.168.0.141 | GET "/api/version" [ollama-0 ollama] [GIN] 2026/04/07 - 12:26:14 | 200 | 36.028µs | 192.168.0.141 | GET "/api/version" [ollama-0 ollama] [GIN] 2026/04/07 - 12:26:14 | 200 | 47.483µs | 192.168.0.141 | GET "/api/version" [ollama-0 ollama] time=2026-04-07T12:26:15.272Z level=DEBUG source=ggml.go:852 msg="compute graph" nodes=2752 splits=1 [ollama-0 ollama] time=2026-04-07T12:26:15.326Z level=DEBUG source=ggml.go:852 msg="compute graph" nodes=2750 splits=1 [ollama-0 ollama] time=2026-04-07T12:26:15.327Z level=DEBUG source=device.go:245 msg="model weights" device=CPU size="19.6 GiB" [ollama-0 ollama] time=2026-04-07T12:26:15.327Z level=DEBUG source=device.go:256 
msg="kv cache" device=CPU size="23.5 GiB" [ollama-0 ollama] time=2026-04-07T12:26:15.327Z level=DEBUG source=device.go:267 msg="compute graph" device=CPU size="16.1 GiB" [ollama-0 ollama] time=2026-04-07T12:26:15.327Z level=DEBUG source=device.go:272 msg="total memory" size="59.2 GiB" [ollama-0 ollama] time=2026-04-07T12:26:15.327Z level=DEBUG source=server.go:837 msg=memory success=true required.InputWeights=1250426880 required.CPU.Weights="[304972832 304972832 304972832 304972832 304972832 330263584 304972832 275168288 275168288 304972832 269491232 300459040 299295776 275168288 269491232 299295776 275168288 300459040 299295776 269491232 275168288 299295776 269491232 300459040 304972832 269491232 269491232 304972832 269491232 300459040 299295776 275168288 269491232 299295776 275168288 300459040 299295776 269491232 275168288 299295776 269491232 300459040 304972832 269491232 269491232 304972832 269491232 300459040 299295776 275168288 269491232 299295776 304972832 330263584 299295776 299295776 304972832 299295776 299295776 330263584 2260638912]" required.CPU.Cache="[75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 75497472 75497472 75497472 75497472 75497472 2147483648 0]" required.CPU.Graph=17280008192 [ollama-0 ollama] time=2026-04-07T12:26:15.327Z level=DEBUG source=server.go:978 msg="available gpu" id=GPU-9762feba-cea4-7981-7353-533400b79c72 library=CUDA "available layer vram"="3.1 GiB" backoff=0.00 minimum="457.0 MiB" overhead="0 B" graph="0 B" [ollama-0 ollama] time=2026-04-07T12:26:15.327Z level=DEBUG 
source=server.go:845 msg="verifying layout" layers="3[ID:GPU-9762feba-cea4-7981-7353-533400b79c72 Layers:3(57..59)]" [ollama-0 ollama] time=2026-04-07T12:26:15.327Z level=INFO source=runner.go:1290 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Disabled KvSize:262144 KvCacheType: NumThreads:8 GPULayers:[] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}" [ollama-0 ollama] time=2026-04-07T12:26:15.327Z level=INFO source=ggml.go:482 msg="offloading 0 repeating layers to GPU" [ollama-0 ollama] time=2026-04-07T12:26:15.327Z level=INFO source=ggml.go:486 msg="offloading output layer to CPU" [ollama-0 ollama] time=2026-04-07T12:26:15.327Z level=INFO source=ggml.go:494 msg="offloaded 0/61 layers to GPU" [ollama-0 ollama] time=2026-04-07T12:26:15.327Z level=INFO source=device.go:245 msg="model weights" device=CPU size="19.6 GiB" [ollama-0 ollama] time=2026-04-07T12:26:15.327Z level=INFO source=device.go:256 msg="kv cache" device=CPU size="23.5 GiB" [ollama-0 ollama] time=2026-04-07T12:26:15.327Z level=INFO source=device.go:267 msg="compute graph" device=CPU size="16.1 GiB" [ollama-0 ollama] time=2026-04-07T12:26:15.327Z level=INFO source=device.go:272 msg="total memory" size="59.2 GiB" [ollama-0 ollama] time=2026-04-07T12:26:15.327Z level=INFO source=sched.go:561 msg="loaded runners" count=1 [ollama-0 ollama] time=2026-04-07T12:26:15.327Z level=INFO source=server.go:1352 msg="waiting for llama runner to start responding" [ollama-0 ollama] time=2026-04-07T12:26:15.328Z level=INFO source=server.go:1386 msg="waiting for server to become available" status="llm server loading model" [ollama-0 ollama] time=2026-04-07T12:26:15.328Z level=DEBUG source=server.go:1396 msg="model load progress 0.00" [ollama-0 ollama] time=2026-04-07T12:26:15.579Z level=DEBUG source=server.go:1396 msg="model load progress 0.03" [ollama-0 ollama] time=2026-04-07T12:26:15.831Z level=DEBUG source=server.go:1396 msg="model load progress 0.07" [ollama-0 
ollama] time=2026-04-07T12:26:16.082Z level=DEBUG source=server.go:1396 msg="model load progress 0.11" [ollama-0 ollama] time=2026-04-07T12:26:16.333Z level=DEBUG source=server.go:1396 msg="model load progress 0.14" [ollama-0 ollama] time=2026-04-07T12:26:16.584Z level=DEBUG source=server.go:1396 msg="model load progress 0.17" [ollama-0 ollama] time=2026-04-07T12:26:16.834Z level=DEBUG source=server.go:1396 msg="model load progress 0.21" [ollama-0 ollama] time=2026-04-07T12:26:17.088Z level=DEBUG source=server.go:1396 msg="model load progress 0.25" [ollama-0 ollama] time=2026-04-07T12:26:17.339Z level=DEBUG source=server.go:1396 msg="model load progress 0.28" [ollama-0 ollama] time=2026-04-07T12:26:17.591Z level=DEBUG source=server.go:1396 msg="model load progress 0.31" [ollama-0 ollama] time=2026-04-07T12:26:17.843Z level=DEBUG source=server.go:1396 msg="model load progress 0.35" [ollama-0 ollama] time=2026-04-07T12:26:18.095Z level=DEBUG source=server.go:1396 msg="model load progress 0.39" [ollama-0 ollama] time=2026-04-07T12:26:18.346Z level=DEBUG source=server.go:1396 msg="model load progress 0.42" [ollama-0 ollama] time=2026-04-07T12:26:18.600Z level=DEBUG source=server.go:1396 msg="model load progress 0.44" [ollama-0 ollama] time=2026-04-07T12:26:18.851Z level=DEBUG source=server.go:1396 msg="model load progress 0.48" [ollama-0 ollama] time=2026-04-07T12:26:19.103Z level=DEBUG source=server.go:1396 msg="model load progress 0.51" [ollama-0 ollama] [GIN] 2026/04/07 - 12:26:19 | 200 | 137.068µs | 192.168.0.141 | GET "/api/version" [ollama-0 ollama] [GIN] 2026/04/07 - 12:26:19 | 200 | 104.876µs | 192.168.0.141 | GET "/api/version" [ollama-0 ollama] time=2026-04-07T12:26:19.355Z level=DEBUG source=server.go:1396 msg="model load progress 0.54" [ollama-0 ollama] time=2026-04-07T12:26:19.608Z level=DEBUG source=server.go:1396 msg="model load progress 0.57" [ollama-0 ollama] time=2026-04-07T12:26:19.860Z level=DEBUG source=server.go:1396 msg="model load progress 0.60" 
[ollama-0 ollama] time=2026-04-07T12:26:20.112Z level=DEBUG source=server.go:1396 msg="model load progress 0.63" [ollama-0 ollama] time=2026-04-07T12:26:20.364Z level=DEBUG source=server.go:1396 msg="model load progress 0.67" [ollama-0 ollama] time=2026-04-07T12:26:20.615Z level=DEBUG source=server.go:1396 msg="model load progress 0.70" [ollama-0 ollama] time=2026-04-07T12:26:20.866Z level=DEBUG source=server.go:1396 msg="model load progress 0.73" [ollama-0 ollama] time=2026-04-07T12:26:21.118Z level=DEBUG source=server.go:1396 msg="model load progress 0.76" [ollama-0 ollama] time=2026-04-07T12:26:21.369Z level=DEBUG source=server.go:1396 msg="model load progress 0.79" [ollama-0 ollama] time=2026-04-07T12:26:21.621Z level=DEBUG source=server.go:1396 msg="model load progress 0.83" [ollama-0 ollama] time=2026-04-07T12:26:21.873Z level=DEBUG source=server.go:1396 msg="model load progress 0.86" [ollama-0 ollama] time=2026-04-07T12:26:22.124Z level=DEBUG source=server.go:1396 msg="model load progress 0.89" [ollama-0 ollama] time=2026-04-07T12:26:22.376Z level=DEBUG source=server.go:1396 msg="model load progress 0.92" [ollama-0 ollama] time=2026-04-07T12:26:22.627Z level=DEBUG source=server.go:1396 msg="model load progress 0.94" [ollama-0 ollama] time=2026-04-07T12:26:23.129Z level=DEBUG source=server.go:1396 msg="model load progress 0.95" [ollama-0 ollama] time=2026-04-07T12:26:23.380Z level=DEBUG source=server.go:1396 msg="model load progress 0.96" [ollama-0 ollama] time=2026-04-07T12:26:23.631Z level=DEBUG source=server.go:1396 msg="model load progress 0.97" [ollama-0 ollama] time=2026-04-07T12:26:23.882Z level=DEBUG source=server.go:1396 msg="model load progress 0.98" [ollama-0 ollama] time=2026-04-07T12:26:24.132Z level=DEBUG source=server.go:1396 msg="model load progress 0.99" [ollama-0 ollama] [GIN] 2026/04/07 - 12:26:24 | 200 | 692.611µs | 192.168.0.141 | GET "/api/version" [ollama-0 ollama] [GIN] 2026/04/07 - 12:26:24 | 200 | 49.769µs | 192.168.0.141 | GET 
"/api/version" [ollama-0 ollama] time=2026-04-07T12:26:24.383Z level=DEBUG source=server.go:1396 msg="model load progress 0.99" [ollama-0 ollama] time=2026-04-07T12:26:24.634Z level=DEBUG source=server.go:1396 msg="model load progress 0.99" [ollama-0 ollama] time=2026-04-07T12:26:24.786Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.pooling_type default=0 [ollama-0 ollama] time=2026-04-07T12:26:24.884Z level=INFO source=server.go:1390 msg="llama runner started in 90.06 seconds" [ollama-0 ollama] time=2026-04-07T12:26:24.884Z level=DEBUG source=sched.go:573 msg="finished setting up" runner.name=registry.ollama.ai/library/gemma4:31b runner.size="59.2 GiB" runner.vram="0 B" runner.parallel=1 runner.pid=161 runner.model=/models/blobs/sha256-280af6832eca23cb322c4dcc65edfea98a21b8f8ab07dc7553bd6f7e6e7a3313 runner.num_ctx=262144 [ollama-0 ollama] time=2026-04-07T12:26:25.012Z level=DEBUG source=server.go:1538 msg="completion request" images=0 prompt=33899 format="" [ollama-0 ollama] time=2026-04-07T12:26:25.155Z level=DEBUG source=cache.go:151 msg="loading cache slot" id=0 cache=0 prompt=7904 used=0 remaining=7904 ```
Author
Owner

@rick-github commented on GitHub (Apr 7, 2026):

So, there were logs?

<!-- gh-comment-id:4198934598 --> @rick-github commented on GitHub (Apr 7, 2026): So, there were logs?
Author
Owner

@khteh commented on GitHub (Apr 7, 2026):

I guess logging to stderr is ephemeral. kubectl logs only captures output from the current container instance, rather than a persistent log file?

<!-- gh-comment-id:4198947659 --> @khteh commented on GitHub (Apr 7, 2026): I guess logging to stderr is ephemeral. `kubectl logs` only captures output from the current container instance, rather than a persistent log file?
Author
Owner

@khteh commented on GitHub (Apr 7, 2026):

The model is still unresponsive to requests.

<!-- gh-comment-id:4198953722 --> @khteh commented on GitHub (Apr 7, 2026): The model is still unresponsive to requests.
Author
Owner

@rick-github commented on GitHub (Apr 7, 2026):

https://github.com/ollama/ollama/issues/15387#issuecomment-4198898466

<!-- gh-comment-id:4198956382 --> @rick-github commented on GitHub (Apr 7, 2026): https://github.com/ollama/ollama/issues/15387#issuecomment-4198898466
Author
Owner

@khteh commented on GitHub (Apr 7, 2026):

Right. I removed that, but to no avail. Back to square one.

<!-- gh-comment-id:4198958698 --> @khteh commented on GitHub (Apr 7, 2026): Right. I removed that, but to no avail. Back to square one.
Author
Owner

@rick-github commented on GitHub (Apr 7, 2026):

[ollama-0 ollama] time=2026-04-07T12:26:15.327Z level=INFO source=ggml.go:494 msg="offloaded 0/61 layers to GPU"

Reduce OLLAMA_CONTEXT_LENGTH or get a bigger GPU.

<!-- gh-comment-id:4199001666 --> @rick-github commented on GitHub (Apr 7, 2026): ``` [ollama-0 ollama] time=2026-04-07T12:26:15.327Z level=INFO source=ggml.go:494 msg="offloaded 0/61 layers to GPU" ``` Reduce `OLLAMA_CONTEXT_LENGTH` or get a bigger GPU.
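The suggestion above can be applied in k8s roughly like this — a sketch only; the StatefulSet name `ollama` mirrors the `ollama-0` pod in the logs and may differ in your manifest, and the per-request override assumes the standard `/api/generate` endpoint:

```shell
# Cap the default context so the KV cache can fit on the GPU alongside some layers.
kubectl set env statefulset/ollama OLLAMA_CONTEXT_LENGTH=8192
kubectl rollout status statefulset/ollama

# Alternatively, override per request with the API's num_ctx option:
curl -s http://ollama:11434/api/generate \
  -d '{"model": "gemma4:31b", "options": {"num_ctx": 8192}, "prompt": "hello"}'
```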
Author
Owner

@khteh commented on GitHub (Apr 7, 2026):

Why? I don't see any error in the message. And it's only 256K...?

<!-- gh-comment-id:4199014105 --> @khteh commented on GitHub (Apr 7, 2026): Why? I don't see any error in the message. And it's only 256K...?
Author
Owner

@khteh commented on GitHub (Apr 7, 2026):

So do you find this https://github.com/ollama/ollama/issues/10136 relevant now?

<!-- gh-comment-id:4199035462 --> @khteh commented on GitHub (Apr 7, 2026): So do you find this https://github.com/ollama/ollama/issues/10136 relevant now?
Author
Owner

@rick-github commented on GitHub (Apr 7, 2026):

You have a small GPU and a large context, which means that the GPU is not being used at all.

<!-- gh-comment-id:4199037158 --> @rick-github commented on GitHub (Apr 7, 2026): You have a small GPU and a large context, which means that the GPU is not being used at all.
Author
Owner

@Shazix75 commented on GitHub (Apr 7, 2026):

I think this is a known issue in version 0.20.x, because even when I choose to use only one GPU, ollama run always uses 100% CPU, not the GPU.

<!-- gh-comment-id:4199045630 --> @Shazix75 commented on GitHub (Apr 7, 2026): I think this is a known issue in version 0.20.x, because even when I choose to use only one GPU, `ollama run` always uses 100% CPU, not the GPU.
Author
Owner

@khteh commented on GitHub (Apr 7, 2026):

This is what I have always struggled with when using Ollama. How do I know the right context length to use when I switch between models? Is there any formula for reference? My GPU has 4GiB.

<!-- gh-comment-id:4199048993 --> @khteh commented on GitHub (Apr 7, 2026): This is what I have always struggled with when using Ollama. How do I know the right context length to use when I switch between models? Is there any formula for reference? My GPU has 4GiB.
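As a rough rule of thumb, an fp16 KV cache needs 2 (K and V) × layers × kv_heads × head_dim × context × 2 bytes, on top of the weights. A minimal sketch — the model dimensions below are purely illustrative, not gemma4:31b's real config:

```shell
# Back-of-the-envelope KV-cache size (fp16, no cache quantization).
# layers/kv_heads/head_dim are hypothetical placeholders for illustration.
layers=60 kv_heads=8 head_dim=128

for ctx in 262144 8192; do
  bytes=$(( 2 * layers * kv_heads * head_dim * ctx * 2 ))
  echo "ctx=$ctx -> $(( bytes / 1024 / 1024 )) MiB of KV cache"
done
```

With these numbers a 256K context wants tens of GiB for the cache alone, while 8K stays under 2 GiB — which is why the logs show 0/61 layers offloaded at 262144. Real models complicate the estimate (GQA, Gemma-style sliding-window layers, quantized caches), so treat it as an upper-bound sanity check for a 4 GiB GPU, not an exact figure.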
Author
Owner

@rick-github commented on GitHub (Apr 7, 2026):

So do you find this #10136 relevant now?

How is this relevant? The instructions clearly state that in a container, the logs go to stderr.

<!-- gh-comment-id:4199054030 --> @rick-github commented on GitHub (Apr 7, 2026): > So do you find this [#10136](https://github.com/ollama/ollama/issues/10136) relevant now? How is this relevant? The instructions clearly state that in a container, the logs go to stderr.
Author
Owner

@khteh commented on GitHub (Apr 7, 2026):

It's relevant for a persistent log instead of an ephemeral one.

<!-- gh-comment-id:4199063835 --> @khteh commented on GitHub (Apr 7, 2026): It's relevant for a persistent log instead of an ephemeral one.
Author
Owner

@rick-github commented on GitHub (Apr 7, 2026):

That's up to the container manager. In Docker, the logs persist until the container is shut down. If you want to persist logs across container instantiations, that's not an ollama issue.

<!-- gh-comment-id:4199070503 --> @rick-github commented on GitHub (Apr 7, 2026): That's up to the container manager. In Docker, the logs persist until the container is shut down. If you want to persist logs across container instantiations, that's not an ollama issue.
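For k8s specifically, the usual workarounds are along these lines (a sketch, assuming the `ollama-0` pod name from this thread; `--previous` only helps after a container restart, not a pod deletion):

```shell
# Logs from the previous container instance, if the container restarted:
kubectl logs ollama-0 --previous

# Or stream to a local file so nothing is lost going forward:
kubectl logs -f ollama-0 | tee -a ollama.log
```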
Author
Owner

@rick-github commented on GitHub (Apr 7, 2026):

@Shazix75 If you would like this debugged, open a new issue and add the [server log](https://docs.ollama.com/troubleshooting).

@khteh commented on GitHub (Apr 7, 2026):

> You have a small GPU and a large context, which means that the GPU is not being used at all.

Reduced the context to 8192, but to no avail. Still no response from the model.

```shell
[ollama-0 ollama] time=2026-04-07T12:57:06.580Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=gemma4.pooling_type default=0
[ollama-0 ollama] time=2026-04-07T12:57:06.823Z level=INFO source=server.go:1390 msg="llama runner started in 45.12 seconds"
[ollama-0 ollama] time=2026-04-07T12:57:06.823Z level=DEBUG source=sched.go:573 msg="finished setting up" runner.name=registry.ollama.ai/library/gemma4:31b runner.inference="[{ID:GPU-9762feba-cea4-7981-7353-533400b79c72 Library:CUDA}]" runner.size="24.7 GiB" runner.vram="3.1 GiB" runner.parallel=1 runner.pid=164 runner.model=/models/blobs/sha256-280af6832eca23cb322c4dcc65edfea98a21b8f8ab07dc7553bd6f7e6e7a3313 runner.num_ctx=8192
[ollama-0 ollama] time=2026-04-07T12:57:06.930Z level=DEBUG source=server.go:1538 msg="completion request" images=0 prompt=33899 format=""
[ollama-0 ollama] time=2026-04-07T12:57:07.031Z level=DEBUG source=cache.go:151 msg="loading cache slot" id=0 cache=0 prompt=7904 used=0 remaining=7904
```
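Worth noting from the log itself: the completion request carries a 33899-token prompt, but the runner was set up with `num_ctx=8192` and the cache slot loads only 7904 tokens, so most of the prompt is dropped before generation even starts. The arithmetic:

```shell
# Values taken from the log lines above: prompt=33899 tokens requested,
# but the cache slot holds only 7904 of them.
prompt=33899 kept=7904
echo "dropped $(( prompt - kept )) of $prompt prompt tokens before generation"
```

A long prompt-processing wait plus truncated instructions can easily look like "no response"; either the context needs to be larger than the prompt, or the prompt shortened to fit.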

@Shazix75 commented on GitHub (Apr 7, 2026):

> @Shazix75 If you would like this debugged, open a new issue and add the [server log](https://docs.ollama.com/troubleshooting).

Thank you. I have the same issue as these two, #15237 and #15352, so I think it is the same problem; flash attention needs to be disabled.
Reference: github-starred/ollama#9841