[GH-ISSUE #15284] Out of memory Gemma4 31B Q4 #56290

Closed
opened 2026-04-29 10:34:54 -05:00 by GiteaMirror · 4 comments
Owner

Originally created by @phr0gz on GitHub (Apr 3, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/15284

What is the issue?

Cannot load Gemma4 31B Q4 on a multi-GPU server using Docker. I hadn't used this server since I received my GB10, but since Gemma4 performance on the GB10 is poor, I decided to go back to my old multi-GPU server.

It fails to allocate the memory: the runner appears to loop indefinitely, retrying progressively smaller layer splits and hitting "cudaMalloc failed: out of memory" every time.

The GPUs are:
Device 0: NVIDIA GeForce RTX 3090, available="22.3 GiB" free="22.7 GiB"
Device 1: NVIDIA GeForce RTX 4060 Ti, available="12.8 GiB" free="13.3 GiB"
Device 2: NVIDIA GeForce RTX 5060 Ti, available="14.9 GiB" free="15.3 GiB"
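
For context, every load request in the log below uses KvSize:262144 (a 256K-token KV cache), which is what pushes each per-GPU buffer past the cards' free memory. A minimal workaround sketch, assuming the stock ollama/ollama image and that OLLAMA_CONTEXT_LENGTH applies to this build; the context value and container settings here are illustrative, not taken from this report:

```shell
# Hypothetical workaround sketch: restart the container with a smaller default
# context so the KV cache fits across the three GPUs. OLLAMA_CONTEXT_LENGTH is
# a standard Ollama environment variable; 32768 is an arbitrary example value.
docker run -d --gpus all \
  -e OLLAMA_CONTEXT_LENGTH=32768 \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama ollama/ollama
```

Alternatively, the context could be capped per request with the num_ctx option in the API call or Modelfile; either way this only narrows down whether the 256K context is the trigger, it doesn't explain the retry loop itself.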

Relevant log output

time=2026-04-03T12:43:48.418Z level=INFO source=server.go:432 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 33383"
time=2026-04-03T12:43:48.900Z level=WARN source=cpu_linux.go:130 msg="failed to parse CPU allowed micro secs" error="strconv.ParseInt: parsing \"max\": invalid syntax"
time=2026-04-03T12:43:49.081Z level=INFO source=model.go:97 msg="gemma4: token IDs" image=255999 image_end=258882 audio=256000 audio_end=258883
time=2026-04-03T12:43:49.081Z level=INFO source=server.go:432 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --model /root/.ollama/models/blobs/sha256-280af6832eca23cb322c4dcc65edfea98a21b8f8ab07dc7553bd6f7e6e7a3313 --port 34277"
time=2026-04-03T12:43:49.081Z level=INFO source=sched.go:484 msg="system memory" total="29.4 GiB" free="29.2 GiB" free_swap="8.0 GiB"
time=2026-04-03T12:43:49.081Z level=INFO source=sched.go:491 msg="gpu memory" id=GPU-a000b5fa-b553-4068-d239-bbebd2e92d97 library=CUDA available="22.3 GiB" free="22.7 GiB" minimum="457.0 MiB" overhead="0 B"
time=2026-04-03T12:43:49.081Z level=INFO source=sched.go:491 msg="gpu memory" id=GPU-52aab516-8e4c-8a18-1ec5-ea4e93119667 library=CUDA available="12.8 GiB" free="13.3 GiB" minimum="457.0 MiB" overhead="0 B"
time=2026-04-03T12:43:49.081Z level=INFO source=sched.go:491 msg="gpu memory" id=GPU-4ccbba1c-2f53-b260-f75e-ad10640a3cc0 library=CUDA available="14.9 GiB" free="15.3 GiB" minimum="457.0 MiB" overhead="0 B"
time=2026-04-03T12:43:49.081Z level=INFO source=server.go:759 msg="loading model" "model layers"=61 requested=-1
time=2026-04-03T12:43:49.097Z level=INFO source=runner.go:1417 msg="starting ollama engine"
time=2026-04-03T12:43:49.097Z level=INFO source=runner.go:1452 msg="Server listening on 127.0.0.1:34277"
time=2026-04-03T12:43:49.103Z level=INFO source=runner.go:1290 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Disabled KvSize:262144 KvCacheType: NumThreads:16 GPULayers:61[ID:GPU-a000b5fa-b553-4068-d239-bbebd2e92d97 Layers:61(0..60)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-04-03T12:43:49.174Z level=INFO source=ggml.go:136 msg="" architecture=gemma4 file_type=Q4_K_M name="" description="" num_tensors=1189 num_key_values=49
load_backend: loaded CPU backend from /usr/lib/ollama/libggml-cpu-haswell.so
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 3 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes, ID: GPU-a000b5fa-b553-4068-d239-bbebd2e92d97
  Device 1: NVIDIA GeForce RTX 4060 Ti, compute capability 8.9, VMM: yes, ID: GPU-52aab516-8e4c-8a18-1ec5-ea4e93119667
  Device 2: NVIDIA GeForce RTX 5060 Ti, compute capability 12.0, VMM: yes, ID: GPU-4ccbba1c-2f53-b260-f75e-ad10640a3cc0
load_backend: loaded CUDA backend from /usr/lib/ollama/cuda_v13/libggml-cuda.so
time=2026-04-03T12:43:49.291Z level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=750,800,860,870,890,900,1000,1030,1100,1200,1210 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 CUDA.1.ARCHS=750,800,860,870,890,900,1000,1030,1100,1200,1210 CUDA.1.USE_GRAPHS=1 CUDA.1.PEER_MAX_BATCH_SIZE=128 CUDA.2.ARCHS=750,800,860,870,890,900,1000,1030,1100,1200,1210 CUDA.2.USE_GRAPHS=1 CUDA.2.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc)
time=2026-04-03T12:43:49.301Z level=INFO source=model.go:97 msg="gemma4: token IDs" image=255999 image_end=258882 audio=256000 audio_end=258883
time=2026-04-03T12:43:49.328Z level=INFO source=model.go:138 msg="vision: decode" elapsed=2.330937ms bounds=(0,0)-(2048,2048)
time=2026-04-03T12:43:49.453Z level=INFO source=model.go:145 msg="vision: preprocess" elapsed=125.553785ms size="[768 768]"
time=2026-04-03T12:43:49.454Z level=INFO source=model.go:148 msg="vision: pixelValues" shape="[768 768 3]" dim0=768 dim1=768 dim2=3
time=2026-04-03T12:43:49.454Z level=INFO source=model.go:152 msg="vision: patches" patchesX=48 patchesY=48 total=2304 patchSize=16
time=2026-04-03T12:43:49.455Z level=INFO source=model.go:156 msg="vision: encoded" elapsed=129.035176ms shape="[5376 256]"
time=2026-04-03T12:43:51.224Z level=INFO source=runner.go:1290 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Disabled KvSize:262144 KvCacheType: NumThreads:16 GPULayers:39[ ID:GPU-4ccbba1c-2f53-b260-f75e-ad10640a3cc0 Layers:21(21..41) ID:GPU-52aab516-8e4c-8a18-1ec5-ea4e93119667 Layers:18(42..59)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-04-03T12:43:51.297Z level=INFO source=model.go:97 msg="gemma4: token IDs" image=255999 image_end=258882 audio=256000 audio_end=258883
time=2026-04-03T12:43:51.311Z level=INFO source=model.go:138 msg="vision: decode" elapsed=774.209µs bounds=(0,0)-(2048,2048)
time=2026-04-03T12:43:51.427Z level=INFO source=model.go:145 msg="vision: preprocess" elapsed=115.100016ms size="[768 768]"
time=2026-04-03T12:43:51.427Z level=INFO source=model.go:148 msg="vision: pixelValues" shape="[768 768 3]" dim0=768 dim1=768 dim2=3
time=2026-04-03T12:43:51.427Z level=INFO source=model.go:152 msg="vision: patches" patchesX=48 patchesY=48 total=2304 patchSize=16
time=2026-04-03T12:43:51.428Z level=INFO source=model.go:156 msg="vision: encoded" elapsed=116.951326ms shape="[5376 256]"
time=2026-04-03T12:43:52.666Z level=INFO source=runner.go:1290 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Disabled KvSize:262144 KvCacheType: NumThreads:16 GPULayers:31[ID:GPU-a000b5fa-b553-4068-d239-bbebd2e92d97 Layers:31(29..59)  ] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-04-03T12:43:52.743Z level=INFO source=model.go:97 msg="gemma4: token IDs" image=255999 image_end=258882 audio=256000 audio_end=258883
time=2026-04-03T12:43:52.762Z level=INFO source=model.go:138 msg="vision: decode" elapsed=931.651µs bounds=(0,0)-(2048,2048)
time=2026-04-03T12:43:52.891Z level=INFO source=model.go:145 msg="vision: preprocess" elapsed=129.212158ms size="[768 768]"
time=2026-04-03T12:43:52.891Z level=INFO source=model.go:148 msg="vision: pixelValues" shape="[768 768 3]" dim0=768 dim1=768 dim2=3
time=2026-04-03T12:43:52.891Z level=INFO source=model.go:152 msg="vision: patches" patchesX=48 patchesY=48 total=2304 patchSize=16
time=2026-04-03T12:43:52.892Z level=INFO source=model.go:156 msg="vision: encoded" elapsed=131.267002ms shape="[5376 256]"
time=2026-04-03T12:43:53.702Z level=INFO source=runner.go:1290 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Disabled KvSize:262144 KvCacheType: NumThreads:16 GPULayers:38[ID:GPU-4ccbba1c-2f53-b260-f75e-ad10640a3cc0 Layers:20(22..41) ID:GPU-52aab516-8e4c-8a18-1ec5-ea4e93119667 Layers:18(42..59)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-04-03T12:43:53.776Z level=INFO source=model.go:97 msg="gemma4: token IDs" image=255999 image_end=258882 audio=256000 audio_end=258883
time=2026-04-03T12:43:53.787Z level=INFO source=model.go:138 msg="vision: decode" elapsed=609.757µs bounds=(0,0)-(2048,2048)
time=2026-04-03T12:43:53.895Z level=INFO source=model.go:145 msg="vision: preprocess" elapsed=108.032442ms size="[768 768]"
time=2026-04-03T12:43:53.895Z level=INFO source=model.go:148 msg="vision: pixelValues" shape="[768 768 3]" dim0=768 dim1=768 dim2=3
time=2026-04-03T12:43:53.895Z level=INFO source=model.go:152 msg="vision: patches" patchesX=48 patchesY=48 total=2304 patchSize=16
time=2026-04-03T12:43:53.896Z level=INFO source=model.go:156 msg="vision: encoded" elapsed=109.808303ms shape="[5376 256]"
time=2026-04-03T12:43:54.934Z level=INFO source=runner.go:1290 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Disabled KvSize:262144 KvCacheType: NumThreads:16 GPULayers:37[ID:GPU-4ccbba1c-2f53-b260-f75e-ad10640a3cc0 Layers:19(23..41) ID:GPU-52aab516-8e4c-8a18-1ec5-ea4e93119667 Layers:18(42..59)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-04-03T12:43:55.000Z level=INFO source=model.go:97 msg="gemma4: token IDs" image=255999 image_end=258882 audio=256000 audio_end=258883
time=2026-04-03T12:43:55.010Z level=INFO source=model.go:138 msg="vision: decode" elapsed=559.417µs bounds=(0,0)-(2048,2048)
time=2026-04-03T12:43:55.120Z level=INFO source=model.go:145 msg="vision: preprocess" elapsed=109.851565ms size="[768 768]"
time=2026-04-03T12:43:55.120Z level=INFO source=model.go:148 msg="vision: pixelValues" shape="[768 768 3]" dim0=768 dim1=768 dim2=3
time=2026-04-03T12:43:55.120Z level=INFO source=model.go:152 msg="vision: patches" patchesX=48 patchesY=48 total=2304 patchSize=16
time=2026-04-03T12:43:55.121Z level=INFO source=model.go:156 msg="vision: encoded" elapsed=111.527624ms shape="[5376 256]"
time=2026-04-03T12:43:55.978Z level=INFO source=runner.go:1290 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Disabled KvSize:262144 KvCacheType: NumThreads:16 GPULayers:36[ID:GPU-4ccbba1c-2f53-b260-f75e-ad10640a3cc0 Layers:21(24..44) ID:GPU-52aab516-8e4c-8a18-1ec5-ea4e93119667 Layers:15(45..59)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-04-03T12:43:56.047Z level=INFO source=model.go:97 msg="gemma4: token IDs" image=255999 image_end=258882 audio=256000 audio_end=258883
time=2026-04-03T12:43:56.065Z level=INFO source=model.go:138 msg="vision: decode" elapsed=607.477µs bounds=(0,0)-(2048,2048)
time=2026-04-03T12:43:56.180Z level=INFO source=model.go:145 msg="vision: preprocess" elapsed=114.917032ms size="[768 768]"
time=2026-04-03T12:43:56.180Z level=INFO source=model.go:148 msg="vision: pixelValues" shape="[768 768 3]" dim0=768 dim1=768 dim2=3
time=2026-04-03T12:43:56.180Z level=INFO source=model.go:152 msg="vision: patches" patchesX=48 patchesY=48 total=2304 patchSize=16
time=2026-04-03T12:43:56.181Z level=INFO source=model.go:156 msg="vision: encoded" elapsed=116.684472ms shape="[5376 256]"
time=2026-04-03T12:43:56.965Z level=INFO source=runner.go:1290 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Disabled KvSize:262144 KvCacheType: NumThreads:16 GPULayers:35[ID:GPU-4ccbba1c-2f53-b260-f75e-ad10640a3cc0 Layers:20(25..44) ID:GPU-52aab516-8e4c-8a18-1ec5-ea4e93119667 Layers:15(45..59)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-04-03T12:43:57.040Z level=INFO source=model.go:97 msg="gemma4: token IDs" image=255999 image_end=258882 audio=256000 audio_end=258883
time=2026-04-03T12:43:57.056Z level=INFO source=model.go:138 msg="vision: decode" elapsed=576.697µs bounds=(0,0)-(2048,2048)
time=2026-04-03T12:43:57.177Z level=INFO source=model.go:145 msg="vision: preprocess" elapsed=121.738252ms size="[768 768]"
time=2026-04-03T12:43:57.177Z level=INFO source=model.go:148 msg="vision: pixelValues" shape="[768 768 3]" dim0=768 dim1=768 dim2=3
time=2026-04-03T12:43:57.177Z level=INFO source=model.go:152 msg="vision: patches" patchesX=48 patchesY=48 total=2304 patchSize=16
time=2026-04-03T12:43:57.178Z level=INFO source=model.go:156 msg="vision: encoded" elapsed=123.428372ms shape="[5376 256]"
time=2026-04-03T12:43:57.949Z level=INFO source=runner.go:1290 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Disabled KvSize:262144 KvCacheType: NumThreads:16 GPULayers:34[ID:GPU-4ccbba1c-2f53-b260-f75e-ad10640a3cc0 Layers:20(26..45) ID:GPU-52aab516-8e4c-8a18-1ec5-ea4e93119667 Layers:14(46..59)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-04-03T12:43:58.018Z level=INFO source=model.go:97 msg="gemma4: token IDs" image=255999 image_end=258882 audio=256000 audio_end=258883
time=2026-04-03T12:43:58.037Z level=INFO source=model.go:138 msg="vision: decode" elapsed=2.121795ms bounds=(0,0)-(2048,2048)
time=2026-04-03T12:43:58.161Z level=INFO source=model.go:145 msg="vision: preprocess" elapsed=123.804186ms size="[768 768]"
time=2026-04-03T12:43:58.161Z level=INFO source=model.go:148 msg="vision: pixelValues" shape="[768 768 3]" dim0=768 dim1=768 dim2=3
time=2026-04-03T12:43:58.161Z level=INFO source=model.go:152 msg="vision: patches" patchesX=48 patchesY=48 total=2304 patchSize=16
time=2026-04-03T12:43:58.162Z level=INFO source=model.go:156 msg="vision: encoded" elapsed=127.041373ms shape="[5376 256]"
time=2026-04-03T12:43:58.935Z level=INFO source=runner.go:1290 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Disabled KvSize:262144 KvCacheType: NumThreads:16 GPULayers:33[ID:GPU-4ccbba1c-2f53-b260-f75e-ad10640a3cc0 Layers:19(27..45) ID:GPU-52aab516-8e4c-8a18-1ec5-ea4e93119667 Layers:14(46..59)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-04-03T12:43:59.013Z level=INFO source=model.go:97 msg="gemma4: token IDs" image=255999 image_end=258882 audio=256000 audio_end=258883
time=2026-04-03T12:43:59.032Z level=INFO source=model.go:138 msg="vision: decode" elapsed=610.297µs bounds=(0,0)-(2048,2048)
time=2026-04-03T12:43:59.158Z level=INFO source=model.go:145 msg="vision: preprocess" elapsed=126.7683ms size="[768 768]"
time=2026-04-03T12:43:59.159Z level=INFO source=model.go:148 msg="vision: pixelValues" shape="[768 768 3]" dim0=768 dim1=768 dim2=3
time=2026-04-03T12:43:59.159Z level=INFO source=model.go:152 msg="vision: patches" patchesX=48 patchesY=48 total=2304 patchSize=16
time=2026-04-03T12:43:59.160Z level=INFO source=model.go:156 msg="vision: encoded" elapsed=128.4948ms shape="[5376 256]"
time=2026-04-03T12:43:59.966Z level=INFO source=runner.go:1290 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Disabled KvSize:262144 KvCacheType: NumThreads:16 GPULayers:32[ID:GPU-4ccbba1c-2f53-b260-f75e-ad10640a3cc0 Layers:19(28..46) ID:GPU-52aab516-8e4c-8a18-1ec5-ea4e93119667 Layers:13(47..59)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-04-03T12:44:00.048Z level=INFO source=model.go:97 msg="gemma4: token IDs" image=255999 image_end=258882 audio=256000 audio_end=258883
time=2026-04-03T12:44:00.074Z level=INFO source=model.go:138 msg="vision: decode" elapsed=2.505719ms bounds=(0,0)-(2048,2048)
time=2026-04-03T12:44:00.199Z level=INFO source=model.go:145 msg="vision: preprocess" elapsed=124.783707ms size="[768 768]"
time=2026-04-03T12:44:00.199Z level=INFO source=model.go:148 msg="vision: pixelValues" shape="[768 768 3]" dim0=768 dim1=768 dim2=3
time=2026-04-03T12:44:00.199Z level=INFO source=model.go:152 msg="vision: patches" patchesX=48 patchesY=48 total=2304 patchSize=16
time=2026-04-03T12:44:00.200Z level=INFO source=model.go:156 msg="vision: encoded" elapsed=128.47336ms shape="[5376 256]"
time=2026-04-03T12:44:01.016Z level=INFO source=runner.go:1290 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Disabled KvSize:262144 KvCacheType: NumThreads:16 GPULayers:31[ID:GPU-4ccbba1c-2f53-b260-f75e-ad10640a3cc0 Layers:18(29..46) ID:GPU-52aab516-8e4c-8a18-1ec5-ea4e93119667 Layers:13(47..59)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-04-03T12:44:01.098Z level=INFO source=model.go:97 msg="gemma4: token IDs" image=255999 image_end=258882 audio=256000 audio_end=258883
time=2026-04-03T12:44:01.119Z level=INFO source=model.go:138 msg="vision: decode" elapsed=582.076µs bounds=(0,0)-(2048,2048)
time=2026-04-03T12:44:01.247Z level=INFO source=model.go:145 msg="vision: preprocess" elapsed=128.114525ms size="[768 768]"
time=2026-04-03T12:44:01.247Z level=INFO source=model.go:148 msg="vision: pixelValues" shape="[768 768 3]" dim0=768 dim1=768 dim2=3
time=2026-04-03T12:44:01.247Z level=INFO source=model.go:152 msg="vision: patches" patchesX=48 patchesY=48 total=2304 patchSize=16
time=2026-04-03T12:44:01.248Z level=INFO source=model.go:156 msg="vision: encoded" elapsed=129.852516ms shape="[5376 256]"
time=2026-04-03T12:44:02.021Z level=INFO source=runner.go:1290 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Disabled KvSize:262144 KvCacheType: NumThreads:16 GPULayers:31[ID:GPU-4ccbba1c-2f53-b260-f75e-ad10640a3cc0 Layers:18(29..46) ID:GPU-52aab516-8e4c-8a18-1ec5-ea4e93119667 Layers:13(47..59)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-04-03T12:44:02.091Z level=INFO source=model.go:97 msg="gemma4: token IDs" image=255999 image_end=258882 audio=256000 audio_end=258883
time=2026-04-03T12:44:02.110Z level=INFO source=model.go:138 msg="vision: decode" elapsed=2.237757ms bounds=(0,0)-(2048,2048)
time=2026-04-03T12:44:02.235Z level=INFO source=model.go:145 msg="vision: preprocess" elapsed=125.330923ms size="[768 768]"
time=2026-04-03T12:44:02.240Z level=INFO source=model.go:148 msg="vision: pixelValues" shape="[768 768 3]" dim0=768 dim1=768 dim2=3
time=2026-04-03T12:44:02.240Z level=INFO source=model.go:152 msg="vision: patches" patchesX=48 patchesY=48 total=2304 patchSize=16
time=2026-04-03T12:44:02.241Z level=INFO source=model.go:156 msg="vision: encoded" elapsed=133.65614ms shape="[5376 256]"
ggml_backend_cuda_buffer_type_alloc_buffer: allocating 19123.94 MiB on device 1: cudaMalloc failed: out of memory
ggml_gallocr_reserve_n_impl: failed to allocate CUDA1 buffer of size 20052907008
ggml_backend_cuda_buffer_type_alloc_buffer: allocating 16990.01 MiB on device 2: cudaMalloc failed: out of memory
ggml_gallocr_reserve_n_impl: failed to allocate CUDA2 buffer of size 17815312384
time=2026-04-03T12:44:10.396Z level=INFO source=runner.go:1290 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Disabled KvSize:262144 KvCacheType: NumThreads:16 GPULayers:31[ID:GPU-a000b5fa-b553-4068-d239-bbebd2e92d97 Layers:31(29..59)  ] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-04-03T12:44:10.516Z level=INFO source=model.go:97 msg="gemma4: token IDs" image=255999 image_end=258882 audio=256000 audio_end=258883
time=2026-04-03T12:44:10.538Z level=INFO source=model.go:138 msg="vision: decode" elapsed=773.309µs bounds=(0,0)-(2048,2048)
time=2026-04-03T12:44:10.680Z level=INFO source=model.go:145 msg="vision: preprocess" elapsed=142.31186ms size="[768 768]"
time=2026-04-03T12:44:10.685Z level=INFO source=model.go:148 msg="vision: pixelValues" shape="[768 768 3]" dim0=768 dim1=768 dim2=3
time=2026-04-03T12:44:10.685Z level=INFO source=model.go:152 msg="vision: patches" patchesX=48 patchesY=48 total=2304 patchSize=16
time=2026-04-03T12:44:10.686Z level=INFO source=model.go:156 msg="vision: encoded" elapsed=148.553763ms shape="[5376 256]"
ggml_backend_cuda_buffer_type_alloc_buffer: allocating 1024.00 MiB on device 0: cudaMalloc failed: out of memory
time=2026-04-03T12:44:18.276Z level=INFO source=runner.go:1290 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Disabled KvSize:262144 KvCacheType: NumThreads:16 GPULayers:60[ID:GPU-a000b5fa-b553-4068-d239-bbebd2e92d97 Layers:29(0..28) ID:GPU-4ccbba1c-2f53-b260-f75e-ad10640a3cc0 Layers:18(29..46) ID:GPU-52aab516-8e4c-8a18-1ec5-ea4e93119667 Layers:13(47..59)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-04-03T12:44:18.608Z level=INFO source=model.go:97 msg="gemma4: token IDs" image=255999 image_end=258882 audio=256000 audio_end=258883
time=2026-04-03T12:44:18.633Z level=INFO source=model.go:138 msg="vision: decode" elapsed=2.511019ms bounds=(0,0)-(2048,2048)
time=2026-04-03T12:44:18.767Z level=INFO source=model.go:145 msg="vision: preprocess" elapsed=133.963624ms size="[768 768]"
time=2026-04-03T12:44:18.767Z level=INFO source=model.go:148 msg="vision: pixelValues" shape="[768 768 3]" dim0=768 dim1=768 dim2=3
time=2026-04-03T12:44:18.767Z level=INFO source=model.go:152 msg="vision: patches" patchesX=48 patchesY=48 total=2304 patchSize=16
time=2026-04-03T12:44:18.768Z level=INFO source=model.go:156 msg="vision: encoded" elapsed=137.791528ms shape="[5376 256]"
ggml_backend_cuda_buffer_type_alloc_buffer: allocating 17005.76 MiB on device 0: cudaMalloc failed: out of memory
ggml_gallocr_reserve_n_impl: failed to allocate CUDA0 buffer of size 17831827456
ggml_backend_cuda_buffer_type_alloc_buffer: allocating 16990.01 MiB on device 1: cudaMalloc failed: out of memory
ggml_gallocr_reserve_n_impl: failed to allocate CUDA1 buffer of size 17815312512
ggml_backend_cuda_buffer_type_alloc_buffer: allocating 16990.01 MiB on device 2: cudaMalloc failed: out of memory
ggml_gallocr_reserve_n_impl: failed to allocate CUDA2 buffer of size 17815312384
time=2026-04-03T12:44:20.887Z level=INFO source=runner.go:1290 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Disabled KvSize:262144 KvCacheType: NumThreads:16 GPULayers:59[ID:GPU-a000b5fa-b553-4068-d239-bbebd2e92d97 Layers:28(1..28) ID:GPU-4ccbba1c-2f53-b260-f75e-ad10640a3cc0 Layers:18(29..46) ID:GPU-52aab516-8e4c-8a18-1ec5-ea4e93119667 Layers:13(47..59)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-04-03T12:44:21.225Z level=INFO source=model.go:97 msg="gemma4: token IDs" image=255999 image_end=258882 audio=256000 audio_end=258883
time=2026-04-03T12:44:21.253Z level=INFO source=model.go:138 msg="vision: decode" elapsed=674.508µs bounds=(0,0)-(2048,2048)
time=2026-04-03T12:44:21.389Z level=INFO source=model.go:145 msg="vision: preprocess" elapsed=135.486231ms size="[768 768]"
time=2026-04-03T12:44:21.393Z level=INFO source=model.go:148 msg="vision: pixelValues" shape="[768 768 3]" dim0=768 dim1=768 dim2=3
time=2026-04-03T12:44:21.393Z level=INFO source=model.go:152 msg="vision: patches" patchesX=48 patchesY=48 total=2304 patchSize=16
time=2026-04-03T12:44:21.394Z level=INFO source=model.go:156 msg="vision: encoded" elapsed=141.570322ms shape="[5376 256]"
ggml_backend_cuda_buffer_type_alloc_buffer: allocating 17075.94 MiB on device 0: cudaMalloc failed: out of memory
ggml_gallocr_reserve_n_impl: failed to allocate CUDA0 buffer of size 17905423360
ggml_backend_cuda_buffer_type_alloc_buffer: allocating 16990.01 MiB on device 1: cudaMalloc failed: out of memory
ggml_gallocr_reserve_n_impl: failed to allocate CUDA1 buffer of size 17815312512
ggml_backend_cuda_buffer_type_alloc_buffer: allocating 16990.01 MiB on device 2: cudaMalloc failed: out of memory
ggml_gallocr_reserve_n_impl: failed to allocate CUDA2 buffer of size 17815312384
time=2026-04-03T12:44:23.556Z level=INFO source=runner.go:1290 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Disabled KvSize:262144 KvCacheType: NumThreads:16 GPULayers:58[ID:GPU-a000b5fa-b553-4068-d239-bbebd2e92d97 Layers:27(2..28) ID:GPU-4ccbba1c-2f53-b260-f75e-ad10640a3cc0 Layers:18(29..46) ID:GPU-52aab516-8e4c-8a18-1ec5-ea4e93119667 Layers:13(47..59)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-04-03T12:44:23.874Z level=INFO source=model.go:97 msg="gemma4: token IDs" image=255999 image_end=258882 audio=256000 audio_end=258883
time=2026-04-03T12:44:23.905Z level=INFO source=model.go:138 msg="vision: decode" elapsed=2.58332ms bounds=(0,0)-(2048,2048)
time=2026-04-03T12:44:24.039Z level=INFO source=model.go:145 msg="vision: preprocess" elapsed=133.441317ms size="[768 768]"
time=2026-04-03T12:44:24.041Z level=INFO source=model.go:148 msg="vision: pixelValues" shape="[768 768 3]" dim0=768 dim1=768 dim2=3
time=2026-04-03T12:44:24.041Z level=INFO source=model.go:152 msg="vision: patches" patchesX=48 patchesY=48 total=2304 patchSize=16
time=2026-04-03T12:44:24.042Z level=INFO source=model.go:156 msg="vision: encoded" elapsed=138.874162ms shape="[5376 256]"
ggml_backend_cuda_buffer_type_alloc_buffer: allocating 17075.94 MiB on device 0: cudaMalloc failed: out of memory
ggml_gallocr_reserve_n_impl: failed to allocate CUDA0 buffer of size 17905423360
ggml_backend_cuda_buffer_type_alloc_buffer: allocating 16990.01 MiB on device 1: cudaMalloc failed: out of memory
ggml_gallocr_reserve_n_impl: failed to allocate CUDA1 buffer of size 17815312512
ggml_backend_cuda_buffer_type_alloc_buffer: allocating 16990.01 MiB on device 2: cudaMalloc failed: out of memory
ggml_gallocr_reserve_n_impl: failed to allocate CUDA2 buffer of size 17815312384
time=2026-04-03T12:44:26.219Z level=INFO source=runner.go:1290 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Disabled KvSize:262144 KvCacheType: NumThreads:16 GPULayers:57[ID:GPU-a000b5fa-b553-4068-d239-bbebd2e92d97 Layers:26(3..28) ID:GPU-4ccbba1c-2f53-b260-f75e-ad10640a3cc0 Layers:18(29..46) ID:GPU-52aab516-8e4c-8a18-1ec5-ea4e93119667 Layers:13(47..59)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-04-03T12:44:26.555Z level=INFO source=model.go:97 msg="gemma4: token IDs" image=255999 image_end=258882 audio=256000 audio_end=258883
time=2026-04-03T12:44:26.576Z level=INFO source=model.go:138 msg="vision: decode" elapsed=633.457µs bounds=(0,0)-(2048,2048)
time=2026-04-03T12:44:26.712Z level=INFO source=model.go:145 msg="vision: preprocess" elapsed=135.760874ms size="[768 768]"
time=2026-04-03T12:44:26.714Z level=INFO source=model.go:148 msg="vision: pixelValues" shape="[768 768 3]" dim0=768 dim1=768 dim2=3
time=2026-04-03T12:44:26.714Z level=INFO source=model.go:152 msg="vision: patches" patchesX=48 patchesY=48 total=2304 patchSize=16
time=2026-04-03T12:44:26.715Z level=INFO source=model.go:156 msg="vision: encoded" elapsed=139.204774ms shape="[5376 256]"
ggml_backend_cuda_buffer_type_alloc_buffer: allocating 17075.94 MiB on device 0: cudaMalloc failed: out of memory
ggml_gallocr_reserve_n_impl: failed to allocate CUDA0 buffer of size 17905423360
ggml_backend_cuda_buffer_type_alloc_buffer: allocating 16990.01 MiB on device 1: cudaMalloc failed: out of memory
ggml_gallocr_reserve_n_impl: failed to allocate CUDA1 buffer of size 17815312512
ggml_backend_cuda_buffer_type_alloc_buffer: allocating 16990.01 MiB on device 2: cudaMalloc failed: out of memory
ggml_gallocr_reserve_n_impl: failed to allocate CUDA2 buffer of size 17815312384
time=2026-04-03T12:44:28.999Z level=INFO source=runner.go:1290 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Disabled KvSize:262144 KvCacheType: NumThreads:16 GPULayers:56[ID:GPU-a000b5fa-b553-4068-d239-bbebd2e92d97 Layers:25(4..28) ID:GPU-4ccbba1c-2f53-b260-f75e-ad10640a3cc0 Layers:18(29..46) ID:GPU-52aab516-8e4c-8a18-1ec5-ea4e93119667 Layers:13(47..59)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-04-03T12:44:29.094Z level=INFO source=model.go:97 msg="gemma4: token IDs" image=255999 image_end=258882 audio=256000 audio_end=258883
time=2026-04-03T12:44:29.114Z level=INFO source=model.go:138 msg="vision: decode" elapsed=2.396147ms bounds=(0,0)-(2048,2048)
time=2026-04-03T12:44:29.247Z level=INFO source=model.go:145 msg="vision: preprocess" elapsed=132.916112ms size="[768 768]"
time=2026-04-03T12:44:29.248Z level=INFO source=model.go:148 msg="vision: pixelValues" shape="[768 768 3]" dim0=768 dim1=768 dim2=3
time=2026-04-03T12:44:29.248Z level=INFO source=model.go:152 msg="vision: patches" patchesX=48 patchesY=48 total=2304 patchSize=16
time=2026-04-03T12:44:29.249Z level=INFO source=model.go:156 msg="vision: encoded" elapsed=136.934537ms shape="[5376 256]"
ggml_backend_cuda_buffer_type_alloc_buffer: allocating 17075.94 MiB on device 0: cudaMalloc failed: out of memory
ggml_gallocr_reserve_n_impl: failed to allocate CUDA0 buffer of size 17905423360
ggml_backend_cuda_buffer_type_alloc_buffer: allocating 16990.01 MiB on device 1: cudaMalloc failed: out of memory
ggml_gallocr_reserve_n_impl: failed to allocate CUDA1 buffer of size 17815312512
ggml_backend_cuda_buffer_type_alloc_buffer: allocating 16990.01 MiB on device 2: cudaMalloc failed: out of memory
ggml_gallocr_reserve_n_impl: failed to allocate CUDA2 buffer of size 17815312384
time=2026-04-03T12:44:31.540Z level=INFO source=runner.go:1290 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Disabled KvSize:262144 KvCacheType: NumThreads:16 GPULayers:55[ID:GPU-a000b5fa-b553-4068-d239-bbebd2e92d97 Layers:24(5..28) ID:GPU-4ccbba1c-2f53-b260-f75e-ad10640a3cc0 Layers:18(29..46) ID:GPU-52aab516-8e4c-8a18-1ec5-ea4e93119667 Layers:13(47..59)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-04-03T12:44:31.841Z level=INFO source=model.go:97 msg="gemma4: token IDs" image=255999 image_end=258882 audio=256000 audio_end=258883
time=2026-04-03T12:44:31.861Z level=INFO source=model.go:138 msg="vision: decode" elapsed=577.647µs bounds=(0,0)-(2048,2048)
time=2026-04-03T12:44:31.995Z level=INFO source=model.go:145 msg="vision: preprocess" elapsed=133.095073ms size="[768 768]"
time=2026-04-03T12:44:31.998Z level=INFO source=model.go:148 msg="vision: pixelValues" shape="[768 768 3]" dim0=768 dim1=768 dim2=3
time=2026-04-03T12:44:31.998Z level=INFO source=model.go:152 msg="vision: patches" patchesX=48 patchesY=48 total=2304 patchSize=16
time=2026-04-03T12:44:31.999Z level=INFO source=model.go:156 msg="vision: encoded" elapsed=138.305505ms shape="[5376 256]"
ggml_backend_cuda_buffer_type_alloc_buffer: allocating 17075.94 MiB on device 0: cudaMalloc failed: out of memory
ggml_gallocr_reserve_n_impl: failed to allocate CUDA0 buffer of size 17905423360
ggml_backend_cuda_buffer_type_alloc_buffer: allocating 16990.01 MiB on device 1: cudaMalloc failed: out of memory
ggml_gallocr_reserve_n_impl: failed to allocate CUDA1 buffer of size 17815312512
ggml_backend_cuda_buffer_type_alloc_buffer: allocating 16990.01 MiB on device 2: cudaMalloc failed: out of memory
ggml_gallocr_reserve_n_impl: failed to allocate CUDA2 buffer of size 17815312384
time=2026-04-03T12:44:34.318Z level=INFO source=runner.go:1290 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Disabled KvSize:262144 KvCacheType: NumThreads:16 GPULayers:54[ID:GPU-a000b5fa-b553-4068-d239-bbebd2e92d97 Layers:35(6..40) ID:GPU-4ccbba1c-2f53-b260-f75e-ad10640a3cc0 Layers:19(41..59)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-04-03T12:44:34.614Z level=INFO source=model.go:97 msg="gemma4: token IDs" image=255999 image_end=258882 audio=256000 audio_end=258883
time=2026-04-03T12:44:34.636Z level=INFO source=model.go:138 msg="vision: decode" elapsed=2.528089ms bounds=(0,0)-(2048,2048)
time=2026-04-03T12:44:34.768Z level=INFO source=model.go:145 msg="vision: preprocess" elapsed=132.118952ms size="[768 768]"
time=2026-04-03T12:44:34.771Z level=INFO source=model.go:148 msg="vision: pixelValues" shape="[768 768 3]" dim0=768 dim1=768 dim2=3
time=2026-04-03T12:44:34.771Z level=INFO source=model.go:152 msg="vision: patches" patchesX=48 patchesY=48 total=2304 patchSize=16
time=2026-04-03T12:44:34.771Z level=INFO source=model.go:156 msg="vision: encoded" elapsed=137.606036ms shape="[5376 256]"
ggml_backend_cuda_buffer_type_alloc_buffer: allocating 19123.94 MiB on device 0: cudaMalloc failed: out of memory
ggml_gallocr_reserve_n_impl: failed to allocate CUDA0 buffer of size 20052907008
ggml_backend_cuda_buffer_type_alloc_buffer: allocating 16990.01 MiB on device 2: cudaMalloc failed: out of memory
ggml_gallocr_reserve_n_impl: failed to allocate CUDA2 buffer of size 17815312512
time=2026-04-03T12:44:38.161Z level=INFO source=runner.go:1290 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Disabled KvSize:262144 KvCacheType: NumThreads:16 GPULayers:53[ID:GPU-a000b5fa-b553-4068-d239-bbebd2e92d97 Layers:34(7..40) ID:GPU-4ccbba1c-2f53-b260-f75e-ad10640a3cc0 Layers:19(41..59)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-04-03T12:44:38.247Z level=INFO source=model.go:97 msg="gemma4: token IDs" image=255999 image_end=258882 audio=256000 audio_end=258883
time=2026-04-03T12:44:38.269Z level=INFO source=model.go:138 msg="vision: decode" elapsed=557.717µs bounds=(0,0)-(2048,2048)
time=2026-04-03T12:44:38.404Z level=INFO source=model.go:145 msg="vision: preprocess" elapsed=135.564832ms size="[768 768]"
time=2026-04-03T12:44:38.405Z level=INFO source=model.go:148 msg="vision: pixelValues" shape="[768 768 3]" dim0=768 dim1=768 dim2=3
time=2026-04-03T12:44:38.405Z level=INFO source=model.go:152 msg="vision: patches" patchesX=48 patchesY=48 total=2304 patchSize=16
time=2026-04-03T12:44:38.406Z level=INFO source=model.go:156 msg="vision: encoded" elapsed=137.729017ms shape="[5376 256]"
ggml_backend_cuda_buffer_type_alloc_buffer: allocating 19123.94 MiB on device 0: cudaMalloc failed: out of memory
ggml_gallocr_reserve_n_impl: failed to allocate CUDA0 buffer of size 20052907008
ggml_backend_cuda_buffer_type_alloc_buffer: allocating 16990.01 MiB on device 2: cudaMalloc failed: out of memory
ggml_gallocr_reserve_n_impl: failed to allocate CUDA2 buffer of size 17815312512
time=2026-04-03T12:44:41.831Z level=INFO source=runner.go:1290 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Disabled KvSize:262144 KvCacheType: NumThreads:16 GPULayers:52[ID:GPU-a000b5fa-b553-4068-d239-bbebd2e92d97 Layers:33(8..40) ID:GPU-4ccbba1c-2f53-b260-f75e-ad10640a3cc0 Layers:19(41..59)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-04-03T12:44:41.912Z level=INFO source=model.go:97 msg="gemma4: token IDs" image=255999 image_end=258882 audio=256000 audio_end=258883
time=2026-04-03T12:44:41.931Z level=INFO source=model.go:138 msg="vision: decode" elapsed=3.485651ms bounds=(0,0)-(2048,2048)
time=2026-04-03T12:44:42.064Z level=INFO source=model.go:145 msg="vision: preprocess" elapsed=133.56348ms size="[768 768]"
time=2026-04-03T12:44:42.069Z level=INFO source=model.go:148 msg="vision: pixelValues" shape="[768 768 3]" dim0=768 dim1=768 dim2=3
time=2026-04-03T12:44:42.069Z level=INFO source=model.go:152 msg="vision: patches" patchesX=48 patchesY=48 total=2304 patchSize=16
time=2026-04-03T12:44:42.070Z level=INFO source=model.go:156 msg="vision: encoded" elapsed=142.786767ms shape="[5376 256]"
ggml_backend_cuda_buffer_type_alloc_buffer: allocating 19123.94 MiB on device 0: cudaMalloc failed: out of memory
ggml_gallocr_reserve_n_impl: failed to allocate CUDA0 buffer of size 20052907008
ggml_backend_cuda_buffer_type_alloc_buffer: allocating 16990.01 MiB on device 2: cudaMalloc failed: out of memory
ggml_gallocr_reserve_n_impl: failed to allocate CUDA2 buffer of size 17815312512
time=2026-04-03T12:44:45.777Z level=INFO source=runner.go:1290 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Disabled KvSize:262144 KvCacheType: NumThreads:16 GPULayers:51[ID:GPU-a000b5fa-b553-4068-d239-bbebd2e92d97 Layers:32(9..40) ID:GPU-4ccbba1c-2f53-b260-f75e-ad10640a3cc0 Layers:19(41..59)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-04-03T12:44:45.858Z level=INFO source=model.go:97 msg="gemma4: token IDs" image=255999 image_end=258882 audio=256000 audio_end=258883
time=2026-04-03T12:44:45.873Z level=INFO source=model.go:138 msg="vision: decode" elapsed=969.001µs bounds=(0,0)-(2048,2048)
time=2026-04-03T12:44:46.003Z level=INFO source=model.go:145 msg="vision: preprocess" elapsed=129.26575ms size="[768 768]"
time=2026-04-03T12:44:46.007Z level=INFO source=model.go:148 msg="vision: pixelValues" shape="[768 768 3]" dim0=768 dim1=768 dim2=3
time=2026-04-03T12:44:46.007Z level=INFO source=model.go:152 msg="vision: patches" patchesX=48 patchesY=48 total=2304 patchSize=16
time=2026-04-03T12:44:46.008Z level=INFO source=model.go:156 msg="vision: encoded" elapsed=135.525641ms shape="[5376 256]"
ggml_backend_cuda_buffer_type_alloc_buffer: allocating 19123.94 MiB on device 0: cudaMalloc failed: out of memory
ggml_gallocr_reserve_n_impl: failed to allocate CUDA0 buffer of size 20052907008
ggml_backend_cuda_buffer_type_alloc_buffer: allocating 16990.01 MiB on device 2: cudaMalloc failed: out of memory
ggml_gallocr_reserve_n_impl: failed to allocate CUDA2 buffer of size 17815312512
time=2026-04-03T12:44:49.639Z level=INFO source=runner.go:1290 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Disabled KvSize:262144 KvCacheType: NumThreads:16 GPULayers:50[ID:GPU-a000b5fa-b553-4068-d239-bbebd2e92d97 Layers:31(10..40) ID:GPU-4ccbba1c-2f53-b260-f75e-ad10640a3cc0 Layers:19(41..59)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-04-03T12:44:50.046Z level=INFO source=model.go:97 msg="gemma4: token IDs" image=255999 image_end=258882 audio=256000 audio_end=258883
time=2026-04-03T12:44:50.070Z level=INFO source=model.go:138 msg="vision: decode" elapsed=2.302777ms bounds=(0,0)-(2048,2048)
time=2026-04-03T12:44:50.203Z level=INFO source=model.go:145 msg="vision: preprocess" elapsed=132.114042ms size="[768 768]"
time=2026-04-03T12:44:50.204Z level=INFO source=model.go:148 msg="vision: pixelValues" shape="[768 768 3]" dim0=768 dim1=768 dim2=3
time=2026-04-03T12:44:50.204Z level=INFO source=model.go:152 msg="vision: patches" patchesX=48 patchesY=48 total=2304 patchSize=16
time=2026-04-03T12:44:50.205Z level=INFO source=model.go:156 msg="vision: encoded" elapsed=136.919907ms shape="[5376 256]"
ggml_backend_cuda_buffer_type_alloc_buffer: allocating 19123.94 MiB on device 0: cudaMalloc failed: out of memory
ggml_gallocr_reserve_n_impl: failed to allocate CUDA0 buffer of size 20052907008
ggml_backend_cuda_buffer_type_alloc_buffer: allocating 16990.01 MiB on device 2: cudaMalloc failed: out of memory
ggml_gallocr_reserve_n_impl: failed to allocate CUDA2 buffer of size 17815312512
time=2026-04-03T12:44:53.975Z level=INFO source=runner.go:1290 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Disabled KvSize:262144 KvCacheType: NumThreads:16 GPULayers:49[ID:GPU-a000b5fa-b553-4068-d239-bbebd2e92d97 Layers:30(11..40) ID:GPU-4ccbba1c-2f53-b260-f75e-ad10640a3cc0 Layers:19(41..59)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-04-03T12:44:54.386Z level=INFO source=model.go:97 msg="gemma4: token IDs" image=255999 image_end=258882 audio=256000 audio_end=258883
time=2026-04-03T12:44:54.408Z level=INFO source=model.go:138 msg="vision: decode" elapsed=700.768µs bounds=(0,0)-(2048,2048)
time=2026-04-03T12:44:54.544Z level=INFO source=model.go:145 msg="vision: preprocess" elapsed=136.038048ms size="[768 768]"
time=2026-04-03T12:44:54.545Z level=INFO source=model.go:148 msg="vision: pixelValues" shape="[768 768 3]" dim0=768 dim1=768 dim2=3
time=2026-04-03T12:44:54.545Z level=INFO source=model.go:152 msg="vision: patches" patchesX=48 patchesY=48 total=2304 patchSize=16
time=2026-04-03T12:44:54.546Z level=INFO source=model.go:156 msg="vision: encoded" elapsed=138.302324ms shape="[5376 256]"
ggml_backend_cuda_buffer_type_alloc_buffer: allocating 19123.94 MiB on device 0: cudaMalloc failed: out of memory
ggml_gallocr_reserve_n_impl: failed to allocate CUDA0 buffer of size 20052907008
ggml_backend_cuda_buffer_type_alloc_buffer: allocating 16990.01 MiB on device 2: cudaMalloc failed: out of memory
ggml_gallocr_reserve_n_impl: failed to allocate CUDA2 buffer of size 17815312512
time=2026-04-03T12:44:58.081Z level=INFO source=runner.go:1290 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Disabled KvSize:262144 KvCacheType: NumThreads:16 GPULayers:48[ID:GPU-a000b5fa-b553-4068-d239-bbebd2e92d97 Layers:30(12..41) ID:GPU-4ccbba1c-2f53-b260-f75e-ad10640a3cc0 Layers:18(42..59)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-04-03T12:44:58.421Z level=INFO source=model.go:97 msg="gemma4: token IDs" image=255999 image_end=258882 audio=256000 audio_end=258883
time=2026-04-03T12:44:58.444Z level=INFO source=model.go:138 msg="vision: decode" elapsed=702.599µs bounds=(0,0)-(2048,2048)
time=2026-04-03T12:44:58.585Z level=INFO source=model.go:145 msg="vision: preprocess" elapsed=140.728012ms size="[768 768]"
time=2026-04-03T12:44:58.586Z level=INFO source=model.go:148 msg="vision: pixelValues" shape="[768 768 3]" dim0=768 dim1=768 dim2=3
time=2026-04-03T12:44:58.586Z level=INFO source=model.go:152 msg="vision: patches" patchesX=48 patchesY=48 total=2304 patchSize=16
time=2026-04-03T12:44:58.587Z level=INFO source=model.go:156 msg="vision: encoded" elapsed=143.08269ms shape="[5376 256]"
ggml_backend_cuda_buffer_type_alloc_buffer: allocating 19123.94 MiB on device 0: cudaMalloc failed: out of memory
ggml_gallocr_reserve_n_impl: failed to allocate CUDA0 buffer of size 20052907008
ggml_backend_cuda_buffer_type_alloc_buffer: allocating 16990.01 MiB on device 2: cudaMalloc failed: out of memory
ggml_gallocr_reserve_n_impl: failed to allocate CUDA2 buffer of size 17815312512
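
One quick check worth running on the host while the retry loop above is happening: confirm that no other process is holding VRAM on the three cards and watch how much each allocation attempt actually consumes. A minimal sketch, run on the Docker host rather than inside the container:

```shell
# Hypothetical verification step, run on the host: show per-GPU memory use and
# any competing processes while the Ollama runner is retrying the allocation.
nvidia-smi
# Optionally watch it live while the load loop runs:
watch -n 2 nvidia-smi
```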

OS

Docker

GPU

Nvidia

CPU

No response

Ollama version

0.20.0

cudaMalloc failed: out of memory ggml_gallocr_reserve_n_impl: failed to allocate CUDA2 buffer of size 17815312384 time=2026-04-03T12:44:26.219Z level=INFO source=runner.go:1290 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Disabled KvSize:262144 KvCacheType: NumThreads:16 GPULayers:57[ID:GPU-a000b5fa-b553-4068-d239-bbebd2e92d97 Layers:26(3..28) ID:GPU-4ccbba1c-2f53-b260-f75e-ad10640a3cc0 Layers:18(29..46) ID:GPU-52aab516-8e4c-8a18-1ec5-ea4e93119667 Layers:13(47..59)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}" time=2026-04-03T12:44:26.555Z level=INFO source=model.go:97 msg="gemma4: token IDs" image=255999 image_end=258882 audio=256000 audio_end=258883 time=2026-04-03T12:44:26.576Z level=INFO source=model.go:138 msg="vision: decode" elapsed=633.457µs bounds=(0,0)-(2048,2048) time=2026-04-03T12:44:26.712Z level=INFO source=model.go:145 msg="vision: preprocess" elapsed=135.760874ms size="[768 768]" time=2026-04-03T12:44:26.714Z level=INFO source=model.go:148 msg="vision: pixelValues" shape="[768 768 3]" dim0=768 dim1=768 dim2=3 time=2026-04-03T12:44:26.714Z level=INFO source=model.go:152 msg="vision: patches" patchesX=48 patchesY=48 total=2304 patchSize=16 time=2026-04-03T12:44:26.715Z level=INFO source=model.go:156 msg="vision: encoded" elapsed=139.204774ms shape="[5376 256]" ggml_backend_cuda_buffer_type_alloc_buffer: allocating 17075.94 MiB on device 0: cudaMalloc failed: out of memory ggml_gallocr_reserve_n_impl: failed to allocate CUDA0 buffer of size 17905423360 ggml_backend_cuda_buffer_type_alloc_buffer: allocating 16990.01 MiB on device 1: cudaMalloc failed: out of memory ggml_gallocr_reserve_n_impl: failed to allocate CUDA1 buffer of size 17815312512 ggml_backend_cuda_buffer_type_alloc_buffer: allocating 16990.01 MiB on device 2: cudaMalloc failed: out of memory ggml_gallocr_reserve_n_impl: failed to allocate CUDA2 buffer of size 17815312384 time=2026-04-03T12:44:28.999Z level=INFO source=runner.go:1290 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Disabled KvSize:262144 KvCacheType: NumThreads:16 GPULayers:56[ID:GPU-a000b5fa-b553-4068-d239-bbebd2e92d97 Layers:25(4..28) ID:GPU-4ccbba1c-2f53-b260-f75e-ad10640a3cc0 Layers:18(29..46) ID:GPU-52aab516-8e4c-8a18-1ec5-ea4e93119667 Layers:13(47..59)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}" time=2026-04-03T12:44:29.094Z level=INFO source=model.go:97 msg="gemma4: token IDs" image=255999 image_end=258882 audio=256000 audio_end=258883 time=2026-04-03T12:44:29.114Z level=INFO source=model.go:138 msg="vision: decode" elapsed=2.396147ms bounds=(0,0)-(2048,2048) time=2026-04-03T12:44:29.247Z level=INFO source=model.go:145 msg="vision: preprocess" elapsed=132.916112ms size="[768 768]" time=2026-04-03T12:44:29.248Z level=INFO source=model.go:148 msg="vision: pixelValues" shape="[768 768 3]" dim0=768 dim1=768 dim2=3 time=2026-04-03T12:44:29.248Z level=INFO source=model.go:152 msg="vision: patches" patchesX=48 patchesY=48 total=2304 patchSize=16 time=2026-04-03T12:44:29.249Z level=INFO source=model.go:156 msg="vision: encoded" elapsed=136.934537ms shape="[5376 256]" ggml_backend_cuda_buffer_type_alloc_buffer: allocating 17075.94 MiB on device 0: cudaMalloc failed: out of memory ggml_gallocr_reserve_n_impl: failed to allocate CUDA0 buffer of size 17905423360 ggml_backend_cuda_buffer_type_alloc_buffer: allocating 16990.01 MiB on device 1: cudaMalloc failed: out of memory ggml_gallocr_reserve_n_impl: failed to allocate CUDA1 buffer of 
size 17815312512 ggml_backend_cuda_buffer_type_alloc_buffer: allocating 16990.01 MiB on device 2: cudaMalloc failed: out of memory ggml_gallocr_reserve_n_impl: failed to allocate CUDA2 buffer of size 17815312384 time=2026-04-03T12:44:31.540Z level=INFO source=runner.go:1290 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Disabled KvSize:262144 KvCacheType: NumThreads:16 GPULayers:55[ID:GPU-a000b5fa-b553-4068-d239-bbebd2e92d97 Layers:24(5..28) ID:GPU-4ccbba1c-2f53-b260-f75e-ad10640a3cc0 Layers:18(29..46) ID:GPU-52aab516-8e4c-8a18-1ec5-ea4e93119667 Layers:13(47..59)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}" time=2026-04-03T12:44:31.841Z level=INFO source=model.go:97 msg="gemma4: token IDs" image=255999 image_end=258882 audio=256000 audio_end=258883 time=2026-04-03T12:44:31.861Z level=INFO source=model.go:138 msg="vision: decode" elapsed=577.647µs bounds=(0,0)-(2048,2048) time=2026-04-03T12:44:31.995Z level=INFO source=model.go:145 msg="vision: preprocess" elapsed=133.095073ms size="[768 768]" time=2026-04-03T12:44:31.998Z level=INFO source=model.go:148 msg="vision: pixelValues" shape="[768 768 3]" dim0=768 dim1=768 dim2=3 time=2026-04-03T12:44:31.998Z level=INFO source=model.go:152 msg="vision: patches" patchesX=48 patchesY=48 total=2304 patchSize=16 time=2026-04-03T12:44:31.999Z level=INFO source=model.go:156 msg="vision: encoded" elapsed=138.305505ms shape="[5376 256]" ggml_backend_cuda_buffer_type_alloc_buffer: allocating 17075.94 MiB on device 0: cudaMalloc failed: out of memory ggml_gallocr_reserve_n_impl: failed to allocate CUDA0 buffer of size 17905423360 ggml_backend_cuda_buffer_type_alloc_buffer: allocating 16990.01 MiB on device 1: cudaMalloc failed: out of memory ggml_gallocr_reserve_n_impl: failed to allocate CUDA1 buffer of size 17815312512 ggml_backend_cuda_buffer_type_alloc_buffer: allocating 16990.01 MiB on device 2: cudaMalloc failed: out of memory ggml_gallocr_reserve_n_impl: failed to allocate CUDA2 buffer of size 17815312384 time=2026-04-03T12:44:34.318Z level=INFO source=runner.go:1290 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Disabled KvSize:262144 KvCacheType: NumThreads:16 GPULayers:54[ID:GPU-a000b5fa-b553-4068-d239-bbebd2e92d97 Layers:35(6..40) ID:GPU-4ccbba1c-2f53-b260-f75e-ad10640a3cc0 Layers:19(41..59)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}" time=2026-04-03T12:44:34.614Z level=INFO source=model.go:97 msg="gemma4: token IDs" image=255999 image_end=258882 audio=256000 audio_end=258883 time=2026-04-03T12:44:34.636Z level=INFO source=model.go:138 msg="vision: decode" elapsed=2.528089ms bounds=(0,0)-(2048,2048) time=2026-04-03T12:44:34.768Z level=INFO source=model.go:145 msg="vision: preprocess" elapsed=132.118952ms size="[768 768]" time=2026-04-03T12:44:34.771Z level=INFO source=model.go:148 msg="vision: pixelValues" shape="[768 768 3]" dim0=768 dim1=768 dim2=3 time=2026-04-03T12:44:34.771Z level=INFO source=model.go:152 msg="vision: patches" patchesX=48 patchesY=48 total=2304 patchSize=16 time=2026-04-03T12:44:34.771Z level=INFO source=model.go:156 msg="vision: encoded" elapsed=137.606036ms shape="[5376 256]" ggml_backend_cuda_buffer_type_alloc_buffer: allocating 19123.94 MiB on device 0: cudaMalloc failed: out of memory ggml_gallocr_reserve_n_impl: failed to allocate CUDA0 buffer of size 20052907008 ggml_backend_cuda_buffer_type_alloc_buffer: allocating 16990.01 MiB on device 2: cudaMalloc failed: out of memory ggml_gallocr_reserve_n_impl: 
failed to allocate CUDA2 buffer of size 17815312512 time=2026-04-03T12:44:38.161Z level=INFO source=runner.go:1290 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Disabled KvSize:262144 KvCacheType: NumThreads:16 GPULayers:53[ID:GPU-a000b5fa-b553-4068-d239-bbebd2e92d97 Layers:34(7..40) ID:GPU-4ccbba1c-2f53-b260-f75e-ad10640a3cc0 Layers:19(41..59)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}" time=2026-04-03T12:44:38.247Z level=INFO source=model.go:97 msg="gemma4: token IDs" image=255999 image_end=258882 audio=256000 audio_end=258883 time=2026-04-03T12:44:38.269Z level=INFO source=model.go:138 msg="vision: decode" elapsed=557.717µs bounds=(0,0)-(2048,2048) time=2026-04-03T12:44:38.404Z level=INFO source=model.go:145 msg="vision: preprocess" elapsed=135.564832ms size="[768 768]" time=2026-04-03T12:44:38.405Z level=INFO source=model.go:148 msg="vision: pixelValues" shape="[768 768 3]" dim0=768 dim1=768 dim2=3 time=2026-04-03T12:44:38.405Z level=INFO source=model.go:152 msg="vision: patches" patchesX=48 patchesY=48 total=2304 patchSize=16 time=2026-04-03T12:44:38.406Z level=INFO source=model.go:156 msg="vision: encoded" elapsed=137.729017ms shape="[5376 256]" ggml_backend_cuda_buffer_type_alloc_buffer: allocating 19123.94 MiB on device 0: cudaMalloc failed: out of memory ggml_gallocr_reserve_n_impl: failed to allocate CUDA0 buffer of size 20052907008 ggml_backend_cuda_buffer_type_alloc_buffer: allocating 16990.01 MiB on device 2: cudaMalloc failed: out of memory ggml_gallocr_reserve_n_impl: failed to allocate CUDA2 buffer of size 17815312512 time=2026-04-03T12:44:41.831Z level=INFO source=runner.go:1290 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Disabled KvSize:262144 KvCacheType: NumThreads:16 GPULayers:52[ID:GPU-a000b5fa-b553-4068-d239-bbebd2e92d97 Layers:33(8..40) ID:GPU-4ccbba1c-2f53-b260-f75e-ad10640a3cc0 Layers:19(41..59)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}" time=2026-04-03T12:44:41.912Z level=INFO source=model.go:97 msg="gemma4: token IDs" image=255999 image_end=258882 audio=256000 audio_end=258883 time=2026-04-03T12:44:41.931Z level=INFO source=model.go:138 msg="vision: decode" elapsed=3.485651ms bounds=(0,0)-(2048,2048) time=2026-04-03T12:44:42.064Z level=INFO source=model.go:145 msg="vision: preprocess" elapsed=133.56348ms size="[768 768]" time=2026-04-03T12:44:42.069Z level=INFO source=model.go:148 msg="vision: pixelValues" shape="[768 768 3]" dim0=768 dim1=768 dim2=3 time=2026-04-03T12:44:42.069Z level=INFO source=model.go:152 msg="vision: patches" patchesX=48 patchesY=48 total=2304 patchSize=16 time=2026-04-03T12:44:42.070Z level=INFO source=model.go:156 msg="vision: encoded" elapsed=142.786767ms shape="[5376 256]" ggml_backend_cuda_buffer_type_alloc_buffer: allocating 19123.94 MiB on device 0: cudaMalloc failed: out of memory ggml_gallocr_reserve_n_impl: failed to allocate CUDA0 buffer of size 20052907008 ggml_backend_cuda_buffer_type_alloc_buffer: allocating 16990.01 MiB on device 2: cudaMalloc failed: out of memory ggml_gallocr_reserve_n_impl: failed to allocate CUDA2 buffer of size 17815312512 time=2026-04-03T12:44:45.777Z level=INFO source=runner.go:1290 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Disabled KvSize:262144 KvCacheType: NumThreads:16 GPULayers:51[ID:GPU-a000b5fa-b553-4068-d239-bbebd2e92d97 Layers:32(9..40) ID:GPU-4ccbba1c-2f53-b260-f75e-ad10640a3cc0 Layers:19(41..59)] MultiUserCache:false 
ProjectorPath: MainGPU:0 UseMmap:false}" time=2026-04-03T12:44:45.858Z level=INFO source=model.go:97 msg="gemma4: token IDs" image=255999 image_end=258882 audio=256000 audio_end=258883 time=2026-04-03T12:44:45.873Z level=INFO source=model.go:138 msg="vision: decode" elapsed=969.001µs bounds=(0,0)-(2048,2048) time=2026-04-03T12:44:46.003Z level=INFO source=model.go:145 msg="vision: preprocess" elapsed=129.26575ms size="[768 768]" time=2026-04-03T12:44:46.007Z level=INFO source=model.go:148 msg="vision: pixelValues" shape="[768 768 3]" dim0=768 dim1=768 dim2=3 time=2026-04-03T12:44:46.007Z level=INFO source=model.go:152 msg="vision: patches" patchesX=48 patchesY=48 total=2304 patchSize=16 time=2026-04-03T12:44:46.008Z level=INFO source=model.go:156 msg="vision: encoded" elapsed=135.525641ms shape="[5376 256]" ggml_backend_cuda_buffer_type_alloc_buffer: allocating 19123.94 MiB on device 0: cudaMalloc failed: out of memory ggml_gallocr_reserve_n_impl: failed to allocate CUDA0 buffer of size 20052907008 ggml_backend_cuda_buffer_type_alloc_buffer: allocating 16990.01 MiB on device 2: cudaMalloc failed: out of memory ggml_gallocr_reserve_n_impl: failed to allocate CUDA2 buffer of size 17815312512 time=2026-04-03T12:44:49.639Z level=INFO source=runner.go:1290 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Disabled KvSize:262144 KvCacheType: NumThreads:16 GPULayers:50[ID:GPU-a000b5fa-b553-4068-d239-bbebd2e92d97 Layers:31(10..40) ID:GPU-4ccbba1c-2f53-b260-f75e-ad10640a3cc0 Layers:19(41..59)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}" time=2026-04-03T12:44:50.046Z level=INFO source=model.go:97 msg="gemma4: token IDs" image=255999 image_end=258882 audio=256000 audio_end=258883 time=2026-04-03T12:44:50.070Z level=INFO source=model.go:138 msg="vision: decode" elapsed=2.302777ms bounds=(0,0)-(2048,2048) time=2026-04-03T12:44:50.203Z level=INFO source=model.go:145 msg="vision: preprocess" elapsed=132.114042ms size="[768 768]" time=2026-04-03T12:44:50.204Z level=INFO source=model.go:148 msg="vision: pixelValues" shape="[768 768 3]" dim0=768 dim1=768 dim2=3 time=2026-04-03T12:44:50.204Z level=INFO source=model.go:152 msg="vision: patches" patchesX=48 patchesY=48 total=2304 patchSize=16 time=2026-04-03T12:44:50.205Z level=INFO source=model.go:156 msg="vision: encoded" elapsed=136.919907ms shape="[5376 256]" ggml_backend_cuda_buffer_type_alloc_buffer: allocating 19123.94 MiB on device 0: cudaMalloc failed: out of memory ggml_gallocr_reserve_n_impl: failed to allocate CUDA0 buffer of size 20052907008 ggml_backend_cuda_buffer_type_alloc_buffer: allocating 16990.01 MiB on device 2: cudaMalloc failed: out of memory ggml_gallocr_reserve_n_impl: failed to allocate CUDA2 buffer of size 17815312512 time=2026-04-03T12:44:53.975Z level=INFO source=runner.go:1290 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Disabled KvSize:262144 KvCacheType: NumThreads:16 GPULayers:49[ID:GPU-a000b5fa-b553-4068-d239-bbebd2e92d97 Layers:30(11..40) ID:GPU-4ccbba1c-2f53-b260-f75e-ad10640a3cc0 Layers:19(41..59)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}" time=2026-04-03T12:44:54.386Z level=INFO source=model.go:97 msg="gemma4: token IDs" image=255999 image_end=258882 audio=256000 audio_end=258883 time=2026-04-03T12:44:54.408Z level=INFO source=model.go:138 msg="vision: decode" elapsed=700.768µs bounds=(0,0)-(2048,2048) time=2026-04-03T12:44:54.544Z level=INFO source=model.go:145 msg="vision: preprocess" elapsed=136.038048ms 
size="[768 768]" time=2026-04-03T12:44:54.545Z level=INFO source=model.go:148 msg="vision: pixelValues" shape="[768 768 3]" dim0=768 dim1=768 dim2=3 time=2026-04-03T12:44:54.545Z level=INFO source=model.go:152 msg="vision: patches" patchesX=48 patchesY=48 total=2304 patchSize=16 time=2026-04-03T12:44:54.546Z level=INFO source=model.go:156 msg="vision: encoded" elapsed=138.302324ms shape="[5376 256]" ggml_backend_cuda_buffer_type_alloc_buffer: allocating 19123.94 MiB on device 0: cudaMalloc failed: out of memory ggml_gallocr_reserve_n_impl: failed to allocate CUDA0 buffer of size 20052907008 ggml_backend_cuda_buffer_type_alloc_buffer: allocating 16990.01 MiB on device 2: cudaMalloc failed: out of memory ggml_gallocr_reserve_n_impl: failed to allocate CUDA2 buffer of size 17815312512 time=2026-04-03T12:44:58.081Z level=INFO source=runner.go:1290 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Disabled KvSize:262144 KvCacheType: NumThreads:16 GPULayers:48[ID:GPU-a000b5fa-b553-4068-d239-bbebd2e92d97 Layers:30(12..41) ID:GPU-4ccbba1c-2f53-b260-f75e-ad10640a3cc0 Layers:18(42..59)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}" time=2026-04-03T12:44:58.421Z level=INFO source=model.go:97 msg="gemma4: token IDs" image=255999 image_end=258882 audio=256000 audio_end=258883 time=2026-04-03T12:44:58.444Z level=INFO source=model.go:138 msg="vision: decode" elapsed=702.599µs bounds=(0,0)-(2048,2048) time=2026-04-03T12:44:58.585Z level=INFO source=model.go:145 msg="vision: preprocess" elapsed=140.728012ms size="[768 768]" time=2026-04-03T12:44:58.586Z level=INFO source=model.go:148 msg="vision: pixelValues" shape="[768 768 3]" dim0=768 dim1=768 dim2=3 time=2026-04-03T12:44:58.586Z level=INFO source=model.go:152 msg="vision: patches" patchesX=48 patchesY=48 total=2304 patchSize=16 time=2026-04-03T12:44:58.587Z level=INFO source=model.go:156 msg="vision: encoded" elapsed=143.08269ms shape="[5376 256]" ggml_backend_cuda_buffer_type_alloc_buffer: allocating 19123.94 MiB on device 0: cudaMalloc failed: out of memory ggml_gallocr_reserve_n_impl: failed to allocate CUDA0 buffer of size 20052907008 ggml_backend_cuda_buffer_type_alloc_buffer: allocating 16990.01 MiB on device 2: cudaMalloc failed: out of memory ggml_gallocr_reserve_n_impl: failed to allocate CUDA2 buffer of size 17815312512 ``` ### OS Docker ### GPU Nvidia ### CPU _No response_ ### Ollama version 0.20.0
GiteaMirror added the bug label 2026-04-29 10:34:54 -05:00

@stianteien commented on GitHub (Apr 3, 2026):

I do think that Gemma4:31b eats too much memory at the moment. It eats all of my 120 GB of VRAM now.

Image: https://github.com/user-attachments/assets/a15573d9-f49a-442c-9ce4-8935b71058ff

@zhoujustin commented on GitHub (Apr 3, 2026):

Both the Q4 and the Q8 quantizations are affected.


@mazphilip commented on GitHub (Apr 4, 2026):

I think it's a Flash Attention issue: it gets deactivated (although I'm not sure about the latest RCs), and without it the attention memory requirement grows quadratically. That hurts particularly because Ollama defaults to a 256k context size if you have more than 48 GB of VRAM.

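If that diagnosis is right, one practical test is to request a much smaller context window and see whether the model then fits. Below is a minimal sketch against the local Ollama HTTP API; the default port 11434, the gemma4:31b model tag used in this thread, and the 8192-token num_ctx are all illustrative assumptions, not tuned values.

```python
# Minimal sketch: override the context length per request so the KV cache and
# attention buffers shrink. Assumes a local Ollama server on the default port;
# num_ctx=8192 is an illustrative value, not a recommendation.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "gemma4:31b",
        "prompt": "Say hello.",
        "options": {"num_ctx": 8192},  # cap the context instead of the 256k default
        "stream": False,
    },
    timeout=600,
)
print(resp.json().get("response", resp.text))
```

On the server side, setting a smaller context via OLLAMA_CONTEXT_LENGTH and enabling flash attention with OLLAMA_FLASH_ATTENTION=1 should have a similar effect, assuming this build honors those environment variables.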

@phr0gz commented on GitHub (Apr 13, 2026):

I don't have the issue anymore; it's fixed for me.

Reference: github-starred/ollama#56290