[GH-ISSUE #13029] Vulkan fails to allocate memory buffer #70690

Open
opened 2026-05-04 22:35:21 -05:00 by GiteaMirror · 8 comments
Owner

Originally created by @NotJustAnna on GitHub (Nov 9, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/13029

What is the issue?

Was testing the experimental Vulkan support: successfully built Ollama on Windows with Vulkan support, then ran Qwen3-VL (4b and 8b) with an arbitrary image (1381 x 701, 931 KB PNG file) and a "What is this" prompt.

Vulkan backend tries to allocate a buffer and fails, even though there is enough VRAM for the operation.

For comparison, I tested qwen3-vl:4b and qwen3-vl:8b on Ollama-For-AMD's fork: it successfully loads qwen3-vl:8b with 11-12 GB of VRAM, and qwen3-vl:4b loaded and ran with 8.4 GB of VRAM.

Relevant log output

time=2025-11-09T12:11:53.209-03:00 level=INFO source=server.go:400 msg="starting runner" cmd="C:\\Users\\Anna\\AppData\\Local\\go-build\\5b\\5b1bd64a22e5c8fc29d1155b107660904faf58b2091913877676754333f2f183-d\\ollama.exe runner --ollama-engine --model C:\\Users\\Anna\\.ollama\\models\\blobs\\sha256-9c60bdd691c1897bbfe5ddbc67336848e18c346b7ee2ab8541b135f208e5bb38 --port 57002"
time=2025-11-09T12:11:53.212-03:00 level=INFO source=server.go:653 msg="loading model" "model layers"=37 requested=-1
time=2025-11-09T12:11:53.212-03:00 level=INFO source=server.go:658 msg="system memory" total="31.6 GiB" free="19.4 GiB" free_swap="17.7 GiB"
time=2025-11-09T12:11:53.212-03:00 level=INFO source=server.go:665 msg="gpu memory" id=00000000-0300-0000-0000-000000000000 library=Vulkan available="9.8 GiB" free="10.3 GiB" minimum="457.0 MiB" overhead="0 B"
time=2025-11-09T12:11:53.238-03:00 level=INFO source=runner.go:1349 msg="starting ollama engine"
time=2025-11-09T12:11:53.239-03:00 level=INFO source=runner.go:1384 msg="Server listening on 127.0.0.1:57002"
time=2025-11-09T12:11:53.244-03:00 level=INFO source=runner.go:1222 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:8 GPULayers:37[ID:00000000-0300-0000-0000-000000000000 Layers:37(0..36)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2025-11-09T12:11:53.265-03:00 level=INFO source=ggml.go:136 msg="" architecture=qwen3vl file_type=Q4_K_M name="" description="" num_tensors=809 num_key_values=40
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon RX 6750 XT (AMD proprietary driver) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 32 | shared memory: 32768 | int dot: 1 | matrix cores: none
load_backend: loaded Vulkan backend from C:\Users\Anna\CodeProjects\ollama\build\lib\ollama\ggml-vulkan.dll
load_backend: loaded CPU backend from C:\Users\Anna\CodeProjects\ollama\build\lib\ollama\ggml-cpu-icelake.dll
time=2025-11-09T12:11:53.321-03:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.AVX512=1 CPU.0.AVX512_VBMI=1 CPU.0.AVX512_VNNI=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
ggml_backend_vk_get_device_memory called: uuid 00000000-0300-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000015027
ggml_dxgi_pdh_init called
DXGI + PDH Initialized. Getting GPU free memory info
[DXGI] Adapter Description: AMD Radeon RX 6750 XT, LUID: 0x0000000000015027, Dedicated: 11.93 GB, Shared: 15.81 GB
[DXGI] Adapter Description: AMD Radeon RX 6750 XT, LUID: 0x000000000001FD6F, Dedicated: 11.93 GB, Shared: 15.81 GB
[DXGI] Adapter Description: Microsoft Basic Render Driver, LUID: 0x0000000000017702, Dedicated: 0.00 GB, Shared: 15.81 GB
Discrete GPU (AMD Radeon RX 6750 XT) with LUID 0x0000000000015027 detected. Dedicated Total: 12809637888.00 bytes (11.93 GB), Dedicated Usage: 1782349824.00 bytes (1.66 GB)
ggml_backend_vk_get_device_memory utilizing DXGI + PDH memory reporting free: 11027288064 total: 12809637888
time=2025-11-09T12:11:53.679-03:00 level=INFO source=runner.go:1222 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:8 GPULayers:37[ID:00000000-0300-0000-0000-000000000000 Layers:37(0..36)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0300-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000015027
ggml_dxgi_pdh_init called
DXGI + PDH Initialized. Getting GPU free memory info
[DXGI] Adapter Description: AMD Radeon RX 6750 XT, LUID: 0x0000000000015027, Dedicated: 11.93 GB, Shared: 15.81 GB
[DXGI] Adapter Description: AMD Radeon RX 6750 XT, LUID: 0x000000000001FD6F, Dedicated: 11.93 GB, Shared: 15.81 GB
[DXGI] Adapter Description: Microsoft Basic Render Driver, LUID: 0x0000000000017702, Dedicated: 0.00 GB, Shared: 15.81 GB
Discrete GPU (AMD Radeon RX 6750 XT) with LUID 0x0000000000015027 detected. Dedicated Total: 12809637888.00 bytes (11.93 GB), Dedicated Usage: 1779195904.00 bytes (1.66 GB)
ggml_backend_vk_get_device_memory utilizing DXGI + PDH memory reporting free: 11030441984 total: 12809637888
ggml_vulkan: Device memory allocation of size 4199040000 failed.
ggml_vulkan: Requested buffer size exceeds device buffer size limit: ErrorOutOfDeviceMemory
ggml_gallocr_reserve_n: failed to allocate Vulkan0 buffer of size 4538144160
time=2025-11-09T12:11:54.251-03:00 level=INFO source=server.go:792 msg="model layout did not fit, applying backoff" backoff=0.10
time=2025-11-09T12:11:54.251-03:00 level=INFO source=server.go:792 msg="model layout did not fit, applying backoff" backoff=0.20
time=2025-11-09T12:11:54.251-03:00 level=INFO source=server.go:792 msg="model layout did not fit, applying backoff" backoff=0.30

(BELOW GETS REPEATED 40 TIMES)

ggml_dxgi_pdh_init called
DXGI + PDH Initialized. Getting GPU free memory info
[DXGI] Adapter Description: AMD Radeon RX 6750 XT, LUID: 0x0000000000015027, Dedicated: 11.93 GB, Shared: 15.81 GB
[DXGI] Adapter Description: AMD Radeon RX 6750 XT, LUID: 0x000000000001FD6F, Dedicated: 11.93 GB, Shared: 15.81 GB
[DXGI] Adapter Description: Microsoft Basic Render Driver, LUID: 0x0000000000017702, Dedicated: 0.00 GB, Shared: 15.81 GB
Discrete GPU (AMD Radeon RX 6750 XT) with LUID 0x0000000000015027 detected. Dedicated Total: 12809637888.00 bytes (11.93 GB), Dedicated Usage: 1739702272.00 bytes (1.62 GB)
ggml_backend_vk_get_device_memory utilizing DXGI + PDH memory reporting free: 11069935616 total: 12809637888
ggml_vulkan: Device memory allocation of size 4199040000 failed.
ggml_vulkan: Requested buffer size exceeds device buffer size limit: ErrorOutOfDeviceMemory
ggml_gallocr_reserve_n: failed to allocate Vulkan0 buffer of size 4571325856
time=2025-11-09T12:12:08.697-03:00 level=INFO source=runner.go:1222 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:8 GPULayers:1[ID:00000000-0300-0000-0000-000000000000 Layers:1(35..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0300-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000015027

(ABOVE GETS REPEATED 40 TIMES)

ggml_dxgi_pdh_init called
DXGI + PDH Initialized. Getting GPU free memory info
[DXGI] Adapter Description: AMD Radeon RX 6750 XT, LUID: 0x0000000000015027, Dedicated: 11.93 GB, Shared: 15.81 GB
[DXGI] Adapter Description: AMD Radeon RX 6750 XT, LUID: 0x000000000001FD6F, Dedicated: 11.93 GB, Shared: 15.81 GB
[DXGI] Adapter Description: Microsoft Basic Render Driver, LUID: 0x0000000000017702, Dedicated: 0.00 GB, Shared: 15.81 GB
Discrete GPU (AMD Radeon RX 6750 XT) with LUID 0x0000000000015027 detected. Dedicated Total: 12809637888.00 bytes (11.93 GB), Dedicated Usage: 1739702272.00 bytes (1.62 GB)
ggml_backend_vk_get_device_memory utilizing DXGI + PDH memory reporting free: 11069935616 total: 12809637888
time=2025-11-09T12:12:09.262-03:00 level=INFO source=runner.go:1222 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:8 GPULayers:[] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2025-11-09T12:12:09.262-03:00 level=INFO source=ggml.go:482 msg="offloading 0 repeating layers to GPU"
time=2025-11-09T12:12:09.262-03:00 level=INFO source=ggml.go:486 msg="offloading output layer to CPU"
time=2025-11-09T12:12:09.262-03:00 level=INFO source=ggml.go:494 msg="offloaded 0/37 layers to GPU"
time=2025-11-09T12:12:09.262-03:00 level=INFO source=device.go:217 msg="model weights" device=CPU size="3.4 GiB"
time=2025-11-09T12:12:09.262-03:00 level=INFO source=device.go:228 msg="kv cache" device=CPU size="576.0 MiB"
time=2025-11-09T12:12:09.262-03:00 level=INFO source=device.go:239 msg="compute graph" device=CPU size="4.2 GiB"
time=2025-11-09T12:12:09.262-03:00 level=INFO source=device.go:244 msg="total memory" size="8.1 GiB"
time=2025-11-09T12:12:09.262-03:00 level=INFO source=sched.go:500 msg="loaded runners" count=1
time=2025-11-09T12:12:09.262-03:00 level=INFO source=server.go:1251 msg="waiting for llama runner to start responding"
time=2025-11-09T12:12:09.263-03:00 level=INFO source=server.go:1285 msg="waiting for server to become available" status="llm server loading model"
time=2025-11-09T12:12:09.764-03:00 level=INFO source=server.go:1289 msg="llama runner started in 16.55 seconds"
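Restating the numbers from the log above: the allocation that fails is far smaller than the free VRAM that the DXGI + PDH path reports, which is consistent with the error being the per-buffer size limit check ("Requested buffer size exceeds device buffer size limit") rather than actual VRAM exhaustion. A quick sanity check of the figures (my own illustration, values copied from the log):

```python
# Figures taken verbatim from the log above.
failed_alloc  = 4_199_040_000    # "Device memory allocation of size 4199040000 failed."
reported_free = 11_030_441_984   # "DXGI + PDH memory reporting free: 11030441984"
total_vram    = 12_809_637_888   # "total: 12809637888"

print(f"failed allocation: {failed_alloc / 2**30:.2f} GiB")   # ~3.91 GiB
print(f"reported free    : {reported_free / 2**30:.2f} GiB")  # ~10.27 GiB

# The allocation is well under the free VRAM, so raw memory pressure
# does not explain the failure.
assert failed_alloc < reported_free
```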

OS

Windows

GPU

AMD

CPU

AMD

Ollama version

main branch (commit e10a3533a5)

GiteaMirror added the vulkan, amd, bug, windows labels 2026-05-04 22:35:24 -05:00
Author
Owner

@MightyPlaza commented on GitHub (Nov 22, 2025):

Same problem on Linux on a rx570 4gb (and on mistral-small3.2:24b)

Author
Owner

@NotJustAnna commented on GitHub (Nov 22, 2025):

~~Re-ran all the prompts that failed on release v0.13.0 -> ollama-windows-amd64.zip on an AMD GPU with Vulkan enabled.~~

~~Vulkan correctly allocated the memory buffer now.~~

Ran on the wrong ollama.exe

I can reproduce the same problem on v0.13.0

Author
Owner

@NotJustAnna commented on GitHub (Nov 22, 2025):

DeepSeek-OCR runs properly on Ollama 0.13.0 with Vulkan only (no ROCm libs).

I also ran qwen2.5vl:7b on this same setup and got the following crash:

Exception 0xc0000005 0x0 0x52 0x7ffdf4feaa95
PC=0x7ffdf4feaa95
signal arrived during external code execution

runtime.cgocall(0x7ff6f48167c0, 0xc00017d568)
        runtime/cgocall.go:167 +0x3e fp=0xc00017d540 sp=0xc00017d4d8 pc=0x7ff6f3ae243e
github.com/ollama/ollama/ml/backend/ggml._Cfunc_ggml_backend_sched_graph_compute_async(0x1c5cc880d40, 0x1c5c943b0a0)
        _cgo_gotypes.go:947 +0x50 fp=0xc00017d568 sp=0xc00017d540 pc=0x7ff6f3f2f910
github.com/ollama/ollama/ml/backend/ggml.(*Context).ComputeWithNotify.func2(...)
        github.com/ollama/ollama/ml/backend/ggml/ggml.go:825
github.com/ollama/ollama/ml/backend/ggml.(*Context).ComputeWithNotify(0xc000512040, 0x0?, {0xc0000440b0, 0x1, 0xc000044001?})
        github.com/ollama/ollama/ml/backend/ggml/ggml.go:825 +0x1b5 fp=0xc00017d640 sp=0xc00017d568 pc=0x7ff6f3f3e0f5
github.com/ollama/ollama/ml/backend/ggml.(*Context).Compute(0xc000512040?, {0xc0000440b0?, 0x1?, 0x7ff6f3a8a35e?})
        github.com/ollama/ollama/ml/backend/ggml/ggml.go:811 +0x25 fp=0xc00017d678 sp=0xc00017d640 pc=0x7ff6f3f3df05
github.com/ollama/ollama/runner/ollamarunner.multimodalStore.getTensor(0x7ff6f4c0f2e0?, {0x7ff6f4fd9e50, 0xc0000d36b0}, {0x7ff6f4fdeec0, 0xc000512000}, {0x7ff6f4feb728, 0xc0014ffe30}, 0x0)
        github.com/ollama/ollama/runner/ollamarunner/multimodal.go:93 +0x2f4 fp=0xc00017d788 sp=0xc00017d678 pc=0x7ff6f400dd34
github.com/ollama/ollama/runner/ollamarunner.multimodalStore.getMultimodal(0xc000f26d80, {0x7ff6f4fd9e50, 0xc0000d36b0}, {0x7ff6f4fdeec0, 0xc000512000}, {0xc00024eb60, 0x1, 0xc000e0c400?}, 0x0)
        github.com/ollama/ollama/runner/ollamarunner/multimodal.go:56 +0xe5 fp=0xc00017d7f0 sp=0xc00017d788 pc=0x7ff6f400d925
github.com/ollama/ollama/runner/ollamarunner.(*Server).forwardBatch(_, {0x0, {0x7ff6f4fdeec0, 0xc000f08380}, {0x7ff6f4feb728, 0xc008f81f08}, {0xc000120200, 0x16, 0x20}, {{0x7ff6f4feb728, ...}, ...}, ...})
        github.com/ollama/ollama/runner/ollamarunner/runner.go:584 +0x1217 fp=0xc00017db58 sp=0xc00017d7f0 pc=0x7ff6f4010f97
github.com/ollama/ollama/runner/ollamarunner.(*Server).run(0xc000202d20, {0x7ff6f4fd3b80, 0xc0000e1860})
        github.com/ollama/ollama/runner/ollamarunner/runner.go:452 +0x18c fp=0xc00017dfb8 sp=0xc00017db58 pc=0x7ff6f400fb2c
github.com/ollama/ollama/runner/ollamarunner.Execute.gowrap1()
        github.com/ollama/ollama/runner/ollamarunner/runner.go:1411 +0x28 fp=0xc00017dfe0 sp=0xc00017dfb8 pc=0x7ff6f4019088
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc00017dfe8 sp=0xc00017dfe0 pc=0x7ff6f3aed8e1
created by github.com/ollama/ollama/runner/ollamarunner.Execute in goroutine 1
        github.com/ollama/ollama/runner/ollamarunner/runner.go:1411 +0x4c9

goroutine 1 gp=0xc0000021c0 m=nil [IO wait, 1 minutes]:
runtime.gopark(0x7ff6f3aef0e0?, 0x7ff6f595ca00?, 0x20?, 0x94?, 0xc0001394cc?)
        runtime/proc.go:435 +0xce fp=0xc000d85648 sp=0xc000d85628 pc=0x7ff6f3ae598e
runtime.netpollblock(0x23c?, 0xf3a80406?, 0xf6?)
        runtime/netpoll.go:575 +0xf7 fp=0xc000d85680 sp=0xc000d85648 pc=0x7ff6f3aabdf7
internal/poll.runtime_pollWait(0x1c5f1d86470, 0x72)
        runtime/netpoll.go:351 +0x85 fp=0xc000d856a0 sp=0xc000d85680 pc=0x7ff6f3ae4b25
internal/poll.(*pollDesc).wait(0x7ff6f3b7a693?, 0x0?, 0x0)
        internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc000d856c8 sp=0xc000d856a0 pc=0x7ff6f3b7bc87
internal/poll.execIO(0xc000139420, 0xc000129770)
        internal/poll/fd_windows.go:177 +0x105 fp=0xc000d85740 sp=0xc000d856c8 pc=0x7ff6f3b7d0e5
internal/poll.(*FD).acceptOne(0xc000139408, 0x250, {0xc00003a1e0?, 0xc0001297d0?, 0x7ff6f3b84da5?}, 0xc000129804?)
        internal/poll/fd_windows.go:946 +0x65 fp=0xc000d857a0 sp=0xc000d85740 pc=0x7ff6f3b81665
internal/poll.(*FD).Accept(0xc000139408, 0xc000d85950)
        internal/poll/fd_windows.go:980 +0x1b6 fp=0xc000d85858 sp=0xc000d857a0 pc=0x7ff6f3b81996
net.(*netFD).accept(0xc000139408)
        net/fd_windows.go:182 +0x4b fp=0xc000d85970 sp=0xc000d85858 pc=0x7ff6f3bf2f0b
net.(*TCPListener).accept(0xc0002c2840)
        net/tcpsock_posix.go:159 +0x1b fp=0xc000d859c0 sp=0xc000d85970 pc=0x7ff6f3c08f5b
net.(*TCPListener).Accept(0xc0002c2840)
        net/tcpsock.go:380 +0x30 fp=0xc000d859f0 sp=0xc000d859c0 pc=0x7ff6f3c07d10
net/http.(*onceCloseListener).Accept(0xc0001523f0?)
        <autogenerated>:1 +0x24 fp=0xc000d85a08 sp=0xc000d859f0 pc=0x7ff6f3e21184
net/http.(*Server).Serve(0xc0001cf600, {0x7ff6f4fd1580, 0xc0002c2840})
        net/http/server.go:3424 +0x30c fp=0xc000d85b38 sp=0xc000d85a08 pc=0x7ff6f3df8a4c
github.com/ollama/ollama/runner/ollamarunner.Execute({0xc0000500b0, 0x4, 0x5})
        github.com/ollama/ollama/runner/ollamarunner/runner.go:1434 +0x94e fp=0xc000d85d08 sp=0xc000d85b38 pc=0x7ff6f4018e0e
github.com/ollama/ollama/runner.Execute({0xc000050090?, 0x0?, 0x0?})
        github.com/ollama/ollama/runner/runner.go:20 +0xc9 fp=0xc000d85d30 sp=0xc000d85d08 pc=0x7ff6f4019709
github.com/ollama/ollama/cmd.NewCLI.func2(0xc0001cf400?, {0x7ff6f4dead9d?, 0x4?, 0x7ff6f4deada1?})
        github.com/ollama/ollama/cmd/cmd.go:1841 +0x45 fp=0xc000d85d58 sp=0xc000d85d30 pc=0x7ff6f47a9145
github.com/spf13/cobra.(*Command).execute(0xc000157508, {0xc0000e17c0, 0x5, 0x5})
        github.com/spf13/cobra@v1.7.0/command.go:940 +0x85c fp=0xc000d85e78 sp=0xc000d85d58 pc=0x7ff6f3c6d9dc
github.com/spf13/cobra.(*Command).ExecuteC(0xc00012e908)
        github.com/spf13/cobra@v1.7.0/command.go:1068 +0x3a5 fp=0xc000d85f30 sp=0xc000d85e78 pc=0x7ff6f3c6e225
github.com/spf13/cobra.(*Command).Execute(...)
        github.com/spf13/cobra@v1.7.0/command.go:992
github.com/spf13/cobra.(*Command).ExecuteContext(...)
        github.com/spf13/cobra@v1.7.0/command.go:985
main.main()
        github.com/ollama/ollama/main.go:12 +0x4d fp=0xc000d85f50 sp=0xc000d85f30 pc=0x7ff6f47a9c2d
runtime.main()
        runtime/proc.go:283 +0x27d fp=0xc000d85fe0 sp=0xc000d85f50 pc=0x7ff6f3ab4ddd
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000d85fe8 sp=0xc000d85fe0 pc=0x7ff6f3aed8e1

goroutine 2 gp=0xc0000028c0 m=nil [force gc (idle), 1 minutes]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc000081fa8 sp=0xc000081f88 pc=0x7ff6f3ae598e
runtime.goparkunlock(...)
        runtime/proc.go:441
runtime.forcegchelper()
        runtime/proc.go:348 +0xb8 fp=0xc000081fe0 sp=0xc000081fa8 pc=0x7ff6f3ab50f8
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000081fe8 sp=0xc000081fe0 pc=0x7ff6f3aed8e1
created by runtime.init.7 in goroutine 1
        runtime/proc.go:336 +0x1a

goroutine 3 gp=0xc000002c40 m=nil [GC sweep wait]:
runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc000083f80 sp=0xc000083f60 pc=0x7ff6f3ae598e
runtime.goparkunlock(...)
        runtime/proc.go:441
runtime.bgsweep(0xc00008c000)
        runtime/mgcsweep.go:316 +0xdf fp=0xc000083fc8 sp=0xc000083f80 pc=0x7ff6f3a9debf
runtime.gcenable.gowrap1()
        runtime/mgc.go:204 +0x25 fp=0xc000083fe0 sp=0xc000083fc8 pc=0x7ff6f3a92285
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000083fe8 sp=0xc000083fe0 pc=0x7ff6f3aed8e1
created by runtime.gcenable in goroutine 1
        runtime/mgc.go:204 +0x66

goroutine 4 gp=0xc000002e00 m=nil [GC scavenge wait]:
runtime.gopark(0x10000?, 0x4cdeea?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc000093f78 sp=0xc000093f58 pc=0x7ff6f3ae598e
runtime.goparkunlock(...)
        runtime/proc.go:441
runtime.(*scavengerState).park(0x7ff6f59833c0)
        runtime/mgcscavenge.go:425 +0x49 fp=0xc000093fa8 sp=0xc000093f78 pc=0x7ff6f3a9b909
runtime.bgscavenge(0xc00008c000)
        runtime/mgcscavenge.go:658 +0x59 fp=0xc000093fc8 sp=0xc000093fa8 pc=0x7ff6f3a9be99
runtime.gcenable.gowrap2()
        runtime/mgc.go:205 +0x25 fp=0xc000093fe0 sp=0xc000093fc8 pc=0x7ff6f3a92225
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000093fe8 sp=0xc000093fe0 pc=0x7ff6f3aed8e1
created by runtime.gcenable in goroutine 1
        runtime/mgc.go:205 +0xa5

goroutine 5 gp=0xc000003340 m=nil [finalizer wait, 1 minutes]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc000095e30 sp=0xc000095e10 pc=0x7ff6f3ae598e
runtime.runfinq()
        runtime/mfinal.go:196 +0x107 fp=0xc000095fe0 sp=0xc000095e30 pc=0x7ff6f3a91207
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000095fe8 sp=0xc000095fe0 pc=0x7ff6f3aed8e1
created by runtime.createfing in goroutine 1
        runtime/mfinal.go:166 +0x3d

goroutine 6 gp=0xc000003dc0 m=nil [chan receive]:
runtime.gopark(0xc000205680?, 0xc0014d2048?, 0x60?, 0x5f?, 0x7ff6f3bdbe48?)
        runtime/proc.go:435 +0xce fp=0xc000085f18 sp=0xc000085ef8 pc=0x7ff6f3ae598e
runtime.chanrecv(0xc00009a3f0, 0x0, 0x1)
        runtime/chan.go:664 +0x445 fp=0xc000085f90 sp=0xc000085f18 pc=0x7ff6f3a82d45
runtime.chanrecv1(0x7ff6f3ab4f40?, 0xc000085f76?)
        runtime/chan.go:506 +0x12 fp=0xc000085fb8 sp=0xc000085f90 pc=0x7ff6f3a828d2
runtime.unique_runtime_registerUniqueMapCleanup.func2(...)
        runtime/mgc.go:1796
runtime.unique_runtime_registerUniqueMapCleanup.gowrap1()
        runtime/mgc.go:1799 +0x2f fp=0xc000085fe0 sp=0xc000085fb8 pc=0x7ff6f3a954af
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000085fe8 sp=0xc000085fe0 pc=0x7ff6f3aed8e1
created by unique.runtime_registerUniqueMapCleanup in goroutine 1
        runtime/mgc.go:1794 +0x85

goroutine 7 gp=0xc0003f8380 m=nil [GC worker (idle), 1 minutes]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc00008ff38 sp=0xc00008ff18 pc=0x7ff6f3ae598e
runtime.gcBgMarkWorker(0xc00009b810)
        runtime/mgc.go:1423 +0xe9 fp=0xc00008ffc8 sp=0xc00008ff38 pc=0x7ff6f3a947a9
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x25 fp=0xc00008ffe0 sp=0xc00008ffc8 pc=0x7ff6f3a94685
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc00008ffe8 sp=0xc00008ffe0 pc=0x7ff6f3aed8e1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x105

goroutine 18 gp=0xc0001061c0 m=nil [GC worker (idle)]:
runtime.gopark(0x28af7e0fdff10?, 0x3?, 0x1c?, 0x48?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc000113f38 sp=0xc000113f18 pc=0x7ff6f3ae598e
runtime.gcBgMarkWorker(0xc00009b810)
        runtime/mgc.go:1423 +0xe9 fp=0xc000113fc8 sp=0xc000113f38 pc=0x7ff6f3a947a9
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x25 fp=0xc000113fe0 sp=0xc000113fc8 pc=0x7ff6f3a94685
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000113fe8 sp=0xc000113fe0 pc=0x7ff6f3aed8e1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x105

goroutine 34 gp=0xc000484000 m=nil [GC worker (idle)]:
runtime.gopark(0x28af7e0eebec4?, 0x1?, 0xc8?, 0xf3?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc00010ff38 sp=0xc00010ff18 pc=0x7ff6f3ae598e
runtime.gcBgMarkWorker(0xc00009b810)
        runtime/mgc.go:1423 +0xe9 fp=0xc00010ffc8 sp=0xc00010ff38 pc=0x7ff6f3a947a9
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x25 fp=0xc00010ffe0 sp=0xc00010ffc8 pc=0x7ff6f3a94685
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc00010ffe8 sp=0xc00010ffe0 pc=0x7ff6f3aed8e1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x105

goroutine 8 gp=0xc0003f8540 m=nil [GC worker (idle)]:
runtime.gopark(0x28af7e0eebec4?, 0x3?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc000091f38 sp=0xc000091f18 pc=0x7ff6f3ae598e
runtime.gcBgMarkWorker(0xc00009b810)
        runtime/mgc.go:1423 +0xe9 fp=0xc000091fc8 sp=0xc000091f38 pc=0x7ff6f3a947a9
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x25 fp=0xc000091fe0 sp=0xc000091fc8 pc=0x7ff6f3a94685
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000091fe8 sp=0xc000091fe0 pc=0x7ff6f3aed8e1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x105

goroutine 19 gp=0xc000106380 m=nil [GC worker (idle)]:
runtime.gopark(0x28af7e0eebec4?, 0x3?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc000115f38 sp=0xc000115f18 pc=0x7ff6f3ae598e
runtime.gcBgMarkWorker(0xc00009b810)
        runtime/mgc.go:1423 +0xe9 fp=0xc000115fc8 sp=0xc000115f38 pc=0x7ff6f3a947a9
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x25 fp=0xc000115fe0 sp=0xc000115fc8 pc=0x7ff6f3a94685
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000115fe8 sp=0xc000115fe0 pc=0x7ff6f3aed8e1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x105

goroutine 35 gp=0xc0004841c0 m=nil [GC worker (idle)]:
runtime.gopark(0x28af7e0eebec4?, 0x3?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc000111f38 sp=0xc000111f18 pc=0x7ff6f3ae598e
runtime.gcBgMarkWorker(0xc00009b810)
        runtime/mgc.go:1423 +0xe9 fp=0xc000111fc8 sp=0xc000111f38 pc=0x7ff6f3a947a9
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x25 fp=0xc000111fe0 sp=0xc000111fc8 pc=0x7ff6f3a94685
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000111fe8 sp=0xc000111fe0 pc=0x7ff6f3aed8e1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x105

goroutine 20 gp=0xc000106540 m=nil [GC worker (idle)]:
runtime.gopark(0x28af7e0eebec4?, 0x3?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc00011df38 sp=0xc00011df18 pc=0x7ff6f3ae598e
runtime.gcBgMarkWorker(0xc00009b810)
        runtime/mgc.go:1423 +0xe9 fp=0xc00011dfc8 sp=0xc00011df38 pc=0x7ff6f3a947a9
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x25 fp=0xc00011dfe0 sp=0xc00011dfc8 pc=0x7ff6f3a94685
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc00011dfe8 sp=0xc00011dfe0 pc=0x7ff6f3aed8e1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x105

goroutine 9 gp=0xc0003f8700 m=nil [GC worker (idle)]:
runtime.gopark(0x28af7e0eebec4?, 0x3?, 0xc0?, 0xb0?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc000119f38 sp=0xc000119f18 pc=0x7ff6f3ae598e
runtime.gcBgMarkWorker(0xc00009b810)
        runtime/mgc.go:1423 +0xe9 fp=0xc000119fc8 sp=0xc000119f38 pc=0x7ff6f3a947a9
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x25 fp=0xc000119fe0 sp=0xc000119fc8 pc=0x7ff6f3a94685
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000119fe8 sp=0xc000119fe0 pc=0x7ff6f3aed8e1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x105

goroutine 36 gp=0xc000484380 m=nil [GC worker (idle)]:
runtime.gopark(0x28af7e0eebec4?, 0x3?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc00048bf38 sp=0xc00048bf18 pc=0x7ff6f3ae598e
runtime.gcBgMarkWorker(0xc00009b810)
        runtime/mgc.go:1423 +0xe9 fp=0xc00048bfc8 sp=0xc00048bf38 pc=0x7ff6f3a947a9
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x25 fp=0xc00048bfe0 sp=0xc00048bfc8 pc=0x7ff6f3a94685
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc00048bfe8 sp=0xc00048bfe0 pc=0x7ff6f3aed8e1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x105

goroutine 21 gp=0xc000106700 m=nil [GC worker (idle)]:
runtime.gopark(0x28af7e0fdff10?, 0x3?, 0x98?, 0xf1?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc00011ff38 sp=0xc00011ff18 pc=0x7ff6f3ae598e
runtime.gcBgMarkWorker(0xc00009b810)
        runtime/mgc.go:1423 +0xe9 fp=0xc00011ffc8 sp=0xc00011ff38 pc=0x7ff6f3a947a9
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x25 fp=0xc00011ffe0 sp=0xc00011ffc8 pc=0x7ff6f3a94685
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc00011ffe8 sp=0xc00011ffe0 pc=0x7ff6f3aed8e1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x105

goroutine 10 gp=0xc0003f88c0 m=nil [GC worker (idle)]:
runtime.gopark(0x28af7e0eebec4?, 0x3?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc00011bf38 sp=0xc00011bf18 pc=0x7ff6f3ae598e
runtime.gcBgMarkWorker(0xc00009b810)
        runtime/mgc.go:1423 +0xe9 fp=0xc00011bfc8 sp=0xc00011bf38 pc=0x7ff6f3a947a9
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x25 fp=0xc00011bfe0 sp=0xc00011bfc8 pc=0x7ff6f3a94685
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc00011bfe8 sp=0xc00011bfe0 pc=0x7ff6f3aed8e1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x105

goroutine 37 gp=0xc000484540 m=nil [GC worker (idle)]:
runtime.gopark(0x28af7e0fdff10?, 0x1?, 0x30?, 0x73?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc00048df38 sp=0xc00048df18 pc=0x7ff6f3ae598e
runtime.gcBgMarkWorker(0xc00009b810)
        runtime/mgc.go:1423 +0xe9 fp=0xc00048dfc8 sp=0xc00048df38 pc=0x7ff6f3a947a9
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x25 fp=0xc00048dfe0 sp=0xc00048dfc8 pc=0x7ff6f3a94685
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc00048dfe8 sp=0xc00048dfe0 pc=0x7ff6f3aed8e1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x105

goroutine 22 gp=0xc0001068c0 m=nil [GC worker (idle)]:
runtime.gopark(0x28af7e0fdff10?, 0x3?, 0x68?, 0x88?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc000487f38 sp=0xc000487f18 pc=0x7ff6f3ae598e
runtime.gcBgMarkWorker(0xc00009b810)
        runtime/mgc.go:1423 +0xe9 fp=0xc000487fc8 sp=0xc000487f38 pc=0x7ff6f3a947a9
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x25 fp=0xc000487fe0 sp=0xc000487fc8 pc=0x7ff6f3a94685
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000487fe8 sp=0xc000487fe0 pc=0x7ff6f3aed8e1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x105

goroutine 11 gp=0xc0003f8a80 m=nil [GC worker (idle)]:
runtime.gopark(0x28af7e0eebec4?, 0x1?, 0xbc?, 0x68?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc000473f38 sp=0xc000473f18 pc=0x7ff6f3ae598e
runtime.gcBgMarkWorker(0xc00009b810)
        runtime/mgc.go:1423 +0xe9 fp=0xc000473fc8 sp=0xc000473f38 pc=0x7ff6f3a947a9
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x25 fp=0xc000473fe0 sp=0xc000473fc8 pc=0x7ff6f3a94685
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000473fe8 sp=0xc000473fe0 pc=0x7ff6f3aed8e1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x105

goroutine 38 gp=0xc000484700 m=nil [GC worker (idle)]:
runtime.gopark(0x28af7e0eebec4?, 0x3?, 0xc0?, 0xb0?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc00046ff38 sp=0xc00046ff18 pc=0x7ff6f3ae598e
runtime.gcBgMarkWorker(0xc00009b810)
        runtime/mgc.go:1423 +0xe9 fp=0xc00046ffc8 sp=0xc00046ff38 pc=0x7ff6f3a947a9
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x25 fp=0xc00046ffe0 sp=0xc00046ffc8 pc=0x7ff6f3a94685
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc00046ffe8 sp=0xc00046ffe0 pc=0x7ff6f3aed8e1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x105

goroutine 23 gp=0xc000106a80 m=nil [GC worker (idle)]:
runtime.gopark(0x28af7e0eebec4?, 0x1?, 0x1c?, 0x48?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc000489f38 sp=0xc000489f18 pc=0x7ff6f3ae598e
runtime.gcBgMarkWorker(0xc00009b810)
        runtime/mgc.go:1423 +0xe9 fp=0xc000489fc8 sp=0xc000489f38 pc=0x7ff6f3a947a9
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x25 fp=0xc000489fe0 sp=0xc000489fc8 pc=0x7ff6f3a94685
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000489fe8 sp=0xc000489fe0 pc=0x7ff6f3aed8e1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x105

goroutine 25 gp=0xc0003f9880 m=nil [select]:
runtime.gopark(0xc000049a08?, 0x2?, 0x0?, 0x90?, 0xc00004986c?)
        runtime/proc.go:435 +0xce fp=0xc000049698 sp=0xc000049678 pc=0x7ff6f3ae598e
runtime.selectgo(0xc000049a08, 0xc000049868, 0x2c2?, 0x0, 0x1?, 0x1)
        runtime/select.go:351 +0x837 fp=0xc0000497d0 sp=0xc000049698 pc=0x7ff6f3ac6437
github.com/ollama/ollama/runner/ollamarunner.(*Server).completion(0xc000202d20, {0x7ff6f4fd1730, 0xc0015042a0}, 0xc00051a3c0)
        github.com/ollama/ollama/runner/ollamarunner/runner.go:950 +0xc4e fp=0xc000049ac0 sp=0xc0000497d0 pc=0x7ff6f401424e
github.com/ollama/ollama/runner/ollamarunner.(*Server).completion-fm({0x7ff6f4fd1730?, 0xc0015042a0?}, 0xc000049b40?)
        <autogenerated>:1 +0x36 fp=0xc000049af0 sp=0xc000049ac0 pc=0x7ff6f4019576
net/http.HandlerFunc.ServeHTTP(0xc000170000?, {0x7ff6f4fd1730?, 0xc0015042a0?}, 0xc000049b60?)
        net/http/server.go:2294 +0x29 fp=0xc000049b18 sp=0xc000049af0 pc=0x7ff6f3df5089
net/http.(*ServeMux).ServeHTTP(0x7ff6f3a8b785?, {0x7ff6f4fd1730, 0xc0015042a0}, 0xc00051a3c0)
        net/http/server.go:2822 +0x1c4 fp=0xc000049b68 sp=0xc000049b18 pc=0x7ff6f3df6f84
net/http.serverHandler.ServeHTTP({0x7ff6f4fcdcd0?}, {0x7ff6f4fd1730?, 0xc0015042a0?}, 0x1?)
        net/http/server.go:3301 +0x8e fp=0xc000049b98 sp=0xc000049b68 pc=0x7ff6f3e14a0e
net/http.(*conn).serve(0xc0001523f0, {0x7ff6f4fd3b48, 0xc00016a420})
        net/http/server.go:2102 +0x625 fp=0xc000049fb8 sp=0xc000049b98 pc=0x7ff6f3df3585
net/http.(*Server).Serve.gowrap3()
        net/http/server.go:3454 +0x28 fp=0xc000049fe0 sp=0xc000049fb8 pc=0x7ff6f3df8e48
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000049fe8 sp=0xc000049fe0 pc=0x7ff6f3aed8e1
created by net/http.(*Server).Serve in goroutine 1
        net/http/server.go:3454 +0x485

goroutine 1016 gp=0xc0005068c0 m=nil [IO wait]:
runtime.gopark(0x0?, 0xc0001396a0?, 0x48?, 0x97?, 0xc00013974c?)
        runtime/proc.go:435 +0xce fp=0xc000c9bd58 sp=0xc000c9bd38 pc=0x7ff6f3ae598e
runtime.netpollblock(0x244?, 0xf3a80406?, 0xf6?)
        runtime/netpoll.go:575 +0xf7 fp=0xc000c9bd90 sp=0xc000c9bd58 pc=0x7ff6f3aabdf7
internal/poll.runtime_pollWait(0x1c5f1d86358, 0x72)
        runtime/netpoll.go:351 +0x85 fp=0xc000c9bdb0 sp=0xc000c9bd90 pc=0x7ff6f3ae4b25
internal/poll.(*pollDesc).wait(0x244?, 0x72?, 0x0)
        internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc000c9bdd8 sp=0xc000c9bdb0 pc=0x7ff6f3b7bc87
internal/poll.execIO(0xc0001396a0, 0x7ff6f4e62278)
        internal/poll/fd_windows.go:177 +0x105 fp=0xc000c9be50 sp=0xc000c9bdd8 pc=0x7ff6f3b7d0e5
internal/poll.(*FD).Read(0xc000139688, {0xc0000b2041, 0x1, 0x1})
        internal/poll/fd_windows.go:438 +0x29b fp=0xc000c9bef0 sp=0xc000c9be50 pc=0x7ff6f3b7ddbb
net.(*netFD).Read(0xc000139688, {0xc0000b2041?, 0xc00010c098?, 0xc000c9bf70?})
        net/fd_posix.go:55 +0x25 fp=0xc000c9bf38 sp=0xc000c9bef0 pc=0x7ff6f3bf1025
net.(*conn).Read(0xc00007c8a0, {0xc0000b2041?, 0xc000f08340?, 0x7ff6f3e65460?})
        net/net.go:194 +0x45 fp=0xc000c9bf80 sp=0xc000c9bf38 pc=0x7ff6f3c00505
net/http.(*connReader).backgroundRead(0xc0000b2030)
        net/http/server.go:690 +0x37 fp=0xc000c9bfc8 sp=0xc000c9bf80 pc=0x7ff6f3ded457
net/http.(*connReader).startBackgroundRead.gowrap2()
        net/http/server.go:686 +0x25 fp=0xc000c9bfe0 sp=0xc000c9bfc8 pc=0x7ff6f3ded385
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000c9bfe8 sp=0xc000c9bfe0 pc=0x7ff6f3aed8e1
created by net/http.(*connReader).startBackgroundRead in goroutine 25
        net/http/server.go:686 +0xb6

goroutine 217 gp=0xc0015a0700 m=nil [sync.Mutex.Lock]:
runtime.gopark(0x7ff6f5986060?, 0xc000210060?, 0xc0?, 0xc0?, 0x7ff6f3ae3419?)
        runtime/proc.go:435 +0xce fp=0xc000475a88 sp=0xc000475a68 pc=0x7ff6f3ae598e
runtime.goparkunlock(...)
        runtime/proc.go:441
runtime.semacquire1(0xc000202e1c, 0x0, 0x3, 0x2, 0x15)
        runtime/sema.go:188 +0x22f fp=0xc000475af0 sp=0xc000475a88 pc=0x7ff6f3ac750f
internal/sync.runtime_SemacquireMutex(0x7ff6f3eabf94?, 0x48?, 0xc000210060?)
        runtime/sema.go:95 +0x25 fp=0xc000475b28 sp=0xc000475af0 pc=0x7ff6f3ae6e65
internal/sync.(*Mutex).lockSlow(0xc000202e18)
        internal/sync/mutex.go:149 +0x15d fp=0xc000475b78 sp=0xc000475b28 pc=0x7ff6f3af981d
internal/sync.(*Mutex).Lock(...)
        internal/sync/mutex.go:70
sync.(*Mutex).Lock(...)
        sync/mutex.go:46
github.com/ollama/ollama/runner/ollamarunner.(*Server).computeBatch(0xc000202d20, {0x0, {0x7ff6f4fdeec0, 0xc000f08380}, {0x7ff6f4feb728, 0xc008f81f08}, {0xc000120200, 0x16, 0x20}, {{0x7ff6f4feb728, ...}, ...}, ...})
        github.com/ollama/ollama/runner/ollamarunner/runner.go:735 +0x972 fp=0xc000475ef0 sp=0xc000475b78 pc=0x7ff6f4012252
github.com/ollama/ollama/runner/ollamarunner.(*Server).run.gowrap1()
        github.com/ollama/ollama/runner/ollamarunner/runner.go:458 +0x58 fp=0xc000475fe0 sp=0xc000475ef0 pc=0x7ff6f400fd58
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000475fe8 sp=0xc000475fe0 pc=0x7ff6f3aed8e1
created by github.com/ollama/ollama/runner/ollamarunner.(*Server).run in goroutine 24
        github.com/ollama/ollama/runner/ollamarunner/runner.go:458 +0x2cd
rax     0x0
rbx     0x3428
rcx     0x7ffdf50f61d9
rdx     0x1c5a061d270
rdi     0xffffffff
rsi     0x0
rbp     0x12747cf140
rsp     0x12747cf040
r8      0x1c5ab5301d0
r9      0x0
r10     0x1c5c645b7b0
r11     0x685
r12     0x1
r13     0x1c5c645b7b0
r14     0x1c5a061d270
r15     0x1c5ab5301d0
rip     0x7ffdf4feaa95
rflags  0x10206
cs      0x33
fs      0x53
gs      0x2b
time=2025-11-21T23:23:14.338-03:00 level=ERROR source=server.go:1539 msg="post predict" error="Post \"http://127.0.0.1:57368/completion\": read tcp 127.0.0.1:57373->127.0.0.1:57368: wsarecv: An existing connection was forcibly closed by the remote host."

EDIT: Thinking about it now, I'm not sure this is actually related to the failed memory buffer, TBH.

<!-- gh-comment-id:3565350785 --> @NotJustAnna commented on GitHub (Nov 22, 2025): DeepSeek-OCR runs properly on Ollama 0.13.0 with Vulkan only (no ROCm libraries). I also ran qwen2.5vl:7b on this same setup.

```
Exception 0xc0000005 0x0 0x52 0x7ffdf4feaa95
PC=0x7ffdf4feaa95
signal arrived during external code execution

runtime.cgocall(0x7ff6f48167c0, 0xc00017d568)
        runtime/cgocall.go:167 +0x3e fp=0xc00017d540 sp=0xc00017d4d8 pc=0x7ff6f3ae243e
github.com/ollama/ollama/ml/backend/ggml._Cfunc_ggml_backend_sched_graph_compute_async(0x1c5cc880d40, 0x1c5c943b0a0)
        _cgo_gotypes.go:947 +0x50 fp=0xc00017d568 sp=0xc00017d540 pc=0x7ff6f3f2f910
github.com/ollama/ollama/ml/backend/ggml.(*Context).ComputeWithNotify.func2(...)
        github.com/ollama/ollama/ml/backend/ggml/ggml.go:825
github.com/ollama/ollama/ml/backend/ggml.(*Context).ComputeWithNotify(0xc000512040, 0x0?, {0xc0000440b0, 0x1, 0xc000044001?})
        github.com/ollama/ollama/ml/backend/ggml/ggml.go:825 +0x1b5 fp=0xc00017d640 sp=0xc00017d568 pc=0x7ff6f3f3e0f5
github.com/ollama/ollama/ml/backend/ggml.(*Context).Compute(0xc000512040?, {0xc0000440b0?, 0x1?, 0x7ff6f3a8a35e?})
        github.com/ollama/ollama/ml/backend/ggml/ggml.go:811 +0x25 fp=0xc00017d678 sp=0xc00017d640 pc=0x7ff6f3f3df05
github.com/ollama/ollama/runner/ollamarunner.multimodalStore.getTensor(0x7ff6f4c0f2e0?, {0x7ff6f4fd9e50, 0xc0000d36b0}, {0x7ff6f4fdeec0, 0xc000512000}, {0x7ff6f4feb728, 0xc0014ffe30}, 0x0)
        github.com/ollama/ollama/runner/ollamarunner/multimodal.go:93 +0x2f4 fp=0xc00017d788 sp=0xc00017d678 pc=0x7ff6f400dd34
github.com/ollama/ollama/runner/ollamarunner.multimodalStore.getMultimodal(0xc000f26d80, {0x7ff6f4fd9e50, 0xc0000d36b0}, {0x7ff6f4fdeec0, 0xc000512000}, {0xc00024eb60, 0x1, 0xc000e0c400?}, 0x0)
        github.com/ollama/ollama/runner/ollamarunner/multimodal.go:56 +0xe5 fp=0xc00017d7f0 sp=0xc00017d788 pc=0x7ff6f400d925
github.com/ollama/ollama/runner/ollamarunner.(*Server).forwardBatch(_, {0x0, {0x7ff6f4fdeec0, 0xc000f08380}, {0x7ff6f4feb728, 0xc008f81f08}, {0xc000120200, 0x16, 0x20}, {{0x7ff6f4feb728, ...}, ...}, ...})
        github.com/ollama/ollama/runner/ollamarunner/runner.go:584 +0x1217 fp=0xc00017db58 sp=0xc00017d7f0 pc=0x7ff6f4010f97
github.com/ollama/ollama/runner/ollamarunner.(*Server).run(0xc000202d20, {0x7ff6f4fd3b80, 0xc0000e1860})
        github.com/ollama/ollama/runner/ollamarunner/runner.go:452 +0x18c fp=0xc00017dfb8 sp=0xc00017db58 pc=0x7ff6f400fb2c
github.com/ollama/ollama/runner/ollamarunner.Execute.gowrap1()
        github.com/ollama/ollama/runner/ollamarunner/runner.go:1411 +0x28 fp=0xc00017dfe0 sp=0xc00017dfb8 pc=0x7ff6f4019088
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc00017dfe8 sp=0xc00017dfe0 pc=0x7ff6f3aed8e1
created by github.com/ollama/ollama/runner/ollamarunner.Execute in goroutine 1
        github.com/ollama/ollama/runner/ollamarunner/runner.go:1411 +0x4c9
```
time=2025-11-21T23:23:14.338-03:00 level=ERROR source=server.go:1539 msg="post predict" error="Post \"http://127.0.0.1:57368/completion\": read tcp 127.0.0.1:57373->127.0.0.1:57368: wsarecv: An existing connection was forcibly closed by the remote host." ``` EDIT: On reflection, I'm not sure whether this is actually related to the failed memory buffer allocation.

@MightyPlaza commented on GitHub (Nov 22, 2025):

My log output is below.
Judging from `ggml_vulkan: Device memory allocation of size 4199040000 failed.` (roughly 3.9 GiB), it seems to be trying to allocate more memory than is available.

Running `curl localhost:11434/api/generate -d '{"model":"qwen3-vl:4b","options":{"num_gpu":1}}'` makes no difference either.
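A quick sanity check of the sizes in the log below (the byte counts come straight from the `ggml_vulkan` and `ggml_gallocr_reserve_n` lines, and the free-VRAM figure from the `gpu memory` line; the conversion is just standard binary GiB):

```python
# Compare the failed allocation against the free VRAM reported in the log.
alloc_bytes = 4_199_040_000    # "Device memory allocation of size 4199040000 failed."
reserve_bytes = 4_571_325_856  # "failed to allocate Vulkan0 buffer of size 4571325856"
free_vram_gib = 3.1            # 'free="3.1 GiB"' from the gpu memory log line

GIB = 1024 ** 3
print(f"allocation: {alloc_bytes / GIB:.2f} GiB")  # ~3.91 GiB
print(f"reserve:    {reserve_bytes / GIB:.2f} GiB")  # ~4.26 GiB
print(f"free VRAM:  {free_vram_gib:.2f} GiB")
```

Both the single allocation (~3.91 GiB) and the full reserve request (~4.26 GiB) exceed the 3.1 GiB the scheduler reports as free on this 4 GiB card, which is consistent with the `ErrorOutOfDeviceMemory` result regardless of how many layers are offloaded.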

Output

time=2025-11-22T02:33:02.240Z level=INFO source=server.go:392 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 45927"
time=2025-11-22T02:33:02.287Z level=INFO source=sched.go:583 msg="updated VRAM based on existing loaded models" gpu=00000000-0900-0000-0000-000000000000 library=Vulkan total="4.0 GiB" available="3.1 GiB"
time=2025-11-22T02:33:02.338Z level=INFO source=server.go:209 msg="enabling flash attention"
time=2025-11-22T02:33:02.339Z level=INFO source=server.go:392 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --model /home/user/.ollama/models/blobs/sha256-9c60bdd691c1897bbfe5ddbc67336848e18c346b7ee2ab8541b135f208e5bb38 --port 32795"
time=2025-11-22T02:33:02.339Z level=INFO source=sched.go:443 msg="system memory" total="31.2 GiB" free="6.7 GiB" free_swap="0 B"
time=2025-11-22T02:33:02.339Z level=INFO source=sched.go:450 msg="gpu memory" id=00000000-0900-0000-0000-000000000000 library=Vulkan available="2.6 GiB" free="3.1 GiB" minimum="457.0 MiB" overhead="0 B"
time=2025-11-22T02:33:02.339Z level=INFO source=server.go:702 msg="loading model" "model layers"=37 requested=-1
time=2025-11-22T02:33:02.348Z level=INFO source=runner.go:1398 msg="starting ollama engine"
time=2025-11-22T02:33:02.348Z level=INFO source=runner.go:1433 msg="Server listening on 127.0.0.1:32795"
time=2025-11-22T02:33:02.350Z level=INFO source=runner.go:1271 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:37[ID:00000000-0900-0000-0000-000000000000 Layers:37(0..36)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2025-11-22T02:33:02.374Z level=INFO source=ggml.go:136 msg="" architecture=qwen3vl file_type=Q4_K_M name="" description="" num_tensors=809 num_key_values=40
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon RX 570 Series (RADV POLARIS10) (radv) | uma: 0 | fp16: 0 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 0 | matrix cores: none
load_backend: loaded Vulkan backend from /usr/lib/ollama/libggml-vulkan.so
load_backend: loaded CPU backend from /usr/lib/ollama/libggml-cpu-haswell.so
time=2025-11-22T02:33:02.390Z level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
time=2025-11-22T02:33:02.689Z level=INFO source=server.go:974 msg="model requires more memory than is currently available, evicting a model to make space" "loaded layers"=31
time=2025-11-22T02:33:02.689Z level=INFO source=runner.go:1271 msg=load request="{Operation:close LoraPath:[] Parallel:0 BatchSize:0 FlashAttention:false KvSize:0 KvCacheType: NumThreads:0 GPULayers:[] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2025-11-22T02:33:02.689Z level=INFO source=device.go:240 msg="model weights" device=Vulkan0 size="3.1 GiB"
time=2025-11-22T02:33:02.689Z level=INFO source=device.go:245 msg="model weights" device=CPU size="304.3 MiB"
time=2025-11-22T02:33:02.689Z level=INFO source=device.go:251 msg="kv cache" device=Vulkan0 size="576.0 MiB"
time=2025-11-22T02:33:02.689Z level=INFO source=device.go:262 msg="compute graph" device=Vulkan0 size="372.5 MiB"
time=2025-11-22T02:33:02.689Z level=INFO source=device.go:267 msg="compute graph" device=CPU size="39.6 MiB"
time=2025-11-22T02:33:02.689Z level=INFO source=device.go:272 msg="total memory" size="4.3 GiB"
time=2025-11-22T02:33:03.436Z level=INFO source=server.go:392 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 39125"
time=2025-11-22T02:33:03.504Z level=INFO source=sched.go:443 msg="system memory" total="31.2 GiB" free="26.3 GiB" free_swap="0 B"
time=2025-11-22T02:33:03.504Z level=INFO source=sched.go:450 msg="gpu memory" id=00000000-0900-0000-0000-000000000000 library=Vulkan available="2.6 GiB" free="3.1 GiB" minimum="457.0 MiB" overhead="0 B"
time=2025-11-22T02:33:03.504Z level=INFO source=server.go:702 msg="loading model" "model layers"=37 requested=-1
time=2025-11-22T02:33:03.505Z level=INFO source=runner.go:1271 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:31[ID:00000000-0900-0000-0000-000000000000 Layers:31(5..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
time=2025-11-22T02:33:03.763Z level=INFO source=runner.go:1271 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:[] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
time=2025-11-22T02:33:04.029Z level=INFO source=runner.go:1271 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:35[ID:00000000-0900-0000-0000-000000000000 Layers:35(1..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
time=2025-11-22T02:33:04.291Z level=INFO source=runner.go:1271 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:34[ID:00000000-0900-0000-0000-000000000000 Layers:34(2..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
time=2025-11-22T02:33:04.559Z level=INFO source=runner.go:1271 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:33[ID:00000000-0900-0000-0000-000000000000 Layers:33(3..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
time=2025-11-22T02:33:04.823Z level=INFO source=runner.go:1271 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:32[ID:00000000-0900-0000-0000-000000000000 Layers:32(4..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
time=2025-11-22T02:33:05.083Z level=INFO source=runner.go:1271 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:31[ID:00000000-0900-0000-0000-000000000000 Layers:31(5..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
time=2025-11-22T02:33:05.348Z level=INFO source=runner.go:1271 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:30[ID:00000000-0900-0000-0000-000000000000 Layers:30(6..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
time=2025-11-22T02:33:05.609Z level=INFO source=runner.go:1271 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:29[ID:00000000-0900-0000-0000-000000000000 Layers:29(7..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
time=2025-11-22T02:33:05.879Z level=INFO source=runner.go:1271 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:28[ID:00000000-0900-0000-0000-000000000000 Layers:28(8..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
time=2025-11-22T02:33:06.143Z level=INFO source=runner.go:1271 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:27[ID:00000000-0900-0000-0000-000000000000 Layers:27(9..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
time=2025-11-22T02:33:06.408Z level=INFO source=runner.go:1271 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:26[ID:00000000-0900-0000-0000-000000000000 Layers:26(10..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
time=2025-11-22T02:33:06.676Z level=INFO source=runner.go:1271 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:25[ID:00000000-0900-0000-0000-000000000000 Layers:25(11..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
time=2025-11-22T02:33:06.944Z level=INFO source=runner.go:1271 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:24[ID:00000000-0900-0000-0000-000000000000 Layers:24(12..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
time=2025-11-22T02:33:07.215Z level=INFO source=runner.go:1271 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:23[ID:00000000-0900-0000-0000-000000000000 Layers:23(13..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
time=2025-11-22T02:33:07.473Z level=INFO source=runner.go:1271 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:22[ID:00000000-0900-0000-0000-000000000000 Layers:22(14..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
time=2025-11-22T02:33:07.738Z level=INFO source=runner.go:1271 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:21[ID:00000000-0900-0000-0000-000000000000 Layers:21(15..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
time=2025-11-22T02:33:08.000Z level=INFO source=runner.go:1271 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:20[ID:00000000-0900-0000-0000-000000000000 Layers:20(16..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
time=2025-11-22T02:33:08.264Z level=INFO source=runner.go:1271 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:19[ID:00000000-0900-0000-0000-000000000000 Layers:19(17..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
time=2025-11-22T02:33:08.534Z level=INFO source=runner.go:1271 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:18[ID:00000000-0900-0000-0000-000000000000 Layers:18(18..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
time=2025-11-22T02:33:08.809Z level=INFO source=runner.go:1271 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:17[ID:00000000-0900-0000-0000-000000000000 Layers:17(19..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
time=2025-11-22T02:33:09.089Z level=INFO source=runner.go:1271 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:16[ID:00000000-0900-0000-0000-000000000000 Layers:16(20..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
time=2025-11-22T02:33:09.358Z level=INFO source=runner.go:1271 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:15[ID:00000000-0900-0000-0000-000000000000 Layers:15(21..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
time=2025-11-22T02:33:09.629Z level=INFO source=runner.go:1271 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:14[ID:00000000-0900-0000-0000-000000000000 Layers:14(22..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
time=2025-11-22T02:33:09.900Z level=INFO source=runner.go:1271 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:13[ID:00000000-0900-0000-0000-000000000000 Layers:13(23..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
time=2025-11-22T02:33:10.169Z level=INFO source=runner.go:1271 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:12[ID:00000000-0900-0000-0000-000000000000 Layers:12(24..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
time=2025-11-22T02:33:10.441Z level=INFO source=runner.go:1271 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:11[ID:00000000-0900-0000-0000-000000000000 Layers:11(25..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
time=2025-11-22T02:33:10.710Z level=INFO source=runner.go:1271 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:10[ID:00000000-0900-0000-0000-000000000000 Layers:10(26..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
time=2025-11-22T02:33:10.982Z level=INFO source=runner.go:1271 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:9[ID:00000000-0900-0000-0000-000000000000 Layers:9(27..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
time=2025-11-22T02:33:11.247Z level=INFO source=runner.go:1271 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:8[ID:00000000-0900-0000-0000-000000000000 Layers:8(28..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
time=2025-11-22T02:33:11.505Z level=INFO source=runner.go:1271 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:7[ID:00000000-0900-0000-0000-000000000000 Layers:7(29..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
time=2025-11-22T02:33:11.771Z level=INFO source=runner.go:1271 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:6[ID:00000000-0900-0000-0000-000000000000 Layers:6(30..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
time=2025-11-22T02:33:12.038Z level=INFO source=runner.go:1271 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:5[ID:00000000-0900-0000-0000-000000000000 Layers:5(31..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
time=2025-11-22T02:33:12.296Z level=INFO source=runner.go:1271 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:4[ID:00000000-0900-0000-0000-000000000000 Layers:4(32..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
time=2025-11-22T02:33:12.562Z level=INFO source=runner.go:1271 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:3[ID:00000000-0900-0000-0000-000000000000 Layers:3(33..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
time=2025-11-22T02:33:12.820Z level=INFO source=runner.go:1271 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:2[ID:00000000-0900-0000-0000-000000000000 Layers:2(34..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
time=2025-11-22T02:33:13.083Z level=INFO source=runner.go:1271 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:1[ID:00000000-0900-0000-0000-000000000000 Layers:1(35..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
time=2025-11-22T02:33:13.345Z level=INFO source=runner.go:1271 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:[] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
time=2025-11-22T02:33:13.603Z level=INFO source=runner.go:1271 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:[] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
time=2025-11-22T02:33:14.001Z level=INFO source=runner.go:1271 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:35[ID:00000000-0900-0000-0000-000000000000 Layers:35(1..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
ggml_vulkan: Device memory allocation of size 4199040000 failed.
ggml_vulkan: No suitable memory type found: ErrorOutOfDeviceMemory
ggml_gallocr_reserve_n: failed to allocate Vulkan0 buffer of size 4571325856
time=2025-11-22T02:33:14.284Z level=INFO source=runner.go:1271 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:34[ID:00000000-0900-0000-0000-000000000000 Layers:34(2..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
ggml_vulkan: Device memory allocation of size 4199040000 failed.
ggml_vulkan: No suitable memory type found: ErrorOutOfDeviceMemory
ggml_gallocr_reserve_n: failed to allocate Vulkan0 buffer of size 4571325856
time=2025-11-22T02:33:14.628Z level=INFO source=runner.go:1271 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:33[ID:00000000-0900-0000-0000-000000000000 Layers:33(3..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
ggml_vulkan: Device memory allocation of size 4199040000 failed.
ggml_vulkan: No suitable memory type found: ErrorOutOfDeviceMemory
ggml_gallocr_reserve_n: failed to allocate Vulkan0 buffer of size 4571325856
time=2025-11-22T02:33:14.887Z level=INFO source=runner.go:1271 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:32[ID:00000000-0900-0000-0000-000000000000 Layers:32(4..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
ggml_vulkan: Device memory allocation of size 4199040000 failed.
ggml_vulkan: No suitable memory type found: ErrorOutOfDeviceMemory
ggml_gallocr_reserve_n: failed to allocate Vulkan0 buffer of size 4571325856
time=2025-11-22T02:33:15.214Z level=INFO source=runner.go:1271 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:31[ID:00000000-0900-0000-0000-000000000000 Layers:31(5..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
ggml_vulkan: Device memory allocation of size 4199040000 failed.
ggml_vulkan: No suitable memory type found: ErrorOutOfDeviceMemory
ggml_gallocr_reserve_n: failed to allocate Vulkan0 buffer of size 4571325856
time=2025-11-22T02:33:15.478Z level=INFO source=runner.go:1271 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:30[ID:00000000-0900-0000-0000-000000000000 Layers:30(6..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
ggml_vulkan: Device memory allocation of size 4199040000 failed.
ggml_vulkan: No suitable memory type found: ErrorOutOfDeviceMemory
ggml_gallocr_reserve_n: failed to allocate Vulkan0 buffer of size 4571325856
time=2025-11-22T02:33:15.788Z level=INFO source=runner.go:1271 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:29[ID:00000000-0900-0000-0000-000000000000 Layers:29(7..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
ggml_vulkan: Device memory allocation of size 4199040000 failed.
ggml_vulkan: No suitable memory type found: ErrorOutOfDeviceMemory
ggml_gallocr_reserve_n: failed to allocate Vulkan0 buffer of size 4571325856
time=2025-11-22T02:33:16.049Z level=INFO source=runner.go:1271 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:28[ID:00000000-0900-0000-0000-000000000000 Layers:28(8..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
ggml_vulkan: Device memory allocation of size 4199040000 failed.
ggml_vulkan: No suitable memory type found: ErrorOutOfDeviceMemory
ggml_gallocr_reserve_n: failed to allocate Vulkan0 buffer of size 4571325856
time=2025-11-22T02:33:16.363Z level=INFO source=runner.go:1271 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:27[ID:00000000-0900-0000-0000-000000000000 Layers:27(9..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
ggml_vulkan: Device memory allocation of size 4199040000 failed.
ggml_vulkan: No suitable memory type found: ErrorOutOfDeviceMemory
ggml_gallocr_reserve_n: failed to allocate Vulkan0 buffer of size 4571325856
time=2025-11-22T02:33:16.625Z level=INFO source=runner.go:1271 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:26[ID:00000000-0900-0000-0000-000000000000 Layers:26(10..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
ggml_vulkan: Device memory allocation of size 4199040000 failed.
ggml_vulkan: No suitable memory type found: ErrorOutOfDeviceMemory
ggml_gallocr_reserve_n: failed to allocate Vulkan0 buffer of size 4571325856
time=2025-11-22T02:33:16.888Z level=INFO source=runner.go:1271 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:25[ID:00000000-0900-0000-0000-000000000000 Layers:25(11..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
ggml_vulkan: Device memory allocation of size 4199040000 failed.
ggml_vulkan: No suitable memory type found: ErrorOutOfDeviceMemory
ggml_gallocr_reserve_n: failed to allocate Vulkan0 buffer of size 4571325856
time=2025-11-22T02:33:17.143Z level=INFO source=runner.go:1271 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:24[ID:00000000-0900-0000-0000-000000000000 Layers:24(12..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
ggml_vulkan: Device memory allocation of size 4199040000 failed.
ggml_vulkan: No suitable memory type found: ErrorOutOfDeviceMemory
ggml_gallocr_reserve_n: failed to allocate Vulkan0 buffer of size 4571325856
time=2025-11-22T02:33:17.399Z level=INFO source=runner.go:1271 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:23[ID:00000000-0900-0000-0000-000000000000 Layers:23(13..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
ggml_vulkan: Device memory allocation of size 4199040000 failed.
ggml_vulkan: No suitable memory type found: ErrorOutOfDeviceMemory
ggml_gallocr_reserve_n: failed to allocate Vulkan0 buffer of size 4571325856
time=2025-11-22T02:33:17.659Z level=INFO source=runner.go:1271 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:22[ID:00000000-0900-0000-0000-000000000000 Layers:22(14..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
ggml_vulkan: Device memory allocation of size 4199040000 failed.
ggml_vulkan: No suitable memory type found: ErrorOutOfDeviceMemory
ggml_gallocr_reserve_n: failed to allocate Vulkan0 buffer of size 4571325856
time=2025-11-22T02:33:17.936Z level=INFO source=runner.go:1271 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:21[ID:00000000-0900-0000-0000-000000000000 Layers:21(15..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
ggml_vulkan: Device memory allocation of size 4199040000 failed.
ggml_vulkan: No suitable memory type found: ErrorOutOfDeviceMemory
ggml_gallocr_reserve_n: failed to allocate Vulkan0 buffer of size 4571325856
time=2025-11-22T02:33:18.212Z level=INFO source=runner.go:1271 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:20[ID:00000000-0900-0000-0000-000000000000 Layers:20(16..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
ggml_vulkan: Device memory allocation of size 4199040000 failed.
ggml_vulkan: No suitable memory type found: ErrorOutOfDeviceMemory
ggml_gallocr_reserve_n: failed to allocate Vulkan0 buffer of size 4571325856
time=2025-11-22T02:33:18.477Z level=INFO source=runner.go:1271 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:19[ID:00000000-0900-0000-0000-000000000000 Layers:19(17..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
ggml_vulkan: Device memory allocation of size 4199040000 failed.
ggml_vulkan: No suitable memory type found: ErrorOutOfDeviceMemory
ggml_gallocr_reserve_n: failed to allocate Vulkan0 buffer of size 4571325856
time=2025-11-22T02:33:18.749Z level=INFO source=runner.go:1271 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:18[ID:00000000-0900-0000-0000-000000000000 Layers:18(18..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
ggml_vulkan: Device memory allocation of size 4199040000 failed.
ggml_vulkan: No suitable memory type found: ErrorOutOfDeviceMemory
ggml_gallocr_reserve_n: failed to allocate Vulkan0 buffer of size 4571325856
time=2025-11-22T02:33:19.013Z level=INFO source=runner.go:1271 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:17[ID:00000000-0900-0000-0000-000000000000 Layers:17(19..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
ggml_vulkan: Device memory allocation of size 4199040000 failed.
ggml_vulkan: No suitable memory type found: ErrorOutOfDeviceMemory
ggml_gallocr_reserve_n: failed to allocate Vulkan0 buffer of size 4571325856
time=2025-11-22T02:33:19.268Z level=INFO source=runner.go:1271 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:16[ID:00000000-0900-0000-0000-000000000000 Layers:16(20..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
ggml_vulkan: Device memory allocation of size 4199040000 failed.
ggml_vulkan: No suitable memory type found: ErrorOutOfDeviceMemory
ggml_gallocr_reserve_n: failed to allocate Vulkan0 buffer of size 4571325856
time=2025-11-22T02:33:19.527Z level=INFO source=runner.go:1271 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:15[ID:00000000-0900-0000-0000-000000000000 Layers:15(21..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
ggml_vulkan: Device memory allocation of size 4199040000 failed.
ggml_vulkan: No suitable memory type found: ErrorOutOfDeviceMemory
ggml_gallocr_reserve_n: failed to allocate Vulkan0 buffer of size 4571325856
time=2025-11-22T02:33:19.784Z level=INFO source=runner.go:1271 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:14[ID:00000000-0900-0000-0000-000000000000 Layers:14(22..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
ggml_vulkan: Device memory allocation of size 4199040000 failed.
ggml_vulkan: No suitable memory type found: ErrorOutOfDeviceMemory
ggml_gallocr_reserve_n: failed to allocate Vulkan0 buffer of size 4571325856
time=2025-11-22T02:33:20.046Z level=INFO source=runner.go:1271 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:13[ID:00000000-0900-0000-0000-000000000000 Layers:13(23..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
ggml_vulkan: Device memory allocation of size 4199040000 failed.
ggml_vulkan: No suitable memory type found: ErrorOutOfDeviceMemory
ggml_gallocr_reserve_n: failed to allocate Vulkan0 buffer of size 4571325856
time=2025-11-22T02:33:20.319Z level=INFO source=runner.go:1271 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:12[ID:00000000-0900-0000-0000-000000000000 Layers:12(24..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
ggml_vulkan: Device memory allocation of size 4199040000 failed.
ggml_vulkan: No suitable memory type found: ErrorOutOfDeviceMemory
ggml_gallocr_reserve_n: failed to allocate Vulkan0 buffer of size 4571325856
time=2025-11-22T02:33:20.584Z level=INFO source=runner.go:1271 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:11[ID:00000000-0900-0000-0000-000000000000 Layers:11(25..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
ggml_vulkan: Device memory allocation of size 4199040000 failed.
ggml_vulkan: No suitable memory type found: ErrorOutOfDeviceMemory
ggml_gallocr_reserve_n: failed to allocate Vulkan0 buffer of size 4571325856
time=2025-11-22T02:33:20.850Z level=INFO source=runner.go:1271 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:10[ID:00000000-0900-0000-0000-000000000000 Layers:10(26..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
ggml_vulkan: Device memory allocation of size 4199040000 failed.
ggml_vulkan: No suitable memory type found: ErrorOutOfDeviceMemory
ggml_gallocr_reserve_n: failed to allocate Vulkan0 buffer of size 4571325856
time=2025-11-22T02:33:21.119Z level=INFO source=runner.go:1271 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:9[ID:00000000-0900-0000-0000-000000000000 Layers:9(27..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
ggml_vulkan: Device memory allocation of size 4199040000 failed.
ggml_vulkan: No suitable memory type found: ErrorOutOfDeviceMemory
ggml_gallocr_reserve_n: failed to allocate Vulkan0 buffer of size 4571325856
time=2025-11-22T02:33:21.387Z level=INFO source=runner.go:1271 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:8[ID:00000000-0900-0000-0000-000000000000 Layers:8(28..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
ggml_vulkan: Device memory allocation of size 4199040000 failed.
ggml_vulkan: No suitable memory type found: ErrorOutOfDeviceMemory
ggml_gallocr_reserve_n: failed to allocate Vulkan0 buffer of size 4571325856
time=2025-11-22T02:33:21.661Z level=INFO source=runner.go:1271 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:7[ID:00000000-0900-0000-0000-000000000000 Layers:7(29..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
ggml_vulkan: Device memory allocation of size 4199040000 failed.
ggml_vulkan: No suitable memory type found: ErrorOutOfDeviceMemory
ggml_gallocr_reserve_n: failed to allocate Vulkan0 buffer of size 4571325856
time=2025-11-22T02:33:21.932Z level=INFO source=runner.go:1271 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:6[ID:00000000-0900-0000-0000-000000000000 Layers:6(30..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
ggml_vulkan: Device memory allocation of size 4199040000 failed.
ggml_vulkan: No suitable memory type found: ErrorOutOfDeviceMemory
ggml_gallocr_reserve_n: failed to allocate Vulkan0 buffer of size 4571325856
time=2025-11-22T02:33:22.202Z level=INFO source=runner.go:1271 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:5[ID:00000000-0900-0000-0000-000000000000 Layers:5(31..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
ggml_vulkan: Device memory allocation of size 4199040000 failed.
ggml_vulkan: No suitable memory type found: ErrorOutOfDeviceMemory
ggml_gallocr_reserve_n: failed to allocate Vulkan0 buffer of size 4571325856
time=2025-11-22T02:33:22.467Z level=INFO source=runner.go:1271 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:4[ID:00000000-0900-0000-0000-000000000000 Layers:4(32..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
ggml_vulkan: Device memory allocation of size 4199040000 failed.
ggml_vulkan: No suitable memory type found: ErrorOutOfDeviceMemory
ggml_gallocr_reserve_n: failed to allocate Vulkan0 buffer of size 4571325856
time=2025-11-22T02:33:22.756Z level=INFO source=runner.go:1271 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:3[ID:00000000-0900-0000-0000-000000000000 Layers:3(33..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
ggml_vulkan: Device memory allocation of size 4199040000 failed.
ggml_vulkan: No suitable memory type found: ErrorOutOfDeviceMemory
ggml_gallocr_reserve_n: failed to allocate Vulkan0 buffer of size 4571325856
time=2025-11-22T02:33:23.035Z level=INFO source=runner.go:1271 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:2[ID:00000000-0900-0000-0000-000000000000 Layers:2(34..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
ggml_vulkan: Device memory allocation of size 4199040000 failed.
ggml_vulkan: No suitable memory type found: ErrorOutOfDeviceMemory
ggml_gallocr_reserve_n: failed to allocate Vulkan0 buffer of size 4571325856
time=2025-11-22T02:33:23.305Z level=INFO source=runner.go:1271 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:1[ID:00000000-0900-0000-0000-000000000000 Layers:1(35..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
ggml_vulkan: Device memory allocation of size 4199040000 failed.
ggml_vulkan: No suitable memory type found: ErrorOutOfDeviceMemory
ggml_gallocr_reserve_n: failed to allocate Vulkan0 buffer of size 4571325856
time=2025-11-22T02:33:23.567Z level=INFO source=runner.go:1271 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:[] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
time=2025-11-22T02:33:23.970Z level=INFO source=runner.go:1271 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:[] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2025-11-22T02:33:23.970Z level=INFO source=ggml.go:482 msg="offloading 0 repeating layers to GPU"
time=2025-11-22T02:33:23.970Z level=INFO source=ggml.go:486 msg="offloading output layer to CPU"
time=2025-11-22T02:33:23.970Z level=INFO source=ggml.go:494 msg="offloaded 0/37 layers to GPU"
time=2025-11-22T02:33:23.970Z level=INFO source=device.go:245 msg="model weights" device=CPU size="3.4 GiB"
time=2025-11-22T02:33:23.970Z level=INFO source=device.go:256 msg="kv cache" device=CPU size="576.0 MiB"
time=2025-11-22T02:33:23.970Z level=INFO source=device.go:267 msg="compute graph" device=CPU size="4.2 GiB"
time=2025-11-22T02:33:23.970Z level=INFO source=device.go:272 msg="total memory" size="8.1 GiB"
time=2025-11-22T02:33:23.970Z level=INFO source=sched.go:517 msg="loaded runners" count=1
time=2025-11-22T02:33:23.970Z level=INFO source=server.go:1294 msg="waiting for llama runner to start responding"
time=2025-11-22T02:33:23.970Z level=INFO source=server.go:1328 msg="waiting for server to become available" status="llm server loading model"
time=2025-11-22T02:33:25.228Z level=INFO source=server.go:1332 msg="llama runner started in 22.89 seconds"

<!-- gh-comment-id:3565389824 --> @MightyPlaza commented on GitHub (Nov 22, 2025): My log output is below. Judging from `ggml_vulkan: Device memory allocation of size 4199040000 failed.` (3.9 GiB), it seems like it's trying to allocate more memory than is available. Running `curl localhost:11434/api/generate -d '{"model":"qwen3-vl:4b","options":{"num_gpu":1}}'` makes no difference either.

<details>
<summary>Output</summary>

time=2025-11-22T02:33:02.240Z level=INFO source=server.go:392 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 45927"
time=2025-11-22T02:33:02.287Z level=INFO source=sched.go:583 msg="updated VRAM based on existing loaded models" gpu=00000000-0900-0000-0000-000000000000 library=Vulkan total="4.0 GiB" available="3.1 GiB"
time=2025-11-22T02:33:02.338Z level=INFO source=server.go:209 msg="enabling flash attention"
time=2025-11-22T02:33:02.339Z level=INFO source=server.go:392 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --model /home/user/.ollama/models/blobs/sha256-9c60bdd691c1897bbfe5ddbc67336848e18c346b7ee2ab8541b135f208e5bb38 --port 32795"
time=2025-11-22T02:33:02.339Z level=INFO source=sched.go:443 msg="system memory" total="31.2 GiB" free="6.7 GiB" free_swap="0 B"
time=2025-11-22T02:33:02.339Z level=INFO source=sched.go:450 msg="gpu memory" id=00000000-0900-0000-0000-000000000000 library=Vulkan available="2.6 GiB" free="3.1 GiB" minimum="457.0 MiB" overhead="0 B"
time=2025-11-22T02:33:02.339Z level=INFO source=server.go:702 msg="loading model" "model layers"=37 requested=-1
time=2025-11-22T02:33:02.348Z level=INFO source=runner.go:1398 msg="starting ollama engine"
time=2025-11-22T02:33:02.348Z level=INFO source=runner.go:1433 msg="Server listening on 127.0.0.1:32795"
time=2025-11-22T02:33:02.350Z level=INFO source=runner.go:1271 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:37[ID:00000000-0900-0000-0000-000000000000 Layers:37(0..36)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2025-11-22T02:33:02.374Z level=INFO source=ggml.go:136 msg="" architecture=qwen3vl file_type=Q4_K_M name="" description="" num_tensors=809 num_key_values=40
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon RX 570 Series (RADV POLARIS10) (radv) | uma: 0 | fp16: 0 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 0 | matrix cores: none
load_backend: loaded Vulkan backend from /usr/lib/ollama/libggml-vulkan.so
load_backend: loaded CPU backend from /usr/lib/ollama/libggml-cpu-haswell.so
time=2025-11-22T02:33:02.390Z level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
time=2025-11-22T02:33:02.689Z level=INFO source=server.go:974 msg="model requires more memory than is currently available, evicting a model to make space" "loaded layers"=31
time=2025-11-22T02:33:02.689Z level=INFO source=runner.go:1271 msg=load request="{Operation:close LoraPath:[] Parallel:0 BatchSize:0 FlashAttention:false KvSize:0 KvCacheType: NumThreads:0 GPULayers:[] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2025-11-22T02:33:02.689Z level=INFO source=device.go:240 msg="model weights" device=Vulkan0 size="3.1 GiB"
time=2025-11-22T02:33:02.689Z level=INFO source=device.go:245 msg="model weights" device=CPU size="304.3 MiB"
time=2025-11-22T02:33:02.689Z level=INFO source=device.go:251 msg="kv cache" device=Vulkan0 size="576.0 MiB"
time=2025-11-22T02:33:02.689Z level=INFO source=device.go:262 msg="compute graph" device=Vulkan0 size="372.5 MiB"
time=2025-11-22T02:33:02.689Z level=INFO source=device.go:267 msg="compute graph" device=CPU size="39.6 MiB"
time=2025-11-22T02:33:02.689Z level=INFO source=device.go:272 msg="total memory" size="4.3 GiB"
time=2025-11-22T02:33:03.436Z level=INFO source=server.go:392 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 39125"
time=2025-11-22T02:33:03.504Z level=INFO source=sched.go:443 msg="system memory" total="31.2 GiB" free="26.3 GiB" free_swap="0 B"
time=2025-11-22T02:33:03.504Z level=INFO source=sched.go:450 msg="gpu memory" id=00000000-0900-0000-0000-000000000000 library=Vulkan available="2.6 GiB" free="3.1 GiB" minimum="457.0 MiB" overhead="0 B"
time=2025-11-22T02:33:03.504Z level=INFO source=server.go:702 msg="loading model" "model layers"=37 requested=-1
time=2025-11-22T02:33:03.505Z level=INFO source=runner.go:1271 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:31[ID:00000000-0900-0000-0000-000000000000 Layers:31(5..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
time=2025-11-22T02:33:03.763Z level=INFO source=runner.go:1271 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:[] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
time=2025-11-22T02:33:04.029Z level=INFO source=runner.go:1271 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:35[ID:00000000-0900-0000-0000-000000000000 Layers:35(1..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
time=2025-11-22T02:33:04.291Z level=INFO source=runner.go:1271 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:34[ID:00000000-0900-0000-0000-000000000000 Layers:34(2..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
time=2025-11-22T02:33:04.559Z level=INFO source=runner.go:1271 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:33[ID:00000000-0900-0000-0000-000000000000 Layers:33(3..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
time=2025-11-22T02:33:04.823Z level=INFO source=runner.go:1271 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:32[ID:00000000-0900-0000-0000-000000000000 Layers:32(4..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
time=2025-11-22T02:33:05.083Z level=INFO source=runner.go:1271 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:31[ID:00000000-0900-0000-0000-000000000000 Layers:31(5..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
time=2025-11-22T02:33:05.348Z level=INFO source=runner.go:1271 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:30[ID:00000000-0900-0000-0000-000000000000 Layers:30(6..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
time=2025-11-22T02:33:05.609Z level=INFO source=runner.go:1271 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:29[ID:00000000-0900-0000-0000-000000000000 Layers:29(7..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
time=2025-11-22T02:33:05.879Z level=INFO source=runner.go:1271 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:28[ID:00000000-0900-0000-0000-000000000000 Layers:28(8..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
time=2025-11-22T02:33:06.143Z level=INFO source=runner.go:1271 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:27[ID:00000000-0900-0000-0000-000000000000 Layers:27(9..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
time=2025-11-22T02:33:06.408Z level=INFO source=runner.go:1271 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:26[ID:00000000-0900-0000-0000-000000000000 Layers:26(10..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
time=2025-11-22T02:33:06.676Z level=INFO source=runner.go:1271 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:25[ID:00000000-0900-0000-0000-000000000000 Layers:25(11..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
time=2025-11-22T02:33:06.944Z level=INFO source=runner.go:1271 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:24[ID:00000000-0900-0000-0000-000000000000 Layers:24(12..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
time=2025-11-22T02:33:07.215Z level=INFO source=runner.go:1271 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:23[ID:00000000-0900-0000-0000-000000000000 Layers:23(13..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
time=2025-11-22T02:33:07.473Z level=INFO source=runner.go:1271 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:22[ID:00000000-0900-0000-0000-000000000000 Layers:22(14..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
time=2025-11-22T02:33:07.738Z level=INFO source=runner.go:1271 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:21[ID:00000000-0900-0000-0000-000000000000 Layers:21(15..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
time=2025-11-22T02:33:08.000Z level=INFO source=runner.go:1271 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:20[ID:00000000-0900-0000-0000-000000000000 Layers:20(16..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
time=2025-11-22T02:33:08.264Z level=INFO source=runner.go:1271 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:19[ID:00000000-0900-0000-0000-000000000000 Layers:19(17..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
time=2025-11-22T02:33:08.534Z level=INFO source=runner.go:1271 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:18[ID:00000000-0900-0000-0000-000000000000 Layers:18(18..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
time=2025-11-22T02:33:08.809Z level=INFO source=runner.go:1271 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:17[ID:00000000-0900-0000-0000-000000000000 Layers:17(19..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
time=2025-11-22T02:33:09.089Z level=INFO source=runner.go:1271 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:16[ID:00000000-0900-0000-0000-000000000000 Layers:16(20..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
time=2025-11-22T02:33:09.358Z level=INFO source=runner.go:1271 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:15[ID:00000000-0900-0000-0000-000000000000 Layers:15(21..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
time=2025-11-22T02:33:09.629Z level=INFO source=runner.go:1271 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:14[ID:00000000-0900-0000-0000-000000000000 Layers:14(22..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 
0x0000000000000000 time=2025-11-22T02:33:09.900Z level=INFO source=runner.go:1271 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:13[ID:00000000-0900-0000-0000-000000000000 Layers:13(23..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}" ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000 ggml_backend_vk_get_device_memory called: luid 0x0000000000000000 time=2025-11-22T02:33:10.169Z level=INFO source=runner.go:1271 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:12[ID:00000000-0900-0000-0000-000000000000 Layers:12(24..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}" ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000 ggml_backend_vk_get_device_memory called: luid 0x0000000000000000 time=2025-11-22T02:33:10.441Z level=INFO source=runner.go:1271 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:11[ID:00000000-0900-0000-0000-000000000000 Layers:11(25..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}" ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000 ggml_backend_vk_get_device_memory called: luid 0x0000000000000000 time=2025-11-22T02:33:10.710Z level=INFO source=runner.go:1271 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:10[ID:00000000-0900-0000-0000-000000000000 Layers:10(26..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}" ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000 ggml_backend_vk_get_device_memory called: luid 0x0000000000000000 time=2025-11-22T02:33:10.982Z level=INFO source=runner.go:1271 msg=load 
request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:9[ID:00000000-0900-0000-0000-000000000000 Layers:9(27..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}" ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000 ggml_backend_vk_get_device_memory called: luid 0x0000000000000000 time=2025-11-22T02:33:11.247Z level=INFO source=runner.go:1271 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:8[ID:00000000-0900-0000-0000-000000000000 Layers:8(28..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}" ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000 ggml_backend_vk_get_device_memory called: luid 0x0000000000000000 time=2025-11-22T02:33:11.505Z level=INFO source=runner.go:1271 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:7[ID:00000000-0900-0000-0000-000000000000 Layers:7(29..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}" ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000 ggml_backend_vk_get_device_memory called: luid 0x0000000000000000 time=2025-11-22T02:33:11.771Z level=INFO source=runner.go:1271 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:6[ID:00000000-0900-0000-0000-000000000000 Layers:6(30..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}" ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000 ggml_backend_vk_get_device_memory called: luid 0x0000000000000000 time=2025-11-22T02:33:12.038Z level=INFO source=runner.go:1271 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: 
NumThreads:6 GPULayers:5[ID:00000000-0900-0000-0000-000000000000 Layers:5(31..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}" ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000 ggml_backend_vk_get_device_memory called: luid 0x0000000000000000 time=2025-11-22T02:33:12.296Z level=INFO source=runner.go:1271 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:4[ID:00000000-0900-0000-0000-000000000000 Layers:4(32..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}" ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000 ggml_backend_vk_get_device_memory called: luid 0x0000000000000000 time=2025-11-22T02:33:12.562Z level=INFO source=runner.go:1271 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:3[ID:00000000-0900-0000-0000-000000000000 Layers:3(33..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}" ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000 ggml_backend_vk_get_device_memory called: luid 0x0000000000000000 time=2025-11-22T02:33:12.820Z level=INFO source=runner.go:1271 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:2[ID:00000000-0900-0000-0000-000000000000 Layers:2(34..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}" ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000 ggml_backend_vk_get_device_memory called: luid 0x0000000000000000 time=2025-11-22T02:33:13.083Z level=INFO source=runner.go:1271 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:1[ID:00000000-0900-0000-0000-000000000000 Layers:1(35..35)] MultiUserCache:false 
ProjectorPath: MainGPU:0 UseMmap:false}" ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000 ggml_backend_vk_get_device_memory called: luid 0x0000000000000000 time=2025-11-22T02:33:13.345Z level=INFO source=runner.go:1271 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:[] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}" ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000 ggml_backend_vk_get_device_memory called: luid 0x0000000000000000 time=2025-11-22T02:33:13.603Z level=INFO source=runner.go:1271 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:[] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}" ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000 ggml_backend_vk_get_device_memory called: luid 0x0000000000000000 time=2025-11-22T02:33:14.001Z level=INFO source=runner.go:1271 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:35[ID:00000000-0900-0000-0000-000000000000 Layers:35(1..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}" ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000 ggml_backend_vk_get_device_memory called: luid 0x0000000000000000 ggml_vulkan: Device memory allocation of size 4199040000 failed. 
ggml_vulkan: No suitable memory type found: ErrorOutOfDeviceMemory ggml_gallocr_reserve_n: failed to allocate Vulkan0 buffer of size 4571325856 time=2025-11-22T02:33:14.284Z level=INFO source=runner.go:1271 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:34[ID:00000000-0900-0000-0000-000000000000 Layers:34(2..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}" ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000 ggml_backend_vk_get_device_memory called: luid 0x0000000000000000 ggml_vulkan: Device memory allocation of size 4199040000 failed. ggml_vulkan: No suitable memory type found: ErrorOutOfDeviceMemory ggml_gallocr_reserve_n: failed to allocate Vulkan0 buffer of size 4571325856 time=2025-11-22T02:33:14.628Z level=INFO source=runner.go:1271 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:33[ID:00000000-0900-0000-0000-000000000000 Layers:33(3..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}" ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000 ggml_backend_vk_get_device_memory called: luid 0x0000000000000000 ggml_vulkan: Device memory allocation of size 4199040000 failed. 
ggml_vulkan: No suitable memory type found: ErrorOutOfDeviceMemory ggml_gallocr_reserve_n: failed to allocate Vulkan0 buffer of size 4571325856 time=2025-11-22T02:33:14.887Z level=INFO source=runner.go:1271 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:32[ID:00000000-0900-0000-0000-000000000000 Layers:32(4..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}" ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000 ggml_backend_vk_get_device_memory called: luid 0x0000000000000000 ggml_vulkan: Device memory allocation of size 4199040000 failed. ggml_vulkan: No suitable memory type found: ErrorOutOfDeviceMemory ggml_gallocr_reserve_n: failed to allocate Vulkan0 buffer of size 4571325856 time=2025-11-22T02:33:15.214Z level=INFO source=runner.go:1271 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:31[ID:00000000-0900-0000-0000-000000000000 Layers:31(5..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}" ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000 ggml_backend_vk_get_device_memory called: luid 0x0000000000000000 ggml_vulkan: Device memory allocation of size 4199040000 failed. 
ggml_vulkan: No suitable memory type found: ErrorOutOfDeviceMemory ggml_gallocr_reserve_n: failed to allocate Vulkan0 buffer of size 4571325856 time=2025-11-22T02:33:15.478Z level=INFO source=runner.go:1271 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:30[ID:00000000-0900-0000-0000-000000000000 Layers:30(6..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}" ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000 ggml_backend_vk_get_device_memory called: luid 0x0000000000000000 ggml_vulkan: Device memory allocation of size 4199040000 failed. ggml_vulkan: No suitable memory type found: ErrorOutOfDeviceMemory ggml_gallocr_reserve_n: failed to allocate Vulkan0 buffer of size 4571325856 time=2025-11-22T02:33:15.788Z level=INFO source=runner.go:1271 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:29[ID:00000000-0900-0000-0000-000000000000 Layers:29(7..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}" ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000 ggml_backend_vk_get_device_memory called: luid 0x0000000000000000 ggml_vulkan: Device memory allocation of size 4199040000 failed. 
ggml_vulkan: No suitable memory type found: ErrorOutOfDeviceMemory ggml_gallocr_reserve_n: failed to allocate Vulkan0 buffer of size 4571325856 time=2025-11-22T02:33:16.049Z level=INFO source=runner.go:1271 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:28[ID:00000000-0900-0000-0000-000000000000 Layers:28(8..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}" ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000 ggml_backend_vk_get_device_memory called: luid 0x0000000000000000 ggml_vulkan: Device memory allocation of size 4199040000 failed. ggml_vulkan: No suitable memory type found: ErrorOutOfDeviceMemory ggml_gallocr_reserve_n: failed to allocate Vulkan0 buffer of size 4571325856 time=2025-11-22T02:33:16.363Z level=INFO source=runner.go:1271 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:27[ID:00000000-0900-0000-0000-000000000000 Layers:27(9..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}" ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000 ggml_backend_vk_get_device_memory called: luid 0x0000000000000000 ggml_vulkan: Device memory allocation of size 4199040000 failed. 
ggml_vulkan: No suitable memory type found: ErrorOutOfDeviceMemory ggml_gallocr_reserve_n: failed to allocate Vulkan0 buffer of size 4571325856 time=2025-11-22T02:33:16.625Z level=INFO source=runner.go:1271 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:26[ID:00000000-0900-0000-0000-000000000000 Layers:26(10..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}" ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000 ggml_backend_vk_get_device_memory called: luid 0x0000000000000000 ggml_vulkan: Device memory allocation of size 4199040000 failed. ggml_vulkan: No suitable memory type found: ErrorOutOfDeviceMemory ggml_gallocr_reserve_n: failed to allocate Vulkan0 buffer of size 4571325856 time=2025-11-22T02:33:16.888Z level=INFO source=runner.go:1271 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:25[ID:00000000-0900-0000-0000-000000000000 Layers:25(11..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}" ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000 ggml_backend_vk_get_device_memory called: luid 0x0000000000000000 ggml_vulkan: Device memory allocation of size 4199040000 failed. 
ggml_vulkan: No suitable memory type found: ErrorOutOfDeviceMemory ggml_gallocr_reserve_n: failed to allocate Vulkan0 buffer of size 4571325856 time=2025-11-22T02:33:17.143Z level=INFO source=runner.go:1271 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:24[ID:00000000-0900-0000-0000-000000000000 Layers:24(12..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}" ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000 ggml_backend_vk_get_device_memory called: luid 0x0000000000000000 ggml_vulkan: Device memory allocation of size 4199040000 failed. ggml_vulkan: No suitable memory type found: ErrorOutOfDeviceMemory ggml_gallocr_reserve_n: failed to allocate Vulkan0 buffer of size 4571325856 time=2025-11-22T02:33:17.399Z level=INFO source=runner.go:1271 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:23[ID:00000000-0900-0000-0000-000000000000 Layers:23(13..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}" ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000 ggml_backend_vk_get_device_memory called: luid 0x0000000000000000 ggml_vulkan: Device memory allocation of size 4199040000 failed. 
ggml_vulkan: No suitable memory type found: ErrorOutOfDeviceMemory ggml_gallocr_reserve_n: failed to allocate Vulkan0 buffer of size 4571325856 time=2025-11-22T02:33:17.659Z level=INFO source=runner.go:1271 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:22[ID:00000000-0900-0000-0000-000000000000 Layers:22(14..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}" ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000 ggml_backend_vk_get_device_memory called: luid 0x0000000000000000 ggml_vulkan: Device memory allocation of size 4199040000 failed. ggml_vulkan: No suitable memory type found: ErrorOutOfDeviceMemory ggml_gallocr_reserve_n: failed to allocate Vulkan0 buffer of size 4571325856 time=2025-11-22T02:33:17.936Z level=INFO source=runner.go:1271 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:21[ID:00000000-0900-0000-0000-000000000000 Layers:21(15..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}" ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000 ggml_backend_vk_get_device_memory called: luid 0x0000000000000000 ggml_vulkan: Device memory allocation of size 4199040000 failed. 
ggml_vulkan: No suitable memory type found: ErrorOutOfDeviceMemory ggml_gallocr_reserve_n: failed to allocate Vulkan0 buffer of size 4571325856 time=2025-11-22T02:33:18.212Z level=INFO source=runner.go:1271 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:20[ID:00000000-0900-0000-0000-000000000000 Layers:20(16..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}" ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000 ggml_backend_vk_get_device_memory called: luid 0x0000000000000000 ggml_vulkan: Device memory allocation of size 4199040000 failed. ggml_vulkan: No suitable memory type found: ErrorOutOfDeviceMemory ggml_gallocr_reserve_n: failed to allocate Vulkan0 buffer of size 4571325856 time=2025-11-22T02:33:18.477Z level=INFO source=runner.go:1271 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:19[ID:00000000-0900-0000-0000-000000000000 Layers:19(17..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}" ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000 ggml_backend_vk_get_device_memory called: luid 0x0000000000000000 ggml_vulkan: Device memory allocation of size 4199040000 failed. 
ggml_vulkan: No suitable memory type found: ErrorOutOfDeviceMemory ggml_gallocr_reserve_n: failed to allocate Vulkan0 buffer of size 4571325856 time=2025-11-22T02:33:18.749Z level=INFO source=runner.go:1271 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:18[ID:00000000-0900-0000-0000-000000000000 Layers:18(18..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}" ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000 ggml_backend_vk_get_device_memory called: luid 0x0000000000000000 ggml_vulkan: Device memory allocation of size 4199040000 failed. ggml_vulkan: No suitable memory type found: ErrorOutOfDeviceMemory ggml_gallocr_reserve_n: failed to allocate Vulkan0 buffer of size 4571325856 time=2025-11-22T02:33:19.013Z level=INFO source=runner.go:1271 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:17[ID:00000000-0900-0000-0000-000000000000 Layers:17(19..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}" ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000 ggml_backend_vk_get_device_memory called: luid 0x0000000000000000 ggml_vulkan: Device memory allocation of size 4199040000 failed. 
ggml_vulkan: No suitable memory type found: ErrorOutOfDeviceMemory ggml_gallocr_reserve_n: failed to allocate Vulkan0 buffer of size 4571325856 time=2025-11-22T02:33:19.268Z level=INFO source=runner.go:1271 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:16[ID:00000000-0900-0000-0000-000000000000 Layers:16(20..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}" ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000 ggml_backend_vk_get_device_memory called: luid 0x0000000000000000 ggml_vulkan: Device memory allocation of size 4199040000 failed. ggml_vulkan: No suitable memory type found: ErrorOutOfDeviceMemory ggml_gallocr_reserve_n: failed to allocate Vulkan0 buffer of size 4571325856 time=2025-11-22T02:33:19.527Z level=INFO source=runner.go:1271 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:15[ID:00000000-0900-0000-0000-000000000000 Layers:15(21..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}" ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000 ggml_backend_vk_get_device_memory called: luid 0x0000000000000000 ggml_vulkan: Device memory allocation of size 4199040000 failed. 
ggml_vulkan: No suitable memory type found: ErrorOutOfDeviceMemory ggml_gallocr_reserve_n: failed to allocate Vulkan0 buffer of size 4571325856 time=2025-11-22T02:33:19.784Z level=INFO source=runner.go:1271 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:14[ID:00000000-0900-0000-0000-000000000000 Layers:14(22..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}" ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000 ggml_backend_vk_get_device_memory called: luid 0x0000000000000000 ggml_vulkan: Device memory allocation of size 4199040000 failed. ggml_vulkan: No suitable memory type found: ErrorOutOfDeviceMemory ggml_gallocr_reserve_n: failed to allocate Vulkan0 buffer of size 4571325856 time=2025-11-22T02:33:20.046Z level=INFO source=runner.go:1271 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:13[ID:00000000-0900-0000-0000-000000000000 Layers:13(23..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}" ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000 ggml_backend_vk_get_device_memory called: luid 0x0000000000000000 ggml_vulkan: Device memory allocation of size 4199040000 failed. 
ggml_vulkan: No suitable memory type found: ErrorOutOfDeviceMemory ggml_gallocr_reserve_n: failed to allocate Vulkan0 buffer of size 4571325856 time=2025-11-22T02:33:20.319Z level=INFO source=runner.go:1271 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:12[ID:00000000-0900-0000-0000-000000000000 Layers:12(24..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}" ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000 ggml_backend_vk_get_device_memory called: luid 0x0000000000000000 ggml_vulkan: Device memory allocation of size 4199040000 failed. ggml_vulkan: No suitable memory type found: ErrorOutOfDeviceMemory ggml_gallocr_reserve_n: failed to allocate Vulkan0 buffer of size 4571325856 time=2025-11-22T02:33:20.584Z level=INFO source=runner.go:1271 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:11[ID:00000000-0900-0000-0000-000000000000 Layers:11(25..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}" ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000 ggml_backend_vk_get_device_memory called: luid 0x0000000000000000 ggml_vulkan: Device memory allocation of size 4199040000 failed. 
ggml_vulkan: No suitable memory type found: ErrorOutOfDeviceMemory ggml_gallocr_reserve_n: failed to allocate Vulkan0 buffer of size 4571325856 time=2025-11-22T02:33:20.850Z level=INFO source=runner.go:1271 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:10[ID:00000000-0900-0000-0000-000000000000 Layers:10(26..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}" ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000 ggml_backend_vk_get_device_memory called: luid 0x0000000000000000 ggml_vulkan: Device memory allocation of size 4199040000 failed. ggml_vulkan: No suitable memory type found: ErrorOutOfDeviceMemory ggml_gallocr_reserve_n: failed to allocate Vulkan0 buffer of size 4571325856 time=2025-11-22T02:33:21.119Z level=INFO source=runner.go:1271 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:9[ID:00000000-0900-0000-0000-000000000000 Layers:9(27..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}" ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000 ggml_backend_vk_get_device_memory called: luid 0x0000000000000000 ggml_vulkan: Device memory allocation of size 4199040000 failed. 
ggml_vulkan: No suitable memory type found: ErrorOutOfDeviceMemory
ggml_gallocr_reserve_n: failed to allocate Vulkan0 buffer of size 4571325856
time=2025-11-22T02:33:21.387Z level=INFO source=runner.go:1271 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:8[ID:00000000-0900-0000-0000-000000000000 Layers:8(28..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
ggml_vulkan: Device memory allocation of size 4199040000 failed.
ggml_vulkan: No suitable memory type found: ErrorOutOfDeviceMemory
ggml_gallocr_reserve_n: failed to allocate Vulkan0 buffer of size 4571325856
time=2025-11-22T02:33:21.661Z level=INFO source=runner.go:1271 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:7[ID:00000000-0900-0000-0000-000000000000 Layers:7(29..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
ggml_vulkan: Device memory allocation of size 4199040000 failed.
ggml_vulkan: No suitable memory type found: ErrorOutOfDeviceMemory
ggml_gallocr_reserve_n: failed to allocate Vulkan0 buffer of size 4571325856
time=2025-11-22T02:33:21.932Z level=INFO source=runner.go:1271 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:6[ID:00000000-0900-0000-0000-000000000000 Layers:6(30..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
ggml_vulkan: Device memory allocation of size 4199040000 failed.
ggml_vulkan: No suitable memory type found: ErrorOutOfDeviceMemory
ggml_gallocr_reserve_n: failed to allocate Vulkan0 buffer of size 4571325856
time=2025-11-22T02:33:22.202Z level=INFO source=runner.go:1271 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:5[ID:00000000-0900-0000-0000-000000000000 Layers:5(31..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
ggml_vulkan: Device memory allocation of size 4199040000 failed.
ggml_vulkan: No suitable memory type found: ErrorOutOfDeviceMemory
ggml_gallocr_reserve_n: failed to allocate Vulkan0 buffer of size 4571325856
time=2025-11-22T02:33:22.467Z level=INFO source=runner.go:1271 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:4[ID:00000000-0900-0000-0000-000000000000 Layers:4(32..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
ggml_vulkan: Device memory allocation of size 4199040000 failed.
ggml_vulkan: No suitable memory type found: ErrorOutOfDeviceMemory
ggml_gallocr_reserve_n: failed to allocate Vulkan0 buffer of size 4571325856
time=2025-11-22T02:33:22.756Z level=INFO source=runner.go:1271 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:3[ID:00000000-0900-0000-0000-000000000000 Layers:3(33..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
ggml_vulkan: Device memory allocation of size 4199040000 failed.
ggml_vulkan: No suitable memory type found: ErrorOutOfDeviceMemory
ggml_gallocr_reserve_n: failed to allocate Vulkan0 buffer of size 4571325856
time=2025-11-22T02:33:23.035Z level=INFO source=runner.go:1271 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:2[ID:00000000-0900-0000-0000-000000000000 Layers:2(34..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
ggml_vulkan: Device memory allocation of size 4199040000 failed.
ggml_vulkan: No suitable memory type found: ErrorOutOfDeviceMemory
ggml_gallocr_reserve_n: failed to allocate Vulkan0 buffer of size 4571325856
time=2025-11-22T02:33:23.305Z level=INFO source=runner.go:1271 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:1[ID:00000000-0900-0000-0000-000000000000 Layers:1(35..35)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
ggml_vulkan: Device memory allocation of size 4199040000 failed.
ggml_vulkan: No suitable memory type found: ErrorOutOfDeviceMemory
ggml_gallocr_reserve_n: failed to allocate Vulkan0 buffer of size 4571325856
time=2025-11-22T02:33:23.567Z level=INFO source=runner.go:1271 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:[] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-0900-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
time=2025-11-22T02:33:23.970Z level=INFO source=runner.go:1271 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:6 GPULayers:[] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2025-11-22T02:33:23.970Z level=INFO source=ggml.go:482 msg="offloading 0 repeating layers to GPU"
time=2025-11-22T02:33:23.970Z level=INFO source=ggml.go:486 msg="offloading output layer to CPU"
time=2025-11-22T02:33:23.970Z level=INFO source=ggml.go:494 msg="offloaded 0/37 layers to GPU"
time=2025-11-22T02:33:23.970Z level=INFO source=device.go:245 msg="model weights" device=CPU size="3.4 GiB"
time=2025-11-22T02:33:23.970Z level=INFO source=device.go:256 msg="kv cache" device=CPU size="576.0 MiB"
time=2025-11-22T02:33:23.970Z level=INFO source=device.go:267 msg="compute graph" device=CPU size="4.2 GiB"
time=2025-11-22T02:33:23.970Z level=INFO source=device.go:272 msg="total memory" size="8.1 GiB"
time=2025-11-22T02:33:23.970Z level=INFO source=sched.go:517 msg="loaded runners" count=1
time=2025-11-22T02:33:23.970Z level=INFO source=server.go:1294 msg="waiting for llama runner to start responding"
time=2025-11-22T02:33:23.970Z level=INFO source=server.go:1328 msg="waiting for server to become available" status="llm server loading model"
time=2025-11-22T02:33:25.228Z level=INFO source=server.go:1332 msg="llama runner started in 22.89 seconds"
</details>

@jstebbins commented on GitHub (Dec 4, 2025):

I am also seeing this when attempting to run mistral-small3.2:24b-instruct-2506-q8_0. Ollama ends up falling back to running this model on the CPU. I have more than 100GB of available GPU memory and I am able to run larger models successfully on the GPU, e.g. qwen3:30b-a3b-instruct-2507-q8_0 and gpt-oss:120b.

From my web searching, it appears Vulkan has a 4 GB per-allocation limit, and running this model exceeds it.
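As a rough sanity check on that figure (the 4 GiB ceiling is an assumption here; the actual limit is the driver's reported `maxMemoryAllocationSize` and varies between drivers), the sizes reported in the two logs can be compared against 2**32 bytes:

```python
# Compare the sizes from the ggml_vulkan log lines against an assumed
# 4 GiB per-allocation ceiling (2**32 bytes). The real limit is the
# driver's maxMemoryAllocationSize, which differs between drivers.
LIMIT = 2**32  # 4 GiB

# Sizes taken verbatim from the logs in this thread.
sizes = {
    "device allocation (qwen3-vl log)": 4199040000,
    "Vulkan0 buffer (qwen3-vl log)": 4571325856,
    "device allocation (mistral-small log)": 9370240000,
    "Vulkan0 buffer (mistral-small log)": 10092640464,
}

for label, size in sizes.items():
    print(f"{label}: {size} bytes ({size / 2**30:.2f} GiB), "
          f"exceeds 4 GiB: {size > LIMIT}")
```

Note that the 4199040000-byte allocation chunk is itself just under 4 GiB, while the 4571325856-byte buffer it belongs to is over it, which is consistent with the "Requested buffer size exceeds device buffer size limit" message in the second log.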

Relevant log bits:

time=2025-12-04T19:37:06.917Z level=INFO source=server.go:392 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 40737"
time=2025-12-04T19:37:06.957Z level=WARN source=cpu_linux.go:130 msg="failed to parse CPU allowed micro secs" error="strconv.ParseInt: parsing \"max\": invalid syntax"
time=2025-12-04T19:37:06.991Z level=INFO source=server.go:209 msg="enabling flash attention"
time=2025-12-04T19:37:06.991Z level=INFO source=server.go:392 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --model /root/.ollama/models/blobs/sha256-da17ea1e24268d63d8c5f876acc6a428a389f45e3d0d016ebdaaf3e1aed81c32 --port 35901"
time=2025-12-04T19:37:06.991Z level=INFO source=sched.go:443 msg="system memory" total="125.1 GiB" free="100.8 GiB" free_swap="8.0 GiB"
time=2025-12-04T19:37:06.991Z level=INFO source=sched.go:450 msg="gpu memory" id=00000000-c100-0000-0000-000000000000 library=Vulkan available="118.0 GiB" free="118.5 GiB" minimum="457.0 MiB" overhead="0 B"
time=2025-12-04T19:37:06.991Z level=INFO source=server.go:702 msg="loading model" "model layers"=49 requested=-1
time=2025-12-04T19:37:06.997Z level=INFO source=runner.go:1398 msg="starting ollama engine"
time=2025-12-04T19:37:06.997Z level=INFO source=runner.go:1433 msg="Server listening on 127.0.0.1:35901"
time=2025-12-04T19:37:07.003Z level=INFO source=runner.go:1271 msg=load request="{Operation:fit LoraPath:[] Parallel:4 BatchSize:512 FlashAttention:true KvSize:16384 KvCacheType: NumThreads:16 GPULayers:49[ID:00000000-c100-0000-0000-000000000000 Layers:49(0..48)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2025-12-04T19:37:07.016Z level=INFO source=ggml.go:136 msg="" architecture=qwen3moe file_type=Q8_0 name="Qwen3 30B A3B Instruct 2507" description="" num_tensors=579 num_key_values=33
operator() double registration of ggml_uncaught_exception
load_backend: loaded CPU backend from /usr/lib/ollama/libggml-cpu-icelake.so
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
load_backend: loaded Vulkan backend from /usr/lib/ollama/vulkan/libggml-vulkan.so
time=2025-12-04T19:37:07.037Z level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.AVX512=1 CPU.0.AVX512_VBMI=1 CPU.0.AVX512_VNNI=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
ggml_backend_vk_get_device_memory called: uuid 00000000-c100-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
time=2025-12-04T19:37:07.058Z level=INFO source=runner.go:1271 msg=load request="{Operation:alloc LoraPath:[] Parallel:4 BatchSize:512 FlashAttention:true KvSize:16384 KvCacheType: NumThreads:16 GPULayers:49[ID:00000000-c100-0000-0000-000000000000 Layers:49(0..48)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-c100-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
time=2025-12-04T19:37:08.481Z level=INFO source=runner.go:1271 msg=load request="{Operation:commit LoraPath:[] Parallel:4 BatchSize:512 FlashAttention:true KvSize:16384 KvCacheType: NumThreads:16 GPULayers:49[ID:00000000-c100-0000-0000-000000000000 Layers:49(0..48)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2025-12-04T19:37:08.481Z level=INFO source=ggml.go:482 msg="offloading 48 repeating layers to GPU"
time=2025-12-04T19:37:08.481Z level=INFO source=ggml.go:489 msg="offloading output layer to GPU"
time=2025-12-04T19:37:08.481Z level=INFO source=ggml.go:494 msg="offloaded 49/49 layers to GPU"
time=2025-12-04T19:37:08.481Z level=INFO source=device.go:240 msg="model weights" device=Vulkan0 size="29.9 GiB"
time=2025-12-04T19:37:08.481Z level=INFO source=device.go:245 msg="model weights" device=CPU size="315.3 MiB"
time=2025-12-04T19:37:08.481Z level=INFO source=device.go:251 msg="kv cache" device=Vulkan0 size="1.5 GiB"
time=2025-12-04T19:37:08.481Z level=INFO source=device.go:262 msg="compute graph" device=Vulkan0 size="84.0 MiB"
time=2025-12-04T19:37:08.481Z level=INFO source=device.go:267 msg="compute graph" device=CPU size="4.0 MiB"
time=2025-12-04T19:37:08.481Z level=INFO source=device.go:272 msg="total memory" size="31.8 GiB"
time=2025-12-04T19:37:08.481Z level=INFO source=sched.go:517 msg="loaded runners" count=1
time=2025-12-04T19:37:08.481Z level=INFO source=server.go:1294 msg="waiting for llama runner to start responding"
time=2025-12-04T19:37:08.482Z level=INFO source=server.go:1328 msg="waiting for server to become available" status="llm server loading model"
time=2025-12-04T19:37:18.508Z level=INFO source=server.go:1332 msg="llama runner started in 11.52 seconds"
ggml_backend_vk_get_device_memory called: uuid 00000000-c100-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
ggml_backend_vk_get_device_memory called: uuid 00000000-c100-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
time=2025-12-04T19:38:01.496Z level=WARN source=cpu_linux.go:130 msg="failed to parse CPU allowed micro secs" error="strconv.ParseInt: parsing \"max\": invalid syntax"
time=2025-12-04T19:38:01.506Z level=INFO source=sched.go:583 msg="updated VRAM based on existing loaded models" gpu=00000000-c100-0000-0000-000000000000 library=Vulkan total="120.5 GiB" available="89.0 GiB"
time=2025-12-04T19:38:01.541Z level=INFO source=server.go:209 msg="enabling flash attention"
time=2025-12-04T19:38:01.541Z level=INFO source=server.go:392 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --model /root/.ollama/models/blobs/sha256-38a2989af9b8b38289ffb64b804ae1ce6bed23588f66ffeafb015e7113e45771 --port 39643"
time=2025-12-04T19:38:01.542Z level=INFO source=sched.go:443 msg="system memory" total="125.1 GiB" free="100.2 GiB" free_swap="8.0 GiB"
time=2025-12-04T19:38:01.542Z level=INFO source=sched.go:450 msg="gpu memory" id=00000000-c100-0000-0000-000000000000 library=Vulkan available="88.5 GiB" free="89.0 GiB" minimum="457.0 MiB" overhead="0 B"
time=2025-12-04T19:38:01.542Z level=INFO source=server.go:702 msg="loading model" "model layers"=41 requested=-1
time=2025-12-04T19:38:01.549Z level=INFO source=runner.go:1398 msg="starting ollama engine"
time=2025-12-04T19:38:01.549Z level=INFO source=runner.go:1433 msg="Server listening on 127.0.0.1:39643"
time=2025-12-04T19:38:01.553Z level=INFO source=runner.go:1271 msg=load request="{Operation:fit LoraPath:[] Parallel:4 BatchSize:512 FlashAttention:true KvSize:16384 KvCacheType: NumThreads:16 GPULayers:41[ID:00000000-c100-0000-0000-000000000000 Layers:41(0..40)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2025-12-04T19:38:01.576Z level=INFO source=ggml.go:136 msg="" architecture=mistral3 file_type=Q8_0 name="" description="" num_tensors=585 num_key_values=43
operator() double registration of ggml_uncaught_exception
operator() double registration of ggml_uncaught_exception
operator() double registration of ggml_uncaught_exception
operator() double registration of ggml_uncaught_exception
operator() double registration of ggml_uncaught_exception
operator() double registration of ggml_uncaught_exception
operator() double registration of ggml_uncaught_exception
load_backend: loaded CPU backend from /usr/lib/ollama/libggml-cpu-icelake.so
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Radeon 8060S Graphics (RADV GFX1151) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
load_backend: loaded Vulkan backend from /usr/lib/ollama/vulkan/libggml-vulkan.so
time=2025-12-04T19:38:01.599Z level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.AVX512=1 CPU.0.AVX512_VBMI=1 CPU.0.AVX512_VNNI=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
ggml_backend_vk_get_device_memory called: uuid 00000000-c100-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
time=2025-12-04T19:38:01.784Z level=INFO source=runner.go:1271 msg=load request="{Operation:alloc LoraPath:[] Parallel:4 BatchSize:512 FlashAttention:true KvSize:16384 KvCacheType: NumThreads:16 GPULayers:41[ID:00000000-c100-0000-0000-000000000000 Layers:41(0..40)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid 00000000-c100-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000000000
ggml_vulkan: Device memory allocation of size 9370240000 failed.
ggml_vulkan: Requested buffer size exceeds device buffer size limit: ErrorOutOfDeviceMemory
ggml_gallocr_reserve_n: failed to allocate Vulkan0 buffer of size 10092640464
time=2025-12-04T19:38:03.225Z level=INFO source=server.go:824 msg="model layout did not fit, applying backoff" backoff=0.10
time=2025-12-04T19:38:03.226Z level=INFO source=server.go:824 msg="model layout did not fit, applying backoff" backoff=0.20
time=2025-12-04T19:38:03.226Z level=INFO source=server.go:824 msg="model layout did not fit, applying backoff" backoff=0.30
time=2025-12-04T19:38:03.226Z level=INFO source=server.go:824 msg="model layout did not fit, applying backoff" backoff=0.40
time=2025-12-04T19:38:03.226Z level=INFO source=server.go:824 msg="model layout did not fit, applying backoff" backoff=0.50
time=2025-12-04T19:38:03.226Z level=INFO source=server.go:824 msg="model layout did not fit, applying backoff" backoff=0.60
time=2025-12-04T19:38:03.226Z level=INFO source=server.go:824 msg="model layout did not fit, applying backoff" backoff=0.70
time=2025-12-04T19:38:03.226Z level=INFO source=server.go:974 msg="model requires more memory than is currently available, evicting a model to make space" "loaded layers"=30
time=2025-12-04T19:38:03.226Z level=INFO source=runner.go:1271 msg=load request="{Operation:close LoraPath:[] Parallel:0 BatchSize:0 FlashAttention:false KvSize:0 KvCacheType: NumThreads:0 GPULayers:[] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2025-12-04T19:38:03.226Z level=INFO source=device.go:240 msg="model weights" device=Vulkan0 size="23.5 GiB"
time=2025-12-04T19:38:03.226Z level=INFO source=device.go:245 msg="model weights" device=CPU size="680.0 MiB"
time=2025-12-04T19:38:03.226Z level=INFO source=device.go:262 msg="compute graph" device=Vulkan0 size="9.4 GiB"
time=2025-12-04T19:38:03.226Z level=INFO source=device.go:272 msg="total memory" size="33.5 GiB"

@jstebbins commented on GitHub (Dec 5, 2025):

I just updated from 0.13.2-rc0 to 0.13.2-rc1 and the allocation error is no longer happening. I don't see anything in the commit log that specifically addresses this, but something changed that has "solved" my problem.


@MightyPlaza commented on GitHub (Dec 11, 2025):

Fixed for me on the 0.13.2 release as well.


@rick-github commented on GitHub (Mar 19, 2026):

@NotJustAnna Is this still an issue?

Reference: github-starred/ollama#70690