[GH-ISSUE #9416] Granite 3.2 vision seems to run on CPU with ROCm on 0.5.13 #68196

Closed
opened 2026-05-04 12:48:59 -05:00 by GiteaMirror · 7 comments

Originally created by @ProjectMoon on GitHub (Feb 28, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/9416

Update: this issue now refers to the specific problem that, on 0.5.13, the vision part of Granite 3.2 seems to run on the CPU rather than the GPU.
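To exercise the vision path specifically, a request has to carry an image. The minimal sketch below builds an `/api/chat` body with one base64-encoded image and posts it to a local server. It makes a few assumptions: the default address `http://localhost:11434` and the library tag `granite3.2-vision`. The helpers `build_vision_chat` and `send_chat` are hypothetical and not part of any Ollama client.

```python
import base64
import json
import urllib.request

# Assumption: the server listens on the default host and port.
OLLAMA_URL = "http://localhost:11434"


def build_vision_chat(model: str, prompt: str, image_bytes: bytes) -> dict:
    """Build an /api/chat request body that attaches one image to a user message."""
    return {
        "model": model,
        "stream": False,  # ask for a single JSON reply instead of a stream
        "messages": [
            {
                "role": "user",
                "content": prompt,
                # /api/chat takes images as a list of base64-encoded strings
                "images": [base64.b64encode(image_bytes).decode("ascii")],
            }
        ],
    }


def send_chat(body: dict, base_url: str = OLLAMA_URL) -> dict:
    """POST the body to /api/chat and return the decoded JSON reply."""
    req = urllib.request.Request(
        f"{base_url}/api/chat",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

For example, call `send_chat(build_vision_chat("granite3.2-vision", "Describe this image.", open("photo.png", "rb").read()))`, where `photo.png` is any local image. Running this while watching `rocm-smi` in another terminal can show whether GPU activity rises during the image step.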

What is the issue?

After upgrading to 0.5.13-rc1, I have noticed that inference does not actually run on ROCm. When the model is loaded, it goes onto the GPU (confirmed via rocm-smi). However, when I try to chat, it seems to reload the model and, I suspect, fall back to the CPU. This also makes my computer lock up, though that might just be RAM thrashing.
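Besides `rocm-smi`, the server itself reports how much of each loaded model is resident in VRAM: `GET /api/ps` returns `size` and `size_vram` for every loaded model. The minimal sketch below assumes the default address and turns those two numbers into a CPU/GPU split, roughly like the PROCESSOR column of `ollama ps`. The exact rounding and edge-case handling here are my approximation, not Ollama's code.

```python
import json
import urllib.request


def processor_split(size: int, size_vram: int) -> str:
    """Summarize how much of a loaded model sits in VRAM versus system RAM."""
    if size_vram == 0:
        return "100% CPU"
    if size_vram == size:
        return "100% GPU"
    if size == 0 or size_vram > size:
        return "unknown"  # inconsistent numbers; don't guess
    cpu = round((size - size_vram) / size * 100)
    return f"{cpu}%/{100 - cpu}% CPU/GPU"


def loaded_models(base_url: str = "http://localhost:11434") -> list[tuple[str, str]]:
    """Query /api/ps and return (model name, CPU/GPU split) for each loaded model."""
    with urllib.request.urlopen(f"{base_url}/api/ps") as resp:
        data = json.load(resp)
    return [
        (m["name"], processor_split(m["size"], m["size_vram"]))
        for m in data.get("models", [])
    ]
```

Calling `loaded_models()` once right after loading and again during a chat can show whether the reload moved the model off the GPU.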

In the log output, you can see the model first loading onto ROCm. Then, when the chat endpoint is called, it reloads the model, and that reload seems to skip the GPU for some reason.
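Each load decision appears in the debug log as an `msg=offload` line, for example `library=rocm ... layers.model=49 layers.offload=49` below. The minimal sketch below pulls every such line out of a saved log so that a later reload with a different `library` or fewer offloaded layers stands out. It assumes the `key=value` layout seen in this log. The regex and function names are my own; this is not an Ollama tool.

```python
import re
from collections.abc import Iterable, Iterator

# key=value pairs as they appear in this debug log; values may be double-quoted.
PAIR = re.compile(r'([\w.]+)=("(?:[^"\\]|\\.)*"|\S+)')


def parse_pairs(line: str) -> dict[str, str]:
    """Return the key=value pairs of one log line, with surrounding quotes removed."""
    return {
        k: v[1:-1] if len(v) >= 2 and v[0] == v[-1] == '"' else v
        for k, v in PAIR.findall(line)
    }


def offload_events(lines: Iterable[str]) -> Iterator[tuple[str, str, int, int]]:
    """Yield (time, library, layers offloaded, layers in model) per 'msg=offload' line."""
    for line in lines:
        if "msg=offload" not in line:
            continue
        kv = parse_pairs(line)
        yield (
            kv.get("time", ""),
            kv.get("library", ""),
            int(kv.get("layers.offload", "0")),
            int(kv.get("layers.model", "0")),
        )
```

For example, `for ev in offload_events(open("ollama.log")): print(ev)` lists one tuple per (re)load. Here `ollama.log` stands for wherever the server output was saved.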

Downgrading back to 0.5.12 works perfectly. As far as I know, I am not using the system ROCm: I always untar the ROCm package from Ollama when upgrading.
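To confirm that a downgrade actually took effect, you can ask the running server for its version: `GET /api/version` returns a JSON object with a `version` field. The small sketch below uses a hypothetical `version_key` helper that orders `0.5.12 < 0.5.13-rc1 < 0.5.13`. It assumes at most three numeric parts plus an optional `-suffix`, and it compares suffixes as plain strings, so `rc10` would sort before `rc2`.

```python
import json
import urllib.request


def version_key(v: str) -> tuple:
    """Sort key for versions like '0.5.12' or '0.5.13-rc1'; a release sorts after its pre-releases."""
    core, _, pre = v.lstrip("v").partition("-")
    nums = tuple(int(p) for p in core.split("."))
    nums = (nums + (0, 0, 0))[:3]  # pad or truncate to three numeric parts
    # (1, "") for a final release outranks (0, "rc1") for a pre-release of the same number
    return nums + ((1, "") if not pre else (0, pre))


def server_version(base_url: str = "http://localhost:11434") -> str:
    """Ask a running Ollama server which version it is."""
    with urllib.request.urlopen(f"{base_url}/api/version") as resp:
        return json.load(resp)["version"]
```

For example, `version_key(server_version()) < version_key("0.5.13-rc1")` is true only when the server is older than the release candidate.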

Debug logs:
2025/02/28 12:20:14 routes.go:1215: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:2048 OLLAMA_DEBUG:true OLLAMA_FLASH_ATTENTION:true OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE:q8_0 OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/ollama OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:2 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2025-02-28T12:20:14.283+01:00 level=INFO source=images.go:432 msg="total blobs: 134"
time=2025-02-28T12:20:14.285+01:00 level=INFO source=images.go:439 msg="total unused blobs removed: 0"
time=2025-02-28T12:20:14.286+01:00 level=INFO source=routes.go:1281 msg="Listening on [::]:11434 (version 0.5.13-rc1)"
time=2025-02-28T12:20:14.286+01:00 level=DEBUG source=sched.go:106 msg="starting llm scheduler"
time=2025-02-28T12:20:14.286+01:00 level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
time=2025-02-28T12:20:14.287+01:00 level=DEBUG source=gpu.go:98 msg="searching for GPU discovery libraries for NVIDIA"
time=2025-02-28T12:20:14.288+01:00 level=DEBUG source=gpu.go:501 msg="Searching for GPU library" name=libcuda.so*
time=2025-02-28T12:20:14.288+01:00 level=DEBUG source=gpu.go:525 msg="gpu library search" globs="[/opt/ollama/lib/ollama/libcuda.so* /libcuda.so* /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]"
time=2025-02-28T12:20:14.310+01:00 level=DEBUG source=gpu.go:558 msg="discovered GPU libraries" paths="[/usr/lib/libcuda.so.570.86.16 /usr/lib64/libcuda.so.570.86.16]"
initializing /usr/lib/libcuda.so.570.86.16
library /usr/lib/libcuda.so.570.86.16 load err: /usr/lib/libcuda.so.570.86.16: wrong ELF class: ELFCLASS32
time=2025-02-28T12:20:14.310+01:00 level=DEBUG source=gpu.go:609 msg="skipping 32bit library" library=/usr/lib/libcuda.so.570.86.16
initializing /usr/lib64/libcuda.so.570.86.16
dlsym: cuInit - 0x7f0bc7d0de00
dlsym: cuDriverGetVersion - 0x7f0bc7d0de20
dlsym: cuDeviceGetCount - 0x7f0bc7d0de60
dlsym: cuDeviceGet - 0x7f0bc7d0de40
dlsym: cuDeviceGetAttribute - 0x7f0bc7d0df40
dlsym: cuDeviceGetUuid - 0x7f0bc7d0dea0
dlsym: cuDeviceGetName - 0x7f0bc7d0de80
dlsym: cuCtxCreate_v3 - 0x7f0bc7d0e120
dlsym: cuMemGetInfo_v2 - 0x7f0bc7d0e8a0
dlsym: cuCtxDestroy - 0x7f0bc7d6c9f0
calling cuInit
calling cuDriverGetVersion
raw version 0x2f30
CUDA driver version: 12.8
calling cuDeviceGetCount
device count 1
time=2025-02-28T12:20:14.334+01:00 level=DEBUG source=gpu.go:125 msg="detected GPUs" count=1 library=/usr/lib64/libcuda.so.570.86.16
[GPU-64fa45ff-fe00-d712-1796-ed74da57bfa7] CUDA totalMem 4030 mb
[GPU-64fa45ff-fe00-d712-1796-ed74da57bfa7] CUDA freeMem 870 mb
[GPU-64fa45ff-fe00-d712-1796-ed74da57bfa7] Compute Capability 5.2
time=2025-02-28T12:20:14.419+01:00 level=WARN source=amd_linux.go:61 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
time=2025-02-28T12:20:14.419+01:00 level=DEBUG source=amd_linux.go:101 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/0/properties"
time=2025-02-28T12:20:14.419+01:00 level=DEBUG source=amd_linux.go:121 msg="detected CPU /sys/class/kfd/kfd/topology/nodes/0/properties"
time=2025-02-28T12:20:14.419+01:00 level=DEBUG source=amd_linux.go:101 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/1/properties"
time=2025-02-28T12:20:14.419+01:00 level=DEBUG source=amd_linux.go:206 msg="mapping amdgpu to drm sysfs nodes" amdgpu=/sys/class/kfd/kfd/topology/nodes/1/properties vendor=4098 device=29631 unique_id=10870137312548343375
time=2025-02-28T12:20:14.420+01:00 level=DEBUG source=amd_linux.go:240 msg=matched amdgpu=/sys/class/kfd/kfd/topology/nodes/1/properties drm=/sys/class/drm/card0/device
time=2025-02-28T12:20:14.420+01:00 level=DEBUG source=amd_linux.go:318 msg="amdgpu memory" gpu=0 total="16.0 GiB"
time=2025-02-28T12:20:14.420+01:00 level=DEBUG source=amd_linux.go:319 msg="amdgpu memory" gpu=0 available="15.5 GiB"
time=2025-02-28T12:20:14.420+01:00 level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /opt/ollama/lib/ollama/rocm"
time=2025-02-28T12:20:14.420+01:00 level=DEBUG source=amd_common.go:44 msg="detected ROCM next to ollama executable /opt/ollama/lib/ollama/rocm"
time=2025-02-28T12:20:14.427+01:00 level=DEBUG source=amd_linux.go:371 msg="rocm supported GPUs" types="[gfx1010 gfx1012 gfx1030 gfx1100 gfx1101 gfx1102 gfx1151 gfx1200 gfx1201 gfx900 gfx906 gfx908 gfx90a gfx942]"
time=2025-02-28T12:20:14.427+01:00 level=INFO source=amd_linux.go:386 msg="amdgpu is supported" gpu=GPU-96da7c4b1629ce4f gpu_type=gfx1030
releasing cuda driver library
time=2025-02-28T12:20:14.427+01:00 level=INFO source=types.go:130 msg="inference compute" id=GPU-64fa45ff-fe00-d712-1796-ed74da57bfa7 library=cuda variant=v12 compute=5.2 driver=12.8 name="NVIDIA GeForce GTX 970" total="3.9 GiB" available="870.0 MiB"
time=2025-02-28T12:20:14.427+01:00 level=INFO source=types.go:130 msg="inference compute" id=GPU-96da7c4b1629ce4f library=rocm variant="" compute=gfx1030 driver=0.0 name=1002:73bf total="16.0 GiB" available="15.5 GiB"
[GIN] 2025/02/28 - 12:20:16 | 200 |    5.866593ms |       127.0.0.1 | HEAD     "/"
[GIN] 2025/02/28 - 12:20:16 | 200 |   28.764586ms |       127.0.0.1 | POST     "/api/show"
time=2025-02-28T12:20:16.515+01:00 level=DEBUG source=gpu.go:391 msg="updating system memory data" before.total="62.7 GiB" before.free="32.8 GiB" before.free_swap="1.9 MiB" now.total="62.7 GiB" now.free="32.8 GiB" now.free_swap="1.9 MiB"
initializing /usr/lib64/libcuda.so.570.86.16
dlsym: cuInit - 0x7f0bc7d0de00
dlsym: cuDriverGetVersion - 0x7f0bc7d0de20
dlsym: cuDeviceGetCount - 0x7f0bc7d0de60
dlsym: cuDeviceGet - 0x7f0bc7d0de40
dlsym: cuDeviceGetAttribute - 0x7f0bc7d0df40
dlsym: cuDeviceGetUuid - 0x7f0bc7d0dea0
dlsym: cuDeviceGetName - 0x7f0bc7d0de80
dlsym: cuCtxCreate_v3 - 0x7f0bc7d0e120
dlsym: cuMemGetInfo_v2 - 0x7f0bc7d0e8a0
dlsym: cuCtxDestroy - 0x7f0bc7d6c9f0
calling cuInit
calling cuDriverGetVersion
raw version 0x2f30
CUDA driver version: 12.8
calling cuDeviceGetCount
device count 1
time=2025-02-28T12:20:16.583+01:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-64fa45ff-fe00-d712-1796-ed74da57bfa7 name="NVIDIA GeForce GTX 970" overhead="0 B" before.total="3.9 GiB" before.free="870.0 MiB" now.total="3.9 GiB" now.free="870.0 MiB" now.used="3.1 GiB"
time=2025-02-28T12:20:16.584+01:00 level=DEBUG source=amd_linux.go:488 msg="updating rocm free memory" gpu=GPU-96da7c4b1629ce4f name=1002:73bf before="15.5 GiB" now="15.5 GiB"
releasing cuda driver library
time=2025-02-28T12:20:16.584+01:00 level=DEBUG source=sched.go:182 msg="updating default concurrency" OLLAMA_MAX_LOADED_MODELS=6 gpu_count=2
time=2025-02-28T12:20:16.620+01:00 level=DEBUG source=sched.go:225 msg="loading first model" model=/ollama/blobs/sha256-bbfb685133c274407d565c65b1ca806eb1593482b1c9d8524596797b24123862
time=2025-02-28T12:20:16.620+01:00 level=DEBUG source=memory.go:108 msg=evaluating library=cuda gpu_count=1 available="[870.0 MiB]"
time=2025-02-28T12:20:16.621+01:00 level=DEBUG source=gpu.go:391 msg="updating system memory data" before.total="62.7 GiB" before.free="32.8 GiB" before.free_swap="1.9 MiB" now.total="62.7 GiB" now.free="32.8 GiB" now.free_swap="1.9 MiB"
initializing /usr/lib64/libcuda.so.570.86.16
dlsym: cuInit - 0x7f0bc7d0de00
dlsym: cuDriverGetVersion - 0x7f0bc7d0de20
dlsym: cuDeviceGetCount - 0x7f0bc7d0de60
dlsym: cuDeviceGet - 0x7f0bc7d0de40
dlsym: cuDeviceGetAttribute - 0x7f0bc7d0df40
dlsym: cuDeviceGetUuid - 0x7f0bc7d0dea0
dlsym: cuDeviceGetName - 0x7f0bc7d0de80
dlsym: cuCtxCreate_v3 - 0x7f0bc7d0e120
dlsym: cuMemGetInfo_v2 - 0x7f0bc7d0e8a0
dlsym: cuCtxDestroy - 0x7f0bc7d6c9f0
calling cuInit
calling cuDriverGetVersion
raw version 0x2f30
CUDA driver version: 12.8
calling cuDeviceGetCount
device count 1
time=2025-02-28T12:20:16.685+01:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-64fa45ff-fe00-d712-1796-ed74da57bfa7 name="NVIDIA GeForce GTX 970" overhead="0 B" before.total="3.9 GiB" before.free="870.0 MiB" now.total="3.9 GiB" now.free="870.0 MiB" now.used="3.1 GiB"
time=2025-02-28T12:20:16.685+01:00 level=DEBUG source=amd_linux.go:488 msg="updating rocm free memory" gpu=GPU-96da7c4b1629ce4f name=1002:73bf before="15.5 GiB" now="15.5 GiB"
releasing cuda driver library
time=2025-02-28T12:20:16.685+01:00 level=WARN source=ggml.go:136 msg="key not found" key=qwen2.attention.key_length default=128
time=2025-02-28T12:20:16.685+01:00 level=WARN source=ggml.go:136 msg="key not found" key=qwen2.attention.value_length default=128
time=2025-02-28T12:20:16.685+01:00 level=WARN source=ggml.go:136 msg="key not found" key=qwen2.attention.key_length default=128
time=2025-02-28T12:20:16.685+01:00 level=WARN source=ggml.go:136 msg="key not found" key=qwen2.attention.value_length default=128
time=2025-02-28T12:20:16.686+01:00 level=DEBUG source=memory.go:185 msg="gpu has too little memory to allocate any layers" id=GPU-64fa45ff-fe00-d712-1796-ed74da57bfa7 library=cuda variant=v12 compute=5.2 driver=12.8 name="NVIDIA GeForce GTX 970" total="3.9 GiB" available="870.0 MiB" minimum_memory=479199232 layer_size="229.2 MiB" gpu_zer_overhead="0 B" partial_offload="2.0 GiB" full_offload="1.6 GiB"
time=2025-02-28T12:20:16.686+01:00 level=DEBUG source=memory.go:329 msg="insufficient VRAM to load any model layers"
time=2025-02-28T12:20:16.686+01:00 level=DEBUG source=memory.go:108 msg=evaluating library=cuda gpu_count=1 available="[870.0 MiB]"
time=2025-02-28T12:20:16.686+01:00 level=DEBUG source=gpu.go:391 msg="updating system memory data" before.total="62.7 GiB" before.free="32.8 GiB" before.free_swap="1.9 MiB" now.total="62.7 GiB" now.free="32.8 GiB" now.free_swap="1.9 MiB"
initializing /usr/lib64/libcuda.so.570.86.16
dlsym: cuInit - 0x7f0bc7d0de00
dlsym: cuDriverGetVersion - 0x7f0bc7d0de20
dlsym: cuDeviceGetCount - 0x7f0bc7d0de60
dlsym: cuDeviceGet - 0x7f0bc7d0de40
dlsym: cuDeviceGetAttribute - 0x7f0bc7d0df40
dlsym: cuDeviceGetUuid - 0x7f0bc7d0dea0
dlsym: cuDeviceGetName - 0x7f0bc7d0de80
dlsym: cuCtxCreate_v3 - 0x7f0bc7d0e120
dlsym: cuMemGetInfo_v2 - 0x7f0bc7d0e8a0
dlsym: cuCtxDestroy - 0x7f0bc7d6c9f0
calling cuInit
calling cuDriverGetVersion
raw version 0x2f30
CUDA driver version: 12.8
calling cuDeviceGetCount
device count 1
time=2025-02-28T12:20:16.762+01:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-64fa45ff-fe00-d712-1796-ed74da57bfa7 name="NVIDIA GeForce GTX 970" overhead="0 B" before.total="3.9 GiB" before.free="870.0 MiB" now.total="3.9 GiB" now.free="870.0 MiB" now.used="3.1 GiB"
time=2025-02-28T12:20:16.762+01:00 level=DEBUG source=amd_linux.go:488 msg="updating rocm free memory" gpu=GPU-96da7c4b1629ce4f name=1002:73bf before="15.5 GiB" now="15.5 GiB"
releasing cuda driver library
time=2025-02-28T12:20:16.762+01:00 level=WARN source=ggml.go:136 msg="key not found" key=qwen2.attention.key_length default=128
time=2025-02-28T12:20:16.762+01:00 level=WARN source=ggml.go:136 msg="key not found" key=qwen2.attention.value_length default=128
time=2025-02-28T12:20:16.762+01:00 level=WARN source=ggml.go:136 msg="key not found" key=qwen2.attention.key_length default=128
time=2025-02-28T12:20:16.762+01:00 level=WARN source=ggml.go:136 msg="key not found" key=qwen2.attention.value_length default=128
time=2025-02-28T12:20:16.763+01:00 level=DEBUG source=memory.go:185 msg="gpu has too little memory to allocate any layers" id=GPU-64fa45ff-fe00-d712-1796-ed74da57bfa7 library=cuda variant=v12 compute=5.2 driver=12.8 name="NVIDIA GeForce GTX 970" total="3.9 GiB" available="870.0 MiB" minimum_memory=479199232 layer_size="229.2 MiB" gpu_zer_overhead="0 B" partial_offload="2.0 GiB" full_offload="1.6 GiB"
time=2025-02-28T12:20:16.763+01:00 level=DEBUG source=memory.go:329 msg="insufficient VRAM to load any model layers"
time=2025-02-28T12:20:16.763+01:00 level=DEBUG source=memory.go:108 msg=evaluating library=rocm gpu_count=1 available="[15.5 GiB]"
time=2025-02-28T12:20:16.763+01:00 level=DEBUG source=gpu.go:391 msg="updating system memory data" before.total="62.7 GiB" before.free="32.8 GiB" before.free_swap="1.9 MiB" now.total="62.7 GiB" now.free="32.7 GiB" now.free_swap="1.9 MiB"
initializing /usr/lib64/libcuda.so.570.86.16
dlsym: cuInit - 0x7f0bc7d0de00
dlsym: cuDriverGetVersion - 0x7f0bc7d0de20
dlsym: cuDeviceGetCount - 0x7f0bc7d0de60
dlsym: cuDeviceGet - 0x7f0bc7d0de40
dlsym: cuDeviceGetAttribute - 0x7f0bc7d0df40
dlsym: cuDeviceGetUuid - 0x7f0bc7d0dea0
dlsym: cuDeviceGetName - 0x7f0bc7d0de80
dlsym: cuCtxCreate_v3 - 0x7f0bc7d0e120
dlsym: cuMemGetInfo_v2 - 0x7f0bc7d0e8a0
dlsym: cuCtxDestroy - 0x7f0bc7d6c9f0
calling cuInit
calling cuDriverGetVersion
raw version 0x2f30
CUDA driver version: 12.8
calling cuDeviceGetCount
device count 1
time=2025-02-28T12:20:16.829+01:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-64fa45ff-fe00-d712-1796-ed74da57bfa7 name="NVIDIA GeForce GTX 970" overhead="0 B" before.total="3.9 GiB" before.free="870.0 MiB" now.total="3.9 GiB" now.free="870.0 MiB" now.used="3.1 GiB"
time=2025-02-28T12:20:16.829+01:00 level=DEBUG source=amd_linux.go:488 msg="updating rocm free memory" gpu=GPU-96da7c4b1629ce4f name=1002:73bf before="15.5 GiB" now="15.5 GiB"
releasing cuda driver library
time=2025-02-28T12:20:16.829+01:00 level=WARN source=ggml.go:136 msg="key not found" key=qwen2.attention.key_length default=128
time=2025-02-28T12:20:16.829+01:00 level=WARN source=ggml.go:136 msg="key not found" key=qwen2.attention.value_length default=128
time=2025-02-28T12:20:16.829+01:00 level=WARN source=ggml.go:136 msg="key not found" key=qwen2.attention.key_length default=128
time=2025-02-28T12:20:16.829+01:00 level=WARN source=ggml.go:136 msg="key not found" key=qwen2.attention.value_length default=128
time=2025-02-28T12:20:16.830+01:00 level=INFO source=sched.go:715 msg="new model will fit in available VRAM in single GPU, loading" model=/ollama/blobs/sha256-bbfb685133c274407d565c65b1ca806eb1593482b1c9d8524596797b24123862 gpu=GPU-96da7c4b1629ce4f parallel=2 available=16604295168 required="13.4 GiB"
time=2025-02-28T12:20:16.830+01:00 level=DEBUG source=gpu.go:391 msg="updating system memory data" before.total="62.7 GiB" before.free="32.7 GiB" before.free_swap="1.9 MiB" now.total="62.7 GiB" now.free="32.8 GiB" now.free_swap="1.9 MiB"
initializing /usr/lib64/libcuda.so.570.86.16
dlsym: cuInit - 0x7f0bc7d0de00
dlsym: cuDriverGetVersion - 0x7f0bc7d0de20
dlsym: cuDeviceGetCount - 0x7f0bc7d0de60
dlsym: cuDeviceGet - 0x7f0bc7d0de40
dlsym: cuDeviceGetAttribute - 0x7f0bc7d0df40
dlsym: cuDeviceGetUuid - 0x7f0bc7d0dea0
dlsym: cuDeviceGetName - 0x7f0bc7d0de80
dlsym: cuCtxCreate_v3 - 0x7f0bc7d0e120
dlsym: cuMemGetInfo_v2 - 0x7f0bc7d0e8a0
dlsym: cuCtxDestroy - 0x7f0bc7d6c9f0
calling cuInit
calling cuDriverGetVersion
raw version 0x2f30
CUDA driver version: 12.8
calling cuDeviceGetCount
device count 1
time=2025-02-28T12:20:16.894+01:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-64fa45ff-fe00-d712-1796-ed74da57bfa7 name="NVIDIA GeForce GTX 970" overhead="0 B" before.total="3.9 GiB" before.free="870.0 MiB" now.total="3.9 GiB" now.free="870.0 MiB" now.used="3.1 GiB"
time=2025-02-28T12:20:16.894+01:00 level=DEBUG source=amd_linux.go:488 msg="updating rocm free memory" gpu=GPU-96da7c4b1629ce4f name=1002:73bf before="15.5 GiB" now="15.5 GiB"
releasing cuda driver library
time=2025-02-28T12:20:16.894+01:00 level=INFO source=server.go:97 msg="system memory" total="62.7 GiB" free="32.8 GiB" free_swap="1.9 MiB"
time=2025-02-28T12:20:16.894+01:00 level=DEBUG source=memory.go:108 msg=evaluating library=rocm gpu_count=1 available="[15.5 GiB]"
time=2025-02-28T12:20:16.895+01:00 level=DEBUG source=gpu.go:391 msg="updating system memory data" before.total="62.7 GiB" before.free="32.8 GiB" before.free_swap="1.9 MiB" now.total="62.7 GiB" now.free="32.8 GiB" now.free_swap="1.9 MiB"
initializing /usr/lib64/libcuda.so.570.86.16
dlsym: cuInit - 0x7f0bc7d0de00
dlsym: cuDriverGetVersion - 0x7f0bc7d0de20
dlsym: cuDeviceGetCount - 0x7f0bc7d0de60
dlsym: cuDeviceGet - 0x7f0bc7d0de40
dlsym: cuDeviceGetAttribute - 0x7f0bc7d0df40
dlsym: cuDeviceGetUuid - 0x7f0bc7d0dea0
dlsym: cuDeviceGetName - 0x7f0bc7d0de80
dlsym: cuCtxCreate_v3 - 0x7f0bc7d0e120
dlsym: cuMemGetInfo_v2 - 0x7f0bc7d0e8a0
dlsym: cuCtxDestroy - 0x7f0bc7d6c9f0
calling cuInit
calling cuDriverGetVersion
raw version 0x2f30
CUDA driver version: 12.8
calling cuDeviceGetCount
device count 1
time=2025-02-28T12:20:16.958+01:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-64fa45ff-fe00-d712-1796-ed74da57bfa7 name="NVIDIA GeForce GTX 970" overhead="0 B" before.total="3.9 GiB" before.free="870.0 MiB" now.total="3.9 GiB" now.free="870.0 MiB" now.used="3.1 GiB"
time=2025-02-28T12:20:16.958+01:00 level=DEBUG source=amd_linux.go:488 msg="updating rocm free memory" gpu=GPU-96da7c4b1629ce4f name=1002:73bf before="15.5 GiB" now="15.5 GiB"
releasing cuda driver library
time=2025-02-28T12:20:16.958+01:00 level=WARN source=ggml.go:136 msg="key not found" key=qwen2.attention.key_length default=128
time=2025-02-28T12:20:16.958+01:00 level=WARN source=ggml.go:136 msg="key not found" key=qwen2.attention.value_length default=128
time=2025-02-28T12:20:16.958+01:00 level=WARN source=ggml.go:136 msg="key not found" key=qwen2.attention.key_length default=128
time=2025-02-28T12:20:16.958+01:00 level=WARN source=ggml.go:136 msg="key not found" key=qwen2.attention.value_length default=128
time=2025-02-28T12:20:16.959+01:00 level=INFO source=server.go:130 msg=offload library=rocm layers.requested=-1 layers.model=49 layers.offload=49 layers.split="" memory.available="[15.5 GiB]" memory.gpu_overhead="0 B" memory.required.full="13.4 GiB" memory.required.partial="13.4 GiB" memory.required.kv="1.8 GiB" memory.required.allocations="[13.4 GiB]" memory.weights.total="10.5 GiB" memory.weights.repeating="9.9 GiB" memory.weights.nonrepeating="609.1 MiB" memory.graph.full="1.6 GiB" memory.graph.partial="2.0 GiB"
time=2025-02-28T12:20:16.959+01:00 level=WARN source=ggml.go:136 msg="key not found" key=qwen2.attention.key_length default=128
time=2025-02-28T12:20:16.959+01:00 level=WARN source=ggml.go:136 msg="key not found" key=qwen2.attention.value_length default=128
time=2025-02-28T12:20:16.959+01:00 level=INFO source=server.go:182 msg="enabling flash attention"
time=2025-02-28T12:20:16.959+01:00 level=DEBUG source=server.go:259 msg="compatible gpu libraries" compatible=[rocm]
time=2025-02-28T12:20:16.959+01:00 level=DEBUG source=server.go:302 msg="adding gpu library" path=/opt/ollama/lib/ollama/rocm
time=2025-02-28T12:20:16.959+01:00 level=DEBUG source=server.go:310 msg="adding gpu dependency paths" paths=[/opt/ollama/lib/ollama/rocm]
time=2025-02-28T12:20:16.959+01:00 level=INFO source=server.go:380 msg="starting llama server" cmd="/opt/ollama/bin/ollama runner --model /ollama/blobs/sha256-bbfb685133c274407d565c65b1ca806eb1593482b1c9d8524596797b24123862 --ctx-size 20000 --batch-size 512 --n-gpu-layers 49 --verbose --threads 6 --flash-attn --kv-cache-type q8_0 --parallel 2 --port 34681"
time=2025-02-28T12:20:16.959+01:00 level=DEBUG source=server.go:398 msg=subprocess environment="[PATH=/opt/ollama/bin:/bin:/sbin:/bin:/sbin:/usr/bin:/usr/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin:/opt/bin:/usr/lib/llvm/19/bin:/usr/lib/llvm/18/bin:/etc/eselect/wine/bin:/opt/cuda/bin LD_LIBRARY_PATH=/opt/ollama/lib/ollama/rocm:/opt/ollama/lib/ollama/rocm:/opt/ollama/lib/ollama ROCR_VISIBLE_DEVICES=GPU-96da7c4b1629ce4f]"
time=2025-02-28T12:20:16.960+01:00 level=INFO source=sched.go:450 msg="loaded runners" count=1
time=2025-02-28T12:20:16.960+01:00 level=INFO source=server.go:557 msg="waiting for llama runner to start responding"
time=2025-02-28T12:20:16.960+01:00 level=INFO source=server.go:591 msg="waiting for server to become available" status="llm server error"
time=2025-02-28T12:20:16.986+01:00 level=INFO source=runner.go:931 msg="starting go runner"
time=2025-02-28T12:20:16.986+01:00 level=DEBUG source=ggml.go:84 msg="ggml backend load all from path" path=/opt/ollama/lib/ollama/rocm
/opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon Graphics, gfx1030 (0x1030), VMM: yes, Wave Size: 32
load_backend: loaded ROCm backend from /opt/ollama/lib/ollama/rocm/libggml-hip.so
time=2025-02-28T12:20:19.366+01:00 level=DEBUG source=ggml.go:84 msg="ggml backend load all from path" path=/opt/ollama/lib/ollama
ggml_backend_load_best: /opt/ollama/lib/ollama/libggml-cpu-alderlake.so score: 0
ggml_backend_load_best: /opt/ollama/lib/ollama/libggml-cpu-haswell.so score: 55
ggml_backend_load_best: /opt/ollama/lib/ollama/libggml-cpu-icelake.so score: 0
ggml_backend_load_best: /opt/ollama/lib/ollama/libggml-cpu-skylakex.so score: 0
ggml_backend_load_best: /opt/ollama/lib/ollama/libggml-cpu-sandybridge.so score: 20
load_backend: loaded CPU backend from /opt/ollama/lib/ollama/libggml-cpu-haswell.so
time=2025-02-28T12:20:19.375+01:00 level=INFO source=runner.go:934 msg=system info="CPU : LLAMAFILE = 1 | ROCm : PEER_MAX_BATCH_SIZE = 128 | CPU : LLAMAFILE = 1 | cgo(gcc)" threads=6
time=2025-02-28T12:20:19.375+01:00 level=INFO source=runner.go:991 msg="Server listening on 127.0.0.1:34681"
llama_model_load_from_file_impl: using device ROCm0 (AMD Radeon Graphics) - 15966 MiB free
llama_model_loader: loaded meta data with 43 key-value pairs and 579 tensors from /ollama/blobs/sha256-bbfb685133c274407d565c65b1ca806eb1593482b1c9d8524596797b24123862 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = qwen2
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = Qwen2.5 14B
llama_model_loader: - kv   3:                       general.organization str              = Qwen
llama_model_loader: - kv   4:                           general.basename str              = Qwen2.5
llama_model_loader: - kv   5:                         general.size_label str              = 14B
llama_model_loader: - kv   6:                   general.base_model.count u32              = 3
llama_model_loader: - kv   7:                  general.base_model.0.name str              = Qwamma 14b Merge v1
llama_model_loader: - kv   8:               general.base_model.0.version str              = v1
llama_model_loader: - kv   9:          general.base_model.0.organization str              = Chargoddard
llama_model_loader: - kv  10:              general.base_model.0.repo_url str              = https://huggingface.co/chargoddard/qw...
llama_model_loader: - kv  11:                  general.base_model.1.name str              = Qwen2.5 14B Instruct_arcee Qwen2 14B ...
llama_model_loader: - kv  12:               general.base_model.1.version str              = v0.2
llama_model_loader: - kv  13:          general.base_model.1.organization str              = Arcee Train
llama_model_loader: - kv  14:              general.base_model.1.repo_url str              = https://huggingface.co/arcee-train/Qw...
llama_model_loader: - kv  15:                  general.base_model.2.name str              = Qwen2.5 14B
llama_model_loader: - kv  16:          general.base_model.2.organization str              = Qwen
llama_model_loader: - kv  17:              general.base_model.2.repo_url str              = https://huggingface.co/Qwen/Qwen2.5-14B
llama_model_loader: - kv  18:                               general.tags arr[str,2]       = ["mergekit", "merge"]
llama_model_loader: - kv  19:                          qwen2.block_count u32              = 48
llama_model_loader: - kv  20:                       qwen2.context_length u32              = 131072
llama_model_loader: - kv  21:                     qwen2.embedding_length u32              = 5120
llama_model_loader: - kv  22:                  qwen2.feed_forward_length u32              = 13824
llama_model_loader: - kv  23:                 qwen2.attention.head_count u32              = 40
llama_model_loader: - kv  24:              qwen2.attention.head_count_kv u32              = 8
llama_model_loader: - kv  25:                       qwen2.rope.freq_base f32              = 1000000.000000
llama_model_loader: - kv  26:     qwen2.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  27:                          general.file_type u32              = 17
llama_model_loader: - kv  28:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  29:                         tokenizer.ggml.pre str              = qwen2
llama_model_loader: - kv  30:                      tokenizer.ggml.tokens arr[str,152064]  = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  31:                  tokenizer.ggml.token_type arr[i32,152064]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
time=2025-02-28T12:20:19.474+01:00 level=INFO source=server.go:591 msg="waiting for server to become available" status="llm server loading model"
llama_model_loader: - kv  32:                      tokenizer.ggml.merges arr[str,151387]  = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
llama_model_loader: - kv  33:                tokenizer.ggml.eos_token_id u32              = 151645
llama_model_loader: - kv  34:            tokenizer.ggml.padding_token_id u32              = 151643
llama_model_loader: - kv  35:                tokenizer.ggml.bos_token_id u32              = 151643
llama_model_loader: - kv  36:               tokenizer.ggml.add_bos_token bool             = false
llama_model_loader: - kv  37:                    tokenizer.chat_template str              = {%- if tools %}\n    {{- '<|im_start|>...
llama_model_loader: - kv  38:               general.quantization_version u32              = 2
llama_model_loader: - kv  39:                      quantize.imatrix.file str              = /models_out/SuperNova-14B-GGUF/SuperN...
llama_model_loader: - kv  40:                   quantize.imatrix.dataset str              = /training_dir/calibration_datav3.txt
llama_model_loader: - kv  41:             quantize.imatrix.entries_count i32              = 336
llama_model_loader: - kv  42:              quantize.imatrix.chunks_count i32              = 128
llama_model_loader: - type  f32:  241 tensors
llama_model_loader: - type q5_K:  289 tensors
llama_model_loader: - type q6_K:   49 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type   = Q5_K - Medium
print_info: file size   = 9.78 GiB (5.69 BPW)
init_tokenizer: initializing tokenizer for type 2
load: control token: 151660 '<|fim_middle|>' is not marked as EOG
load: control token: 151659 '<|fim_prefix|>' is not marked as EOG
load: control token: 151653 '<|vision_end|>' is not marked as EOG
load: control token: 151648 '<|box_start|>' is not marked as EOG
load: control token: 151646 '<|object_ref_start|>' is not marked as EOG
load: control token: 151649 '<|box_end|>' is not marked as EOG
load: control token: 151655 '<|image_pad|>' is not marked as EOG
load: control token: 151651 '<|quad_end|>' is not marked as EOG
load: control token: 151647 '<|object_ref_end|>' is not marked as EOG
load: control token: 151652 '<|vision_start|>' is not marked as EOG
load: control token: 151654 '<|vision_pad|>' is not marked as EOG
load: control token: 151656 '<|video_pad|>' is not marked as EOG
load: control token: 151644 '<|im_start|>' is not marked as EOG
load: control token: 151661 '<|fim_suffix|>' is not marked as EOG
load: control token: 151650 '<|quad_start|>' is not marked as EOG
load: special tokens cache size = 22
load: token to piece cache size = 0.9310 MB
print_info: arch             = qwen2
print_info: vocab_only       = 0
print_info: n_ctx_train      = 131072
print_info: n_embd           = 5120
print_info: n_layer          = 48
print_info: n_head           = 40
print_info: n_head_kv        = 8
print_info: n_rot            = 128
print_info: n_swa            = 0
print_info: n_embd_head_k    = 128
print_info: n_embd_head_v    = 128
print_info: n_gqa            = 5
print_info: n_embd_k_gqa     = 1024
print_info: n_embd_v_gqa     = 1024
print_info: f_norm_eps       = 0.0e+00
print_info: f_norm_rms_eps   = 1.0e-05
print_info: f_clamp_kqv      = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale    = 0.0e+00
print_info: n_ff             = 13824
print_info: n_expert         = 0
print_info: n_expert_used    = 0
print_info: causal attn      = 1
print_info: pooling type     = 0
print_info: rope type        = 2
print_info: rope scaling     = linear
print_info: freq_base_train  = 1000000.0
print_info: freq_scale_train = 1
print_info: n_ctx_orig_yarn  = 131072
print_info: rope_finetuned   = unknown
print_info: ssm_d_conv       = 0
print_info: ssm_d_inner      = 0
print_info: ssm_d_state      = 0
print_info: ssm_dt_rank      = 0
print_info: ssm_dt_b_c_rms   = 0
print_info: model type       = 14B
print_info: model params     = 14.77 B
print_info: general.name     = Qwen2.5 14B
print_info: vocab type       = BPE
print_info: n_vocab          = 152064
print_info: n_merges         = 151387
print_info: BOS token        = 151643 '<|endoftext|>'
print_info: EOS token        = 151645 '<|im_end|>'
print_info: EOT token        = 151645 '<|im_end|>'
print_info: PAD token        = 151643 '<|endoftext|>'
print_info: LF token         = 198 'Ċ'
print_info: FIM PRE token    = 151659 '<|fim_prefix|>'
print_info: FIM SUF token    = 151661 '<|fim_suffix|>'
print_info: FIM MID token    = 151660 '<|fim_middle|>'
print_info: FIM PAD token    = 151662 '<|fim_pad|>'
print_info: FIM REP token    = 151663 '<|repo_name|>'
print_info: FIM SEP token    = 151664 '<|file_sep|>'
print_info: EOG token        = 151643 '<|endoftext|>'
print_info: EOG token        = 151645 '<|im_end|>'
print_info: EOG token        = 151662 '<|fim_pad|>'
print_info: EOG token        = 151663 '<|repo_name|>'
print_info: EOG token        = 151664 '<|file_sep|>'
print_info: max token length = 256
load_tensors: loading model tensors, this can take a while... (mmap = true)
load_tensors: layer   0 assigned to device ROCm0
load_tensors: layer   1 assigned to device ROCm0
load_tensors: layer   2 assigned to device ROCm0
load_tensors: layer   3 assigned to device ROCm0
load_tensors: layer   4 assigned to device ROCm0
load_tensors: layer   5 assigned to device ROCm0
load_tensors: layer   6 assigned to device ROCm0
load_tensors: layer   7 assigned to device ROCm0
load_tensors: layer   8 assigned to device ROCm0
load_tensors: layer   9 assigned to device ROCm0
load_tensors: layer  10 assigned to device ROCm0
load_tensors: layer  11 assigned to device ROCm0
load_tensors: layer  12 assigned to device ROCm0
load_tensors: layer  13 assigned to device ROCm0
load_tensors: layer  14 assigned to device ROCm0
load_tensors: layer  15 assigned to device ROCm0
load_tensors: layer  16 assigned to device ROCm0
load_tensors: layer  17 assigned to device ROCm0
load_tensors: layer  18 assigned to device ROCm0
load_tensors: layer  19 assigned to device ROCm0
load_tensors: layer  20 assigned to device ROCm0
load_tensors: layer  21 assigned to device ROCm0
load_tensors: layer  22 assigned to device ROCm0
load_tensors: layer  23 assigned to device ROCm0
load_tensors: layer  24 assigned to device ROCm0
load_tensors: layer  25 assigned to device ROCm0
load_tensors: layer  26 assigned to device ROCm0
load_tensors: layer  27 assigned to device ROCm0
load_tensors: layer  28 assigned to device ROCm0
load_tensors: layer  29 assigned to device ROCm0
load_tensors: layer  30 assigned to device ROCm0
load_tensors: layer  31 assigned to device ROCm0
load_tensors: layer  32 assigned to device ROCm0
load_tensors: layer  33 assigned to device ROCm0
load_tensors: layer  34 assigned to device ROCm0
load_tensors: layer  35 assigned to device ROCm0
load_tensors: layer  36 assigned to device ROCm0
load_tensors: layer  37 assigned to device ROCm0
load_tensors: layer  38 assigned to device ROCm0
load_tensors: layer  39 assigned to device ROCm0
load_tensors: layer  40 assigned to device ROCm0
load_tensors: layer  41 assigned to device ROCm0
load_tensors: layer  42 assigned to device ROCm0
load_tensors: layer  43 assigned to device ROCm0
load_tensors: layer  44 assigned to device ROCm0
load_tensors: layer  45 assigned to device ROCm0
load_tensors: layer  46 assigned to device ROCm0
load_tensors: layer  47 assigned to device ROCm0
load_tensors: layer  48 assigned to device ROCm0
load_tensors: tensor 'token_embd.weight' (q5_K) (and 0 others) cannot be used with preferred buffer type ROCm_Host, using CPU instead
load_tensors: offloading 48 repeating layers to GPU
load_tensors: offloading output layer to GPU
load_tensors: offloaded 49/49 layers to GPU
load_tensors:   CPU_Mapped model buffer size =   510.47 MiB
load_tensors:        ROCm0 model buffer size =  9505.88 MiB
time=2025-02-28T12:20:39.550+01:00 level=DEBUG source=server.go:602 msg="model load progress 0.05"
time=2025-02-28T12:20:40.052+01:00 level=DEBUG source=server.go:602 msg="model load progress 0.05"
time=2025-02-28T12:20:40.303+01:00 level=DEBUG source=server.go:602 msg="model load progress 0.13"
time=2025-02-28T12:20:40.554+01:00 level=DEBUG source=server.go:602 msg="model load progress 0.21"
time=2025-02-28T12:20:40.805+01:00 level=DEBUG source=server.go:602 msg="model load progress 0.30"
time=2025-02-28T12:20:41.056+01:00 level=DEBUG source=server.go:602 msg="model load progress 0.37"
time=2025-02-28T12:20:41.307+01:00 level=DEBUG source=server.go:602 msg="model load progress 0.44"
time=2025-02-28T12:20:41.558+01:00 level=DEBUG source=server.go:602 msg="model load progress 0.51"
time=2025-02-28T12:20:41.809+01:00 level=DEBUG source=server.go:602 msg="model load progress 0.60"
time=2025-02-28T12:20:42.060+01:00 level=DEBUG source=server.go:602 msg="model load progress 0.69"
time=2025-02-28T12:20:42.311+01:00 level=DEBUG source=server.go:602 msg="model load progress 0.79"
time=2025-02-28T12:20:42.562+01:00 level=DEBUG source=server.go:602 msg="model load progress 0.88"
time=2025-02-28T12:20:42.813+01:00 level=DEBUG source=server.go:602 msg="model load progress 0.98"
time=2025-02-28T12:20:43.064+01:00 level=DEBUG source=server.go:602 msg="model load progress 1.00"
llama_init_from_model: n_seq_max     = 2
llama_init_from_model: n_ctx         = 20224
llama_init_from_model: n_ctx_per_seq = 10112
llama_init_from_model: n_batch       = 1024
llama_init_from_model: n_ubatch      = 512
llama_init_from_model: flash_attn    = 1
llama_init_from_model: freq_base     = 1000000.0
llama_init_from_model: freq_scale    = 1
llama_init_from_model: n_ctx_per_seq (10112) < n_ctx_train (131072) -- the full capacity of the model will not be utilized
llama_kv_cache_init: kv_size = 20224, offload = 1, type_k = 'q8_0', type_v = 'q8_0', n_layer = 48, can_shift = 1
llama_kv_cache_init: layer 0: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 1: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 2: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 3: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 4: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 5: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 6: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 7: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 8: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 9: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 10: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 11: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 12: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 13: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 14: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 15: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 16: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 17: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 18: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 19: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 20: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 21: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 22: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 23: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 24: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 25: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 26: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 27: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 28: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 29: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 30: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 31: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 32: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 33: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 34: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 35: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 36: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 37: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 38: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 39: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 40: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 41: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 42: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 43: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 44: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 45: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 46: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
llama_kv_cache_init: layer 47: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024
time=2025-02-28T12:20:43.315+01:00 level=DEBUG source=server.go:602 msg="model load progress 1.00"
llama_kv_cache_init:      ROCm0 KV buffer size =  2014.50 MiB
llama_init_from_model: KV self size  = 2014.50 MiB, K (q8_0): 1007.25 MiB, V (q8_0): 1007.25 MiB
llama_init_from_model:  ROCm_Host  output buffer size =     1.20 MiB
llama_init_from_model:      ROCm0 compute buffer size =   317.00 MiB
llama_init_from_model:  ROCm_Host compute buffer size =   101.22 MiB
llama_init_from_model: graph nodes  = 1495
llama_init_from_model: graph splits = 98
time=2025-02-28T12:20:43.566+01:00 level=INFO source=server.go:596 msg="llama runner started in 26.61 seconds"
time=2025-02-28T12:20:43.566+01:00 level=DEBUG source=sched.go:463 msg="finished setting up runner" model=/ollama/blobs/sha256-bbfb685133c274407d565c65b1ca806eb1593482b1c9d8524596797b24123862
[GIN] 2025/02/28 - 12:20:43 | 200 | 27.089327333s |       127.0.0.1 | POST     "/api/generate"
time=2025-02-28T12:20:43.566+01:00 level=DEBUG source=sched.go:467 msg="context for request finished"
time=2025-02-28T12:20:43.566+01:00 level=DEBUG source=sched.go:340 msg="runner with non-zero duration has gone idle, adding timer" modelPath=/ollama/blobs/sha256-bbfb685133c274407d565c65b1ca806eb1593482b1c9d8524596797b24123862 duration=5m0s
time=2025-02-28T12:20:43.566+01:00 level=DEBUG source=sched.go:358 msg="after processing request finished event" modelPath=/ollama/blobs/sha256-bbfb685133c274407d565c65b1ca806eb1593482b1c9d8524596797b24123862 refCount=0
time=2025-02-28T12:20:50.971+01:00 level=DEBUG source=sched.go:576 msg="evaluating already loaded" model=/ollama/blobs/sha256-bbfb685133c274407d565c65b1ca806eb1593482b1c9d8524596797b24123862
time=2025-02-28T12:20:50.972+01:00 level=DEBUG source=server.go:968 msg="new runner detected, loading model for cgo tokenization"
llama_model_loader: loaded meta data with 43 key-value pairs and 579 tensors from /ollama/blobs/sha256-bbfb685133c274407d565c65b1ca806eb1593482b1c9d8524596797b24123862 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = qwen2
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = Qwen2.5 14B
llama_model_loader: - kv   3:                       general.organization str              = Qwen
llama_model_loader: - kv   4:                           general.basename str              = Qwen2.5
llama_model_loader: - kv   5:                         general.size_label str              = 14B
llama_model_loader: - kv   6:                   general.base_model.count u32              = 3
llama_model_loader: - kv   7:                  general.base_model.0.name str              = Qwamma 14b Merge v1
llama_model_loader: - kv   8:               general.base_model.0.version str              = v1
llama_model_loader: - kv   9:          general.base_model.0.organization str              = Chargoddard
llama_model_loader: - kv  10:              general.base_model.0.repo_url str              = https://huggingface.co/chargoddard/qw...
llama_model_loader: - kv  11:                  general.base_model.1.name str              = Qwen2.5 14B Instruct_arcee Qwen2 14B ...
llama_model_loader: - kv  12:               general.base_model.1.version str              = v0.2
llama_model_loader: - kv  13:          general.base_model.1.organization str              = Arcee Train
llama_model_loader: - kv  14:              general.base_model.1.repo_url str              = https://huggingface.co/arcee-train/Qw...
llama_model_loader: - kv  15:                  general.base_model.2.name str              = Qwen2.5 14B
llama_model_loader: - kv  16:          general.base_model.2.organization str              = Qwen
llama_model_loader: - kv  17:              general.base_model.2.repo_url str              = https://huggingface.co/Qwen/Qwen2.5-14B
llama_model_loader: - kv  18:                               general.tags arr[str,2]       = ["mergekit", "merge"]
llama_model_loader: - kv  19:                          qwen2.block_count u32              = 48
llama_model_loader: - kv  20:                       qwen2.context_length u32              = 131072
llama_model_loader: - kv  21:                     qwen2.embedding_length u32              = 5120
llama_model_loader: - kv  22:                  qwen2.feed_forward_length u32              = 13824
llama_model_loader: - kv  23:                 qwen2.attention.head_count u32              = 40
llama_model_loader: - kv  24:              qwen2.attention.head_count_kv u32              = 8
llama_model_loader: - kv  25:                       qwen2.rope.freq_base f32              = 1000000.000000
llama_model_loader: - kv  26:     qwen2.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  27:                          general.file_type u32              = 17
llama_model_loader: - kv  28:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  29:                         tokenizer.ggml.pre str              = qwen2
llama_model_loader: - kv  30:                      tokenizer.ggml.tokens arr[str,152064]  = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  31:                  tokenizer.ggml.token_type arr[i32,152064]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  32:                      tokenizer.ggml.merges arr[str,151387]  = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
llama_model_loader: - kv  33:                tokenizer.ggml.eos_token_id u32              = 151645
llama_model_loader: - kv  34:            tokenizer.ggml.padding_token_id u32              = 151643
llama_model_loader: - kv  35:                tokenizer.ggml.bos_token_id u32              = 151643
llama_model_loader: - kv  36:               tokenizer.ggml.add_bos_token bool             = false
llama_model_loader: - kv  37:                    tokenizer.chat_template str              = {%- if tools %}\n    {{- '<|im_start|>...
llama_model_loader: - kv  38:               general.quantization_version u32              = 2
llama_model_loader: - kv  39:                      quantize.imatrix.file str              = /models_out/SuperNova-14B-GGUF/SuperN...
llama_model_loader: - kv  40:                   quantize.imatrix.dataset str              = /training_dir/calibration_datav3.txt
llama_model_loader: - kv  41:             quantize.imatrix.entries_count i32              = 336
llama_model_loader: - kv  42:              quantize.imatrix.chunks_count i32              = 128
llama_model_loader: - type  f32:  241 tensors
llama_model_loader: - type q5_K:  289 tensors
llama_model_loader: - type q6_K:   49 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type   = Q5_K - Medium
print_info: file size   = 9.78 GiB (5.69 BPW)
init_tokenizer: initializing tokenizer for type 2
load: control token: 151660 '<|fim_middle|>' is not marked as EOG
load: control token: 151659 '<|fim_prefix|>' is not marked as EOG
load: control token: 151653 '<|vision_end|>' is not marked as EOG
load: control token: 151648 '<|box_start|>' is not marked as EOG
load: control token: 151646 '<|object_ref_start|>' is not marked as EOG
load: control token: 151649 '<|box_end|>' is not marked as EOG
load: control token: 151655 '<|image_pad|>' is not marked as EOG
load: control token: 151651 '<|quad_end|>' is not marked as EOG
load: control token: 151647 '<|object_ref_end|>' is not marked as EOG
load: control token: 151652 '<|vision_start|>' is not marked as EOG
load: control token: 151654 '<|vision_pad|>' is not marked as EOG
load: control token: 151656 '<|video_pad|>' is not marked as EOG
load: control token: 151644 '<|im_start|>' is not marked as EOG
load: control token: 151661 '<|fim_suffix|>' is not marked as EOG
load: control token: 151650 '<|quad_start|>' is not marked as EOG
load: special tokens cache size = 22
load: token to piece cache size = 0.9310 MB
print_info: arch             = qwen2
print_info: vocab_only       = 1
print_info: model type       = ?B
print_info: model params     = 14.77 B
print_info: general.name     = Qwen2.5 14B
print_info: vocab type       = BPE
print_info: n_vocab          = 152064
print_info: n_merges         = 151387
print_info: BOS token        = 151643 '<|endoftext|>'
print_info: EOS token        = 151645 '<|im_end|>'
print_info: EOT token        = 151645 '<|im_end|>'
print_info: PAD token        = 151643 '<|endoftext|>'
print_info: LF token         = 198 'Ċ'
print_info: FIM PRE token    = 151659 '<|fim_prefix|>'
print_info: FIM SUF token    = 151661 '<|fim_suffix|>'
print_info: FIM MID token    = 151660 '<|fim_middle|>'
print_info: FIM PAD token    = 151662 '<|fim_pad|>'
print_info: FIM REP token    = 151663 '<|repo_name|>'
print_info: FIM SEP token    = 151664 '<|file_sep|>'
print_info: EOG token        = 151643 '<|endoftext|>'
print_info: EOG token        = 151645 '<|im_end|>'
print_info: EOG token        = 151662 '<|fim_pad|>'
print_info: EOG token        = 151663 '<|repo_name|>'
print_info: EOG token        = 151664 '<|file_sep|>'
print_info: max token length = 256
llama_model_load: vocab only - skipping tensors
time=2025-02-28T12:20:51.260+01:00 level=DEBUG source=routes.go:1505 msg="chat request" images=0 prompt="<|im_start|>system\n\nYou are Quinn, a happy and cheerful AI assistant. You are a sunshine and rainbows kind of gal. \n\nConversation instructions that you must follow:\n - You always respond in English.\n - Do not address the user by name.\n<|im_end|>\n<|im_start|>user\nblammo<|im_end|>\n<|im_start|>assistant\n"
time=2025-02-28T12:20:51.265+01:00 level=DEBUG source=cache.go:104 msg="loading cache slot" id=0 cache=0 prompt=61 used=0 remaining=61

OS

Gentoo Linux

GPU

AMD, NVIDIA

CPU

AMD

Ollama version

0.5.13-rc1
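As a sanity check on the log above, the KV cache size reported at load time (`llama_init_from_model: KV self size = 2014.50 MiB`, allocated on `ROCm0`) can be reproduced from the parameters printed in the same log. The sketch below assumes ggml's `q8_0` block layout (32 int8 values plus one fp16 scale, i.e. 34 bytes per 32 elements). It only shows that the reported numbers are self-consistent; it says nothing about where the chat request's inference actually ran.

```python
# Reproduce the KV cache size printed by llama_kv_cache_init / llama_init_from_model
# from the parameters in the same log. Assumption: ggml's q8_0 layout stores
# blocks of 32 int8 values plus one fp16 scale, i.e. 34 bytes per 32 elements.
Q8_0_BLOCK_ELEMS = 32
Q8_0_BLOCK_BYTES = 34
MIB = 1024 * 1024


def q8_0_bytes(n_elements: int) -> int:
    """Bytes needed to store n_elements values quantized as q8_0."""
    assert n_elements % Q8_0_BLOCK_ELEMS == 0
    return n_elements // Q8_0_BLOCK_ELEMS * Q8_0_BLOCK_BYTES


n_ctx = 20224        # llama_init_from_model: n_ctx (2 parallel slots x 10112)
n_layer = 48         # llama_kv_cache_init: n_layer
n_embd_k_gqa = 1024  # per-layer K width from llama_kv_cache_init
n_embd_v_gqa = 1024  # per-layer V width from llama_kv_cache_init

k_mib = q8_0_bytes(n_ctx * n_embd_k_gqa * n_layer) / MIB
v_mib = q8_0_bytes(n_ctx * n_embd_v_gqa * n_layer) / MIB
print(f"K: {k_mib:.2f} MiB, V: {v_mib:.2f} MiB, total: {k_mib + v_mib:.2f} MiB")
# → K: 1007.25 MiB, V: 1007.25 MiB, total: 2014.50 MiB
```

The result matches the `K (q8_0): 1007.25 MiB, V (q8_0): 1007.25 MiB` split in the log.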

0x7f0bc7d0de40 dlsym: cuDeviceGetAttribute - 0x7f0bc7d0df40 dlsym: cuDeviceGetUuid - 0x7f0bc7d0dea0 dlsym: cuDeviceGetName - 0x7f0bc7d0de80 dlsym: cuCtxCreate_v3 - 0x7f0bc7d0e120 dlsym: cuMemGetInfo_v2 - 0x7f0bc7d0e8a0 dlsym: cuCtxDestroy - 0x7f0bc7d6c9f0 calling cuInit calling cuDriverGetVersion raw version 0x2f30 CUDA driver version: 12.8 calling cuDeviceGetCount device count 1 time=2025-02-28T12:20:16.958+01:00 level=DEBUG source=gpu.go:441 msg="updating cuda memory data" gpu=GPU-64fa45ff-fe00-d712-1796-ed74da57bfa7 name="NVIDIA GeForce GTX 970" overhead="0 B" before.total="3.9 GiB" before.free="870.0 MiB" now.total="3.9 GiB" now.free="870.0 MiB" now.used="3.1 GiB" time=2025-02-28T12:20:16.958+01:00 level=DEBUG source=amd_linux.go:488 msg="updating rocm free memory" gpu=GPU-96da7c4b1629ce4f name=1002:73bf before="15.5 GiB" now="15.5 GiB" releasing cuda driver library time=2025-02-28T12:20:16.958+01:00 level=WARN source=ggml.go:136 msg="key not found" key=qwen2.attention.key_length default=128 time=2025-02-28T12:20:16.958+01:00 level=WARN source=ggml.go:136 msg="key not found" key=qwen2.attention.value_length default=128 time=2025-02-28T12:20:16.958+01:00 level=WARN source=ggml.go:136 msg="key not found" key=qwen2.attention.key_length default=128 time=2025-02-28T12:20:16.958+01:00 level=WARN source=ggml.go:136 msg="key not found" key=qwen2.attention.value_length default=128 time=2025-02-28T12:20:16.959+01:00 level=INFO source=server.go:130 msg=offload library=rocm layers.requested=-1 layers.model=49 layers.offload=49 layers.split="" memory.available="[15.5 GiB]" memory.gpu_overhead="0 B" memory.required.full="13.4 GiB" memory.required.partial="13.4 GiB" memory.required.kv="1.8 GiB" memory.required.allocations="[13.4 GiB]" memory.weights.total="10.5 GiB" memory.weights.repeating="9.9 GiB" memory.weights.nonrepeating="609.1 MiB" memory.graph.full="1.6 GiB" memory.graph.partial="2.0 GiB" time=2025-02-28T12:20:16.959+01:00 level=WARN source=ggml.go:136 msg="key not 
found" key=qwen2.attention.key_length default=128 time=2025-02-28T12:20:16.959+01:00 level=WARN source=ggml.go:136 msg="key not found" key=qwen2.attention.value_length default=128 time=2025-02-28T12:20:16.959+01:00 level=INFO source=server.go:182 msg="enabling flash attention" time=2025-02-28T12:20:16.959+01:00 level=DEBUG source=server.go:259 msg="compatible gpu libraries" compatible=[rocm] time=2025-02-28T12:20:16.959+01:00 level=DEBUG source=server.go:302 msg="adding gpu library" path=/opt/ollama/lib/ollama/rocm time=2025-02-28T12:20:16.959+01:00 level=DEBUG source=server.go:310 msg="adding gpu dependency paths" paths=[/opt/ollama/lib/ollama/rocm] time=2025-02-28T12:20:16.959+01:00 level=INFO source=server.go:380 msg="starting llama server" cmd="/opt/ollama/bin/ollama runner --model /ollama/blobs/sha256-bbfb685133c274407d565c65b1ca806eb1593482b1c9d8524596797b24123862 --ctx-size 20000 --batch-size 512 --n-gpu-layers 49 --verbose --threads 6 --flash-attn --kv-cache-type q8_0 --parallel 2 --port 34681" time=2025-02-28T12:20:16.959+01:00 level=DEBUG source=server.go:398 msg=subprocess environment="[PATH=/opt/ollama/bin:/bin:/sbin:/bin:/sbin:/usr/bin:/usr/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin:/opt/bin:/usr/lib/llvm/19/bin:/usr/lib/llvm/18/bin:/etc/eselect/wine/bin:/opt/cuda/bin LD_LIBRARY_PATH=/opt/ollama/lib/ollama/rocm:/opt/ollama/lib/ollama/rocm:/opt/ollama/lib/ollama ROCR_VISIBLE_DEVICES=GPU-96da7c4b1629ce4f]" time=2025-02-28T12:20:16.960+01:00 level=INFO source=sched.go:450 msg="loaded runners" count=1 time=2025-02-28T12:20:16.960+01:00 level=INFO source=server.go:557 msg="waiting for llama runner to start responding" time=2025-02-28T12:20:16.960+01:00 level=INFO source=server.go:591 msg="waiting for server to become available" status="llm server error" time=2025-02-28T12:20:16.986+01:00 level=INFO source=runner.go:931 msg="starting go runner" time=2025-02-28T12:20:16.986+01:00 level=DEBUG source=ggml.go:84 msg="ggml backend load all from path" 
path=/opt/ollama/lib/ollama/rocm /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no ggml_cuda_init: found 1 ROCm devices: Device 0: AMD Radeon Graphics, gfx1030 (0x1030), VMM: yes, Wave Size: 32 load_backend: loaded ROCm backend from /opt/ollama/lib/ollama/rocm/libggml-hip.so time=2025-02-28T12:20:19.366+01:00 level=DEBUG source=ggml.go:84 msg="ggml backend load all from path" path=/opt/ollama/lib/ollama ggml_backend_load_best: /opt/ollama/lib/ollama/libggml-cpu-alderlake.so score: 0 ggml_backend_load_best: /opt/ollama/lib/ollama/libggml-cpu-haswell.so score: 55 ggml_backend_load_best: /opt/ollama/lib/ollama/libggml-cpu-icelake.so score: 0 ggml_backend_load_best: /opt/ollama/lib/ollama/libggml-cpu-skylakex.so score: 0 ggml_backend_load_best: /opt/ollama/lib/ollama/libggml-cpu-sandybridge.so score: 20 load_backend: loaded CPU backend from /opt/ollama/lib/ollama/libggml-cpu-haswell.so time=2025-02-28T12:20:19.375+01:00 level=INFO source=runner.go:934 msg=system info="CPU : LLAMAFILE = 1 | ROCm : PEER_MAX_BATCH_SIZE = 128 | CPU : LLAMAFILE = 1 | cgo(gcc)" threads=6 time=2025-02-28T12:20:19.375+01:00 level=INFO source=runner.go:991 msg="Server listening on 127.0.0.1:34681" llama_model_load_from_file_impl: using device ROCm0 (AMD Radeon Graphics) - 15966 MiB free llama_model_loader: loaded meta data with 43 key-value pairs and 579 tensors from /ollama/blobs/sha256-bbfb685133c274407d565c65b1ca806eb1593482b1c9d8524596797b24123862 (version GGUF V3 (latest)) llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output. 
llama_model_loader: - kv 0: general.architecture str = qwen2 llama_model_loader: - kv 1: general.type str = model llama_model_loader: - kv 2: general.name str = Qwen2.5 14B llama_model_loader: - kv 3: general.organization str = Qwen llama_model_loader: - kv 4: general.basename str = Qwen2.5 llama_model_loader: - kv 5: general.size_label str = 14B llama_model_loader: - kv 6: general.base_model.count u32 = 3 llama_model_loader: - kv 7: general.base_model.0.name str = Qwamma 14b Merge v1 llama_model_loader: - kv 8: general.base_model.0.version str = v1 llama_model_loader: - kv 9: general.base_model.0.organization str = Chargoddard llama_model_loader: - kv 10: general.base_model.0.repo_url str = https://huggingface.co/chargoddard/qw... llama_model_loader: - kv 11: general.base_model.1.name str = Qwen2.5 14B Instruct_arcee Qwen2 14B ... llama_model_loader: - kv 12: general.base_model.1.version str = v0.2 llama_model_loader: - kv 13: general.base_model.1.organization str = Arcee Train llama_model_loader: - kv 14: general.base_model.1.repo_url str = https://huggingface.co/arcee-train/Qw... 
llama_model_loader: - kv 15: general.base_model.2.name str = Qwen2.5 14B llama_model_loader: - kv 16: general.base_model.2.organization str = Qwen llama_model_loader: - kv 17: general.base_model.2.repo_url str = https://huggingface.co/Qwen/Qwen2.5-14B llama_model_loader: - kv 18: general.tags arr[str,2] = ["mergekit", "merge"] llama_model_loader: - kv 19: qwen2.block_count u32 = 48 llama_model_loader: - kv 20: qwen2.context_length u32 = 131072 llama_model_loader: - kv 21: qwen2.embedding_length u32 = 5120 llama_model_loader: - kv 22: qwen2.feed_forward_length u32 = 13824 llama_model_loader: - kv 23: qwen2.attention.head_count u32 = 40 llama_model_loader: - kv 24: qwen2.attention.head_count_kv u32 = 8 llama_model_loader: - kv 25: qwen2.rope.freq_base f32 = 1000000.000000 llama_model_loader: - kv 26: qwen2.attention.layer_norm_rms_epsilon f32 = 0.000010 llama_model_loader: - kv 27: general.file_type u32 = 17 llama_model_loader: - kv 28: tokenizer.ggml.model str = gpt2 llama_model_loader: - kv 29: tokenizer.ggml.pre str = qwen2 llama_model_loader: - kv 30: tokenizer.ggml.tokens arr[str,152064] = ["!", "\"", "#", "$", "%", "&", "'", ... llama_model_loader: - kv 31: tokenizer.ggml.token_type arr[i32,152064] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ... time=2025-02-28T12:20:19.474+01:00 level=INFO source=server.go:591 msg="waiting for server to become available" status="llm server loading model" llama_model_loader: - kv 32: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",... llama_model_loader: - kv 33: tokenizer.ggml.eos_token_id u32 = 151645 llama_model_loader: - kv 34: tokenizer.ggml.padding_token_id u32 = 151643 llama_model_loader: - kv 35: tokenizer.ggml.bos_token_id u32 = 151643 llama_model_loader: - kv 36: tokenizer.ggml.add_bos_token bool = false llama_model_loader: - kv 37: tokenizer.chat_template str = {%- if tools %}\n {{- '<|im_start|>... 
llama_model_loader: - kv 38: general.quantization_version u32 = 2 llama_model_loader: - kv 39: quantize.imatrix.file str = /models_out/SuperNova-14B-GGUF/SuperN... llama_model_loader: - kv 40: quantize.imatrix.dataset str = /training_dir/calibration_datav3.txt llama_model_loader: - kv 41: quantize.imatrix.entries_count i32 = 336 llama_model_loader: - kv 42: quantize.imatrix.chunks_count i32 = 128 llama_model_loader: - type f32: 241 tensors llama_model_loader: - type q5_K: 289 tensors llama_model_loader: - type q6_K: 49 tensors print_info: file format = GGUF V3 (latest) print_info: file type = Q5_K - Medium print_info: file size = 9.78 GiB (5.69 BPW) init_tokenizer: initializing tokenizer for type 2 load: control token: 151660 '<|fim_middle|>' is not marked as EOG load: control token: 151659 '<|fim_prefix|>' is not marked as EOG load: control token: 151653 '<|vision_end|>' is not marked as EOG load: control token: 151648 '<|box_start|>' is not marked as EOG load: control token: 151646 '<|object_ref_start|>' is not marked as EOG load: control token: 151649 '<|box_end|>' is not marked as EOG load: control token: 151655 '<|image_pad|>' is not marked as EOG load: control token: 151651 '<|quad_end|>' is not marked as EOG load: control token: 151647 '<|object_ref_end|>' is not marked as EOG load: control token: 151652 '<|vision_start|>' is not marked as EOG load: control token: 151654 '<|vision_pad|>' is not marked as EOG load: control token: 151656 '<|video_pad|>' is not marked as EOG load: control token: 151644 '<|im_start|>' is not marked as EOG load: control token: 151661 '<|fim_suffix|>' is not marked as EOG load: control token: 151650 '<|quad_start|>' is not marked as EOG load: special tokens cache size = 22 load: token to piece cache size = 0.9310 MB print_info: arch = qwen2 print_info: vocab_only = 0 print_info: n_ctx_train = 131072 print_info: n_embd = 5120 print_info: n_layer = 48 print_info: n_head = 40 print_info: n_head_kv = 8 print_info: n_rot = 128 
print_info: n_swa = 0 print_info: n_embd_head_k = 128 print_info: n_embd_head_v = 128 print_info: n_gqa = 5 print_info: n_embd_k_gqa = 1024 print_info: n_embd_v_gqa = 1024 print_info: f_norm_eps = 0.0e+00 print_info: f_norm_rms_eps = 1.0e-05 print_info: f_clamp_kqv = 0.0e+00 print_info: f_max_alibi_bias = 0.0e+00 print_info: f_logit_scale = 0.0e+00 print_info: n_ff = 13824 print_info: n_expert = 0 print_info: n_expert_used = 0 print_info: causal attn = 1 print_info: pooling type = 0 print_info: rope type = 2 print_info: rope scaling = linear print_info: freq_base_train = 1000000.0 print_info: freq_scale_train = 1 print_info: n_ctx_orig_yarn = 131072 print_info: rope_finetuned = unknown print_info: ssm_d_conv = 0 print_info: ssm_d_inner = 0 print_info: ssm_d_state = 0 print_info: ssm_dt_rank = 0 print_info: ssm_dt_b_c_rms = 0 print_info: model type = 14B print_info: model params = 14.77 B print_info: general.name = Qwen2.5 14B print_info: vocab type = BPE print_info: n_vocab = 152064 print_info: n_merges = 151387 print_info: BOS token = 151643 '<|endoftext|>' print_info: EOS token = 151645 '<|im_end|>' print_info: EOT token = 151645 '<|im_end|>' print_info: PAD token = 151643 '<|endoftext|>' print_info: LF token = 198 'Ċ' print_info: FIM PRE token = 151659 '<|fim_prefix|>' print_info: FIM SUF token = 151661 '<|fim_suffix|>' print_info: FIM MID token = 151660 '<|fim_middle|>' print_info: FIM PAD token = 151662 '<|fim_pad|>' print_info: FIM REP token = 151663 '<|repo_name|>' print_info: FIM SEP token = 151664 '<|file_sep|>' print_info: EOG token = 151643 '<|endoftext|>' print_info: EOG token = 151645 '<|im_end|>' print_info: EOG token = 151662 '<|fim_pad|>' print_info: EOG token = 151663 '<|repo_name|>' print_info: EOG token = 151664 '<|file_sep|>' print_info: max token length = 256 load_tensors: loading model tensors, this can take a while... 
(mmap = true) load_tensors: layer 0 assigned to device ROCm0 load_tensors: layer 1 assigned to device ROCm0 load_tensors: layer 2 assigned to device ROCm0 load_tensors: layer 3 assigned to device ROCm0 load_tensors: layer 4 assigned to device ROCm0 load_tensors: layer 5 assigned to device ROCm0 load_tensors: layer 6 assigned to device ROCm0 load_tensors: layer 7 assigned to device ROCm0 load_tensors: layer 8 assigned to device ROCm0 load_tensors: layer 9 assigned to device ROCm0 load_tensors: layer 10 assigned to device ROCm0 load_tensors: layer 11 assigned to device ROCm0 load_tensors: layer 12 assigned to device ROCm0 load_tensors: layer 13 assigned to device ROCm0 load_tensors: layer 14 assigned to device ROCm0 load_tensors: layer 15 assigned to device ROCm0 load_tensors: layer 16 assigned to device ROCm0 load_tensors: layer 17 assigned to device ROCm0 load_tensors: layer 18 assigned to device ROCm0 load_tensors: layer 19 assigned to device ROCm0 load_tensors: layer 20 assigned to device ROCm0 load_tensors: layer 21 assigned to device ROCm0 load_tensors: layer 22 assigned to device ROCm0 load_tensors: layer 23 assigned to device ROCm0 load_tensors: layer 24 assigned to device ROCm0 load_tensors: layer 25 assigned to device ROCm0 load_tensors: layer 26 assigned to device ROCm0 load_tensors: layer 27 assigned to device ROCm0 load_tensors: layer 28 assigned to device ROCm0 load_tensors: layer 29 assigned to device ROCm0 load_tensors: layer 30 assigned to device ROCm0 load_tensors: layer 31 assigned to device ROCm0 load_tensors: layer 32 assigned to device ROCm0 load_tensors: layer 33 assigned to device ROCm0 load_tensors: layer 34 assigned to device ROCm0 load_tensors: layer 35 assigned to device ROCm0 load_tensors: layer 36 assigned to device ROCm0 load_tensors: layer 37 assigned to device ROCm0 load_tensors: layer 38 assigned to device ROCm0 load_tensors: layer 39 assigned to device ROCm0 load_tensors: layer 40 assigned to device ROCm0 load_tensors: layer 41 
assigned to device ROCm0 load_tensors: layer 42 assigned to device ROCm0 load_tensors: layer 43 assigned to device ROCm0 load_tensors: layer 44 assigned to device ROCm0 load_tensors: layer 45 assigned to device ROCm0 load_tensors: layer 46 assigned to device ROCm0 load_tensors: layer 47 assigned to device ROCm0 load_tensors: layer 48 assigned to device ROCm0 load_tensors: tensor 'token_embd.weight' (q5_K) (and 0 others) cannot be used with preferred buffer type ROCm_Host, using CPU instead load_tensors: offloading 48 repeating layers to GPU load_tensors: offloading output layer to GPU load_tensors: offloaded 49/49 layers to GPU load_tensors: CPU_Mapped model buffer size = 510.47 MiB load_tensors: ROCm0 model buffer size = 9505.88 MiB time=2025-02-28T12:20:39.550+01:00 level=DEBUG source=server.go:602 msg="model load progress 0.05" time=2025-02-28T12:20:40.052+01:00 level=DEBUG source=server.go:602 msg="model load progress 0.05" time=2025-02-28T12:20:40.303+01:00 level=DEBUG source=server.go:602 msg="model load progress 0.13" time=2025-02-28T12:20:40.554+01:00 level=DEBUG source=server.go:602 msg="model load progress 0.21" time=2025-02-28T12:20:40.805+01:00 level=DEBUG source=server.go:602 msg="model load progress 0.30" time=2025-02-28T12:20:41.056+01:00 level=DEBUG source=server.go:602 msg="model load progress 0.37" time=2025-02-28T12:20:41.307+01:00 level=DEBUG source=server.go:602 msg="model load progress 0.44" time=2025-02-28T12:20:41.558+01:00 level=DEBUG source=server.go:602 msg="model load progress 0.51" time=2025-02-28T12:20:41.809+01:00 level=DEBUG source=server.go:602 msg="model load progress 0.60" time=2025-02-28T12:20:42.060+01:00 level=DEBUG source=server.go:602 msg="model load progress 0.69" time=2025-02-28T12:20:42.311+01:00 level=DEBUG source=server.go:602 msg="model load progress 0.79" time=2025-02-28T12:20:42.562+01:00 level=DEBUG source=server.go:602 msg="model load progress 0.88" time=2025-02-28T12:20:42.813+01:00 level=DEBUG source=server.go:602 
msg="model load progress 0.98" time=2025-02-28T12:20:43.064+01:00 level=DEBUG source=server.go:602 msg="model load progress 1.00" llama_init_from_model: n_seq_max = 2 llama_init_from_model: n_ctx = 20224 llama_init_from_model: n_ctx_per_seq = 10112 llama_init_from_model: n_batch = 1024 llama_init_from_model: n_ubatch = 512 llama_init_from_model: flash_attn = 1 llama_init_from_model: freq_base = 1000000.0 llama_init_from_model: freq_scale = 1 llama_init_from_model: n_ctx_per_seq (10112) < n_ctx_train (131072) -- the full capacity of the model will not be utilized llama_kv_cache_init: kv_size = 20224, offload = 1, type_k = 'q8_0', type_v = 'q8_0', n_layer = 48, can_shift = 1 llama_kv_cache_init: layer 0: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024 llama_kv_cache_init: layer 1: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024 llama_kv_cache_init: layer 2: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024 llama_kv_cache_init: layer 3: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024 llama_kv_cache_init: layer 4: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024 llama_kv_cache_init: layer 5: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024 llama_kv_cache_init: layer 6: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024 llama_kv_cache_init: layer 7: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024 llama_kv_cache_init: layer 8: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024 llama_kv_cache_init: layer 9: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024 llama_kv_cache_init: layer 10: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024 llama_kv_cache_init: layer 11: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024 llama_kv_cache_init: layer 12: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024 llama_kv_cache_init: layer 13: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024 llama_kv_cache_init: layer 14: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024 llama_kv_cache_init: layer 15: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024 llama_kv_cache_init: layer 16: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024 llama_kv_cache_init: layer 17: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024 llama_kv_cache_init: layer 18: 
n_embd_k_gqa = 1024, n_embd_v_gqa = 1024 llama_kv_cache_init: layer 19: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024 llama_kv_cache_init: layer 20: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024 llama_kv_cache_init: layer 21: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024 llama_kv_cache_init: layer 22: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024 llama_kv_cache_init: layer 23: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024 llama_kv_cache_init: layer 24: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024 llama_kv_cache_init: layer 25: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024 llama_kv_cache_init: layer 26: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024 llama_kv_cache_init: layer 27: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024 llama_kv_cache_init: layer 28: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024 llama_kv_cache_init: layer 29: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024 llama_kv_cache_init: layer 30: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024 llama_kv_cache_init: layer 31: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024 llama_kv_cache_init: layer 32: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024 llama_kv_cache_init: layer 33: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024 llama_kv_cache_init: layer 34: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024 llama_kv_cache_init: layer 35: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024 llama_kv_cache_init: layer 36: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024 llama_kv_cache_init: layer 37: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024 llama_kv_cache_init: layer 38: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024 llama_kv_cache_init: layer 39: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024 llama_kv_cache_init: layer 40: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024 llama_kv_cache_init: layer 41: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024 llama_kv_cache_init: layer 42: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024 llama_kv_cache_init: layer 43: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024 llama_kv_cache_init: layer 44: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024 llama_kv_cache_init: layer 45: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024 
llama_kv_cache_init: layer 46: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024 llama_kv_cache_init: layer 47: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024 time=2025-02-28T12:20:43.315+01:00 level=DEBUG source=server.go:602 msg="model load progress 1.00" llama_kv_cache_init: ROCm0 KV buffer size = 2014.50 MiB llama_init_from_model: KV self size = 2014.50 MiB, K (q8_0): 1007.25 MiB, V (q8_0): 1007.25 MiB llama_init_from_model: ROCm_Host output buffer size = 1.20 MiB llama_init_from_model: ROCm0 compute buffer size = 317.00 MiB llama_init_from_model: ROCm_Host compute buffer size = 101.22 MiB llama_init_from_model: graph nodes = 1495 llama_init_from_model: graph splits = 98 time=2025-02-28T12:20:43.566+01:00 level=INFO source=server.go:596 msg="llama runner started in 26.61 seconds" time=2025-02-28T12:20:43.566+01:00 level=DEBUG source=sched.go:463 msg="finished setting up runner" model=/ollama/blobs/sha256-bbfb685133c274407d565c65b1ca806eb1593482b1c9d8524596797b24123862 [GIN] 2025/02/28 - 12:20:43 | 200 | 27.089327333s | 127.0.0.1 | POST "/api/generate" time=2025-02-28T12:20:43.566+01:00 level=DEBUG source=sched.go:467 msg="context for request finished" time=2025-02-28T12:20:43.566+01:00 level=DEBUG source=sched.go:340 msg="runner with non-zero duration has gone idle, adding timer" modelPath=/ollama/blobs/sha256-bbfb685133c274407d565c65b1ca806eb1593482b1c9d8524596797b24123862 duration=5m0s time=2025-02-28T12:20:43.566+01:00 level=DEBUG source=sched.go:358 msg="after processing request finished event" modelPath=/ollama/blobs/sha256-bbfb685133c274407d565c65b1ca806eb1593482b1c9d8524596797b24123862 refCount=0 time=2025-02-28T12:20:50.971+01:00 level=DEBUG source=sched.go:576 msg="evaluating already loaded" model=/ollama/blobs/sha256-bbfb685133c274407d565c65b1ca806eb1593482b1c9d8524596797b24123862 time=2025-02-28T12:20:50.972+01:00 level=DEBUG source=server.go:968 msg="new runner detected, loading model for cgo tokenization" llama_model_loader: loaded meta data with 43 key-value 
pairs and 579 tensors from /ollama/blobs/sha256-bbfb685133c274407d565c65b1ca806eb1593482b1c9d8524596797b24123862 (version GGUF V3 (latest)) llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output. llama_model_loader: - kv 0: general.architecture str = qwen2 llama_model_loader: - kv 1: general.type str = model llama_model_loader: - kv 2: general.name str = Qwen2.5 14B llama_model_loader: - kv 3: general.organization str = Qwen llama_model_loader: - kv 4: general.basename str = Qwen2.5 llama_model_loader: - kv 5: general.size_label str = 14B llama_model_loader: - kv 6: general.base_model.count u32 = 3 llama_model_loader: - kv 7: general.base_model.0.name str = Qwamma 14b Merge v1 llama_model_loader: - kv 8: general.base_model.0.version str = v1 llama_model_loader: - kv 9: general.base_model.0.organization str = Chargoddard llama_model_loader: - kv 10: general.base_model.0.repo_url str = https://huggingface.co/chargoddard/qw... llama_model_loader: - kv 11: general.base_model.1.name str = Qwen2.5 14B Instruct_arcee Qwen2 14B ... llama_model_loader: - kv 12: general.base_model.1.version str = v0.2 llama_model_loader: - kv 13: general.base_model.1.organization str = Arcee Train llama_model_loader: - kv 14: general.base_model.1.repo_url str = https://huggingface.co/arcee-train/Qw... 
llama_model_loader: - kv 15: general.base_model.2.name str = Qwen2.5 14B llama_model_loader: - kv 16: general.base_model.2.organization str = Qwen llama_model_loader: - kv 17: general.base_model.2.repo_url str = https://huggingface.co/Qwen/Qwen2.5-14B llama_model_loader: - kv 18: general.tags arr[str,2] = ["mergekit", "merge"] llama_model_loader: - kv 19: qwen2.block_count u32 = 48 llama_model_loader: - kv 20: qwen2.context_length u32 = 131072 llama_model_loader: - kv 21: qwen2.embedding_length u32 = 5120 llama_model_loader: - kv 22: qwen2.feed_forward_length u32 = 13824 llama_model_loader: - kv 23: qwen2.attention.head_count u32 = 40 llama_model_loader: - kv 24: qwen2.attention.head_count_kv u32 = 8 llama_model_loader: - kv 25: qwen2.rope.freq_base f32 = 1000000.000000 llama_model_loader: - kv 26: qwen2.attention.layer_norm_rms_epsilon f32 = 0.000010 llama_model_loader: - kv 27: general.file_type u32 = 17 llama_model_loader: - kv 28: tokenizer.ggml.model str = gpt2 llama_model_loader: - kv 29: tokenizer.ggml.pre str = qwen2 llama_model_loader: - kv 30: tokenizer.ggml.tokens arr[str,152064] = ["!", "\"", "#", "$", "%", "&", "'", ... llama_model_loader: - kv 31: tokenizer.ggml.token_type arr[i32,152064] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ... llama_model_loader: - kv 32: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",... llama_model_loader: - kv 33: tokenizer.ggml.eos_token_id u32 = 151645 llama_model_loader: - kv 34: tokenizer.ggml.padding_token_id u32 = 151643 llama_model_loader: - kv 35: tokenizer.ggml.bos_token_id u32 = 151643 llama_model_loader: - kv 36: tokenizer.ggml.add_bos_token bool = false llama_model_loader: - kv 37: tokenizer.chat_template str = {%- if tools %}\n {{- '<|im_start|>... llama_model_loader: - kv 38: general.quantization_version u32 = 2 llama_model_loader: - kv 39: quantize.imatrix.file str = /models_out/SuperNova-14B-GGUF/SuperN... 
llama_model_loader: - kv 40: quantize.imatrix.dataset str = /training_dir/calibration_datav3.txt llama_model_loader: - kv 41: quantize.imatrix.entries_count i32 = 336 llama_model_loader: - kv 42: quantize.imatrix.chunks_count i32 = 128 llama_model_loader: - type f32: 241 tensors llama_model_loader: - type q5_K: 289 tensors llama_model_loader: - type q6_K: 49 tensors print_info: file format = GGUF V3 (latest) print_info: file type = Q5_K - Medium print_info: file size = 9.78 GiB (5.69 BPW) init_tokenizer: initializing tokenizer for type 2 load: control token: 151660 '<|fim_middle|>' is not marked as EOG load: control token: 151659 '<|fim_prefix|>' is not marked as EOG load: control token: 151653 '<|vision_end|>' is not marked as EOG load: control token: 151648 '<|box_start|>' is not marked as EOG load: control token: 151646 '<|object_ref_start|>' is not marked as EOG load: control token: 151649 '<|box_end|>' is not marked as EOG load: control token: 151655 '<|image_pad|>' is not marked as EOG load: control token: 151651 '<|quad_end|>' is not marked as EOG load: control token: 151647 '<|object_ref_end|>' is not marked as EOG load: control token: 151652 '<|vision_start|>' is not marked as EOG load: control token: 151654 '<|vision_pad|>' is not marked as EOG load: control token: 151656 '<|video_pad|>' is not marked as EOG load: control token: 151644 '<|im_start|>' is not marked as EOG load: control token: 151661 '<|fim_suffix|>' is not marked as EOG load: control token: 151650 '<|quad_start|>' is not marked as EOG load: special tokens cache size = 22 load: token to piece cache size = 0.9310 MB print_info: arch = qwen2 print_info: vocab_only = 1 print_info: model type = ?B print_info: model params = 14.77 B print_info: general.name = Qwen2.5 14B print_info: vocab type = BPE print_info: n_vocab = 152064 print_info: n_merges = 151387 print_info: BOS token = 151643 '<|endoftext|>' print_info: EOS token = 151645 '<|im_end|>' print_info: EOT token = 151645 '<|im_end|>' 
print_info: PAD token = 151643 '<|endoftext|>' print_info: LF token = 198 'Ċ' print_info: FIM PRE token = 151659 '<|fim_prefix|>' print_info: FIM SUF token = 151661 '<|fim_suffix|>' print_info: FIM MID token = 151660 '<|fim_middle|>' print_info: FIM PAD token = 151662 '<|fim_pad|>' print_info: FIM REP token = 151663 '<|repo_name|>' print_info: FIM SEP token = 151664 '<|file_sep|>' print_info: EOG token = 151643 '<|endoftext|>' print_info: EOG token = 151645 '<|im_end|>' print_info: EOG token = 151662 '<|fim_pad|>' print_info: EOG token = 151663 '<|repo_name|>' print_info: EOG token = 151664 '<|file_sep|>' print_info: max token length = 256 llama_model_load: vocab only - skipping tensors time=2025-02-28T12:20:51.260+01:00 level=DEBUG source=routes.go:1505 msg="chat request" images=0 prompt="<|im_start|>system\n\nYou are Quinn, a happy and cheerful AI assistant. You are a sunshine and rainbows kind of gal. \n\nConversation instructions that you must follow:\n - You always respond in English.\n - Do not address the user by name.\n<|im_end|>\n<|im_start|>user\nblammo<|im_end|>\n<|im_start|>assistant\n" time=2025-02-28T12:20:51.265+01:00 level=DEBUG source=cache.go:104 msg="loading cache slot" id=0 cache=0 prompt=61 used=0 remaining=61
```
</details>

OS

Gentoo Linux

GPU

AMD, NVidia

CPU

AMD

Ollama version

0.5.13-rc1
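For a quick check of where the text layers actually went, two summary lines in the log are usually enough: the scheduler's `msg=offload` entry (here `library=rocm ... layers.model=49 layers.offload=49`) and the loader's `offloaded 49/49 layers to GPU`. The minimal Python sketch below pulls both out of a saved log. The regexes are assumptions based only on the line formats visible in the log above and may not match other Ollama versions; `summarize_offload` and `OffloadSummary` are hypothetical names, not part of Ollama. It only reports what those two lines say, so it cannot tell you where anything not covered by them was placed.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Scheduler summary, format assumed from the log above:
#   msg=offload library=rocm layers.requested=-1 layers.model=49 layers.offload=49
OFFLOAD_RE = re.compile(
    r"msg=offload library=(?P<library>\S+)"
    r".*?layers\.model=(?P<model>\d+) layers\.offload=(?P<offload>\d+)"
)
# llama.cpp loader summary, format assumed from the log above:
#   load_tensors: offloaded 49/49 layers to GPU
LOADED_RE = re.compile(r"offloaded (?P<done>\d+)/(?P<total>\d+) layers to GPU")


@dataclass
class OffloadSummary:
    library: str           # backend library named in the entry, e.g. "rocm"
    planned: int           # layers the scheduler planned to offload
    model_layers: int      # total layers the scheduler counted for the model
    loaded: Optional[int]  # layers the loader reported on the GPU; None if no such line follows


def summarize_offload(log_text: str) -> List[OffloadSummary]:
    """Pair each scheduler offload plan with the first loader summary after it."""
    summaries = []
    for plan in OFFLOAD_RE.finditer(log_text):
        # Search only from the end of this plan onward, so each plan is matched
        # with the loader line that follows it rather than an earlier one.
        loaded = LOADED_RE.search(log_text, plan.end())
        summaries.append(
            OffloadSummary(
                library=plan.group("library"),
                planned=int(plan.group("offload")),
                model_layers=int(plan.group("model")),
                loaded=int(loaded.group("done")) if loaded else None,
            )
        )
    return summaries
```

Typical use would be `summarize_offload(open("ollama.log", encoding="utf-8").read())` on a saved server log. For the log above it should include an entry with `library="rocm"` and 49 of 49 layers both planned and loaded. Because the search is regex-based rather than line-based, it also works on logs whose entries have been run together, as in this paste.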
GiteaMirror added the bug label 2026-05-04 12:48:59 -05:00

@jmorganca commented on GitHub (Feb 28, 2025):

@ProjectMoon thanks! Possible to share the logs?

@jmorganca commented on GitHub (Feb 28, 2025):

Oops, you did! Thanks so much. Looking

@githubdebugger commented on GitHub (Mar 1, 2025):

Maybe these issues are related, so I'm adding my logs here: I'm facing a similar problem with the 0.5.13-rc2-rocm image running in a Docker container. The model loads onto the GPU, but as soon as I send a message, the runner exits:

Click to view logs

➜ ~ docker stop ollama
ollama
➜ ~ docker rm ollama
ollama
➜ ~ docker run -d --device /dev/kfd --device /dev/dri -v ollama:/root/.ollama -e OLLAMA_ORIGINS="" -e HSA_OVERRIDE_GFX_VERSION=10.3.0 -p 11434:11434 --name ollama ollama/ollama:0.5.13-rc2-rocm
993105481ed3bcd488bd82b5477a9749248437e15aee10d99de73a104a37e090
➜ ~
➜ ~
➜ ~ docker logs -f ollama
2025/03/01 07:09:11 routes.go:1215: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION:10.3.0 HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:2048 OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[ http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2025-03-01T07:09:11.854Z level=INFO source=images.go:432 msg="total blobs: 23"
time=2025-03-01T07:09:11.855Z level=INFO source=images.go:439 msg="total unused blobs removed: 0"
time=2025-03-01T07:09:11.856Z level=INFO source=routes.go:1281 msg="Listening on [::]:11434 (version 0.5.13-rc2)"
time=2025-03-01T07:09:11.856Z level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
time=2025-03-01T07:09:11.859Z level=INFO source=amd_linux.go:389 msg="skipping rocm gfx compatibility check" HSA_OVERRIDE_GFX_VERSION=10.3.0
time=2025-03-01T07:09:11.864Z level=INFO source=types.go:130 msg="inference compute" id=0 library=rocm variant="" compute=gfx1035 driver=6.8 name=1002:1681 total="16.0 GiB" available="16.0 GiB"
^C
➜ ~ docker exec -it ollama ollama ls
NAME ID SIZE MODIFIED
phi4-mini:latest 78fad5d182a7 2.5 GB 7 minutes ago
deepseek-r1:1.5b a42b25d8c10a 1.1 GB 3 hours ago
qwen2.5-coder:latest 2b0496514337 4.7 GB 12 days ago
nomic-embed-text:latest 0a109f422b47 274 MB 12 days ago
deepseek-r1:latest 0a8c26691023 4.7 GB 12 days ago
bge-m3:latest 790764642607 1.2 GB 12 days ago
➜ ~ docker exec -it ollama ollama ps
NAME ID SIZE PROCESSOR UNTIL
➜ ~ docker exec -it ollama ollama ps
NAME ID SIZE PROCESSOR UNTIL
phi4-mini:latest 78fad5d182a7 4.7 GB 100% GPU 4 minutes from now
➜ ~
➜ ~ docker exec -it ollama ollama ps
NAME ID SIZE PROCESSOR UNTIL
phi4-mini:latest 78fad5d182a7 4.7 GB 100% GPU 4 minutes from now
➜ ~ docker logs -f ollama
2025/03/01 07:09:11 routes.go:1215: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION:10.3.0 HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:2048 OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[ http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2025-03-01T07:09:11.854Z level=INFO source=images.go:432 msg="total blobs: 23"
time=2025-03-01T07:09:11.855Z level=INFO source=images.go:439 msg="total unused blobs removed: 0"
time=2025-03-01T07:09:11.856Z level=INFO source=routes.go:1281 msg="Listening on [::]:11434 (version 0.5.13-rc2)"
time=2025-03-01T07:09:11.856Z level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
time=2025-03-01T07:09:11.859Z level=INFO source=amd_linux.go:389 msg="skipping rocm gfx compatibility check" HSA_OVERRIDE_GFX_VERSION=10.3.0
time=2025-03-01T07:09:11.864Z level=INFO source=types.go:130 msg="inference compute" id=0 library=rocm variant="" compute=gfx1035 driver=6.8 name=1002:1681 total="16.0 GiB" available="16.0 GiB"
[GIN] 2025/03/01 - 07:09:44 | 200 | 106.108µs | 127.0.0.1 | HEAD "/"
[GIN] 2025/03/01 - 07:09:44 | 200 | 1.294668ms | 127.0.0.1 | GET "/api/tags"
[GIN] 2025/03/01 - 07:09:47 | 200 | 20.921µs | 127.0.0.1 | HEAD "/"
[GIN] 2025/03/01 - 07:09:47 | 200 | 118.183µs | 127.0.0.1 | GET "/api/ps"
[GIN] 2025/03/01 - 07:09:52 | 200 | 29.288µs | 127.0.0.1 | HEAD "/"
[GIN] 2025/03/01 - 07:09:52 | 200 | 23.668722ms | 127.0.0.1 | POST "/api/show"
time=2025-03-01T07:09:52.534Z level=WARN source=ggml.go:136 msg="key not found" key=phi3.attention.key_length default=128
time=2025-03-01T07:09:52.534Z level=WARN source=ggml.go:136 msg="key not found" key=phi3.attention.value_length default=128
time=2025-03-01T07:09:52.534Z level=INFO source=sched.go:715 msg="new model will fit in available VRAM in single GPU, loading" model=/root/.ollama/models/blobs/sha256-3c168af1dea0a414299c7d9077e100ac763370e5a98b3c53801a958a47f0a5db gpu=0 parallel=4 available=17163341824 required="4.4 GiB"
time=2025-03-01T07:09:52.534Z level=INFO source=server.go:97 msg="system memory" total="15.4 GiB" free="12.2 GiB" free_swap="0 B"
time=2025-03-01T07:09:52.534Z level=WARN source=ggml.go:136 msg="key not found" key=phi3.attention.key_length default=128
time=2025-03-01T07:09:52.534Z level=WARN source=ggml.go:136 msg="key not found" key=phi3.attention.value_length default=128
time=2025-03-01T07:09:52.534Z level=INFO source=server.go:130 msg=offload library=rocm layers.requested=-1 layers.model=33 layers.offload=33 layers.split="" memory.available="[16.0 GiB]" memory.gpu_overhead="0 B" memory.required.full="4.4 GiB" memory.required.partial="4.4 GiB" memory.required.kv="1.0 GiB" memory.required.allocations="[4.4 GiB]" memory.weights.total="2.8 GiB" memory.weights.repeating="2.4 GiB" memory.weights.nonrepeating="480.8 MiB" memory.graph.full="512.0 MiB" memory.graph.partial="512.0 MiB"
time=2025-03-01T07:09:52.535Z level=INFO source=server.go:380 msg="starting llama server" cmd="/usr/bin/ollama runner --model /root/.ollama/models/blobs/sha256-3c168af1dea0a414299c7d9077e100ac763370e5a98b3c53801a958a47f0a5db --ctx-size 8192 --batch-size 512 --n-gpu-layers 33 --threads 8 --parallel 4 --port 44845"
time=2025-03-01T07:09:52.535Z level=INFO source=sched.go:450 msg="loaded runners" count=1
time=2025-03-01T07:09:52.535Z level=INFO source=server.go:557 msg="waiting for llama runner to start responding"
time=2025-03-01T07:09:52.536Z level=INFO source=server.go:591 msg="waiting for server to become available" status="llm server error"
time=2025-03-01T07:09:52.552Z level=INFO source=runner.go:931 msg="starting go runner"
/opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Radeon Graphics, gfx1030 (0x1030), VMM: yes, Wave Size: 32
load_backend: loaded ROCm backend from /usr/lib/ollama/rocm/libggml-hip.so
load_backend: loaded CPU backend from /usr/lib/ollama/libggml-cpu-haswell.so
time=2025-03-01T07:09:54.075Z level=INFO source=runner.go:934 msg=system info="CPU : LLAMAFILE = 1 | ROCm : PEER_MAX_BATCH_SIZE = 128 | CPU : LLAMAFILE = 1 | cgo(gcc)" threads=8
llama_model_load_from_file_impl: using device ROCm0 (AMD Radeon Graphics) - 7860 MiB free
time=2025-03-01T07:09:54.076Z level=INFO source=runner.go:992 msg="Server listening on 127.0.0.1:44845"
llama_model_loader: loaded meta data with 36 key-value pairs and 196 tensors from /root/.ollama/models/blobs/sha256-3c168af1dea0a414299c7d9077e100ac763370e5a98b3c53801a958a47f0a5db (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = phi3
llama_model_loader: - kv 1: phi3.rope.scaling.attn_factor f32 = 1.190238
llama_model_loader: - kv 2: general.type str = model
llama_model_loader: - kv 3: general.name str = Phi 4 Mini Instruct
llama_model_loader: - kv 4: general.finetune str = instruct
llama_model_loader: - kv 5: general.basename str = Phi-4
llama_model_loader: - kv 6: general.size_label str = mini
llama_model_loader: - kv 7: general.license str = mit
llama_model_loader: - kv 8: general.license.link str = https://huggingface.co/microsoft/Phi-...
llama_model_loader: - kv 9: general.tags arr[str,3] = ["nlp", "code", "text-generation"]
llama_model_loader: - kv 10: general.languages arr[str,24] = ["multilingual", "ar", "zh", "cs", "d...
llama_model_loader: - kv 11: phi3.context_length u32 = 131072
llama_model_loader: - kv 12: phi3.rope.scaling.original_context_length u32 = 4096
llama_model_loader: - kv 13: phi3.embedding_length u32 = 3072
llama_model_loader: - kv 14: phi3.feed_forward_length u32 = 8192
llama_model_loader: - kv 15: phi3.block_count u32 = 32
llama_model_loader: - kv 16: phi3.attention.head_count u32 = 24
llama_model_loader: - kv 17: phi3.attention.head_count_kv u32 = 8
llama_model_loader: - kv 18: phi3.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 19: phi3.rope.dimension_count u32 = 96
llama_model_loader: - kv 20: phi3.rope.freq_base f32 = 10000.000000
llama_model_loader: - kv 21: phi3.attention.sliding_window u32 = 262144
llama_model_loader: - kv 22: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 23: tokenizer.ggml.pre str = gpt-4o
llama_model_loader: - kv 24: tokenizer.ggml.tokens arr[str,200064] = ["!", """, "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 25: tokenizer.ggml.token_type arr[i32,200064] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 26: tokenizer.ggml.merges arr[str,199742] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "e r", ...
llama_model_loader: - kv 27: tokenizer.ggml.bos_token_id u32 = 199999
llama_model_loader: - kv 28: tokenizer.ggml.eos_token_id u32 = 199999
llama_model_loader: - kv 29: tokenizer.ggml.unknown_token_id u32 = 199999
llama_model_loader: - kv 30: tokenizer.ggml.padding_token_id u32 = 199999
llama_model_loader: - kv 31: tokenizer.ggml.add_bos_token bool = false
llama_model_loader: - kv 32: tokenizer.ggml.add_eos_token bool = false
llama_model_loader: - kv 33: tokenizer.chat_template str = {% for message in messages %}{% if me...
llama_model_loader: - kv 34: general.quantization_version u32 = 2
llama_model_loader: - kv 35: general.file_type u32 = 15
llama_model_loader: - type f32: 67 tensors
llama_model_loader: - type q4_K: 80 tensors
llama_model_loader: - type q5_K: 32 tensors
llama_model_loader: - type q6_K: 17 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type = Q4_K - Medium
print_info: file size = 2.31 GiB (5.18 BPW)
load: special tokens cache size = 12
time=2025-03-01T07:09:54.293Z level=INFO source=server.go:591 msg="waiting for server to become available" status="llm server loading model"
load: token to piece cache size = 1.3333 MB
print_info: arch = phi3
print_info: vocab_only = 0
print_info: n_ctx_train = 131072
print_info: n_embd = 3072
print_info: n_layer = 32
print_info: n_head = 24
print_info: n_head_kv = 8
print_info: n_rot = 96
print_info: n_swa = 262144
print_info: n_embd_head_k = 128
print_info: n_embd_head_v = 128
print_info: n_gqa = 3
print_info: n_embd_k_gqa = 1024
print_info: n_embd_v_gqa = 1024
print_info: f_norm_eps = 0.0e+00
print_info: f_norm_rms_eps = 1.0e-05
print_info: f_clamp_kqv = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale = 0.0e+00
print_info: n_ff = 8192
print_info: n_expert = 0
print_info: n_expert_used = 0
print_info: causal attn = 1
print_info: pooling type = 0
print_info: rope type = 2
print_info: rope scaling = linear
print_info: freq_base_train = 10000.0
print_info: freq_scale_train = 1
print_info: n_ctx_orig_yarn = 4096
print_info: rope_finetuned = unknown
print_info: ssm_d_conv = 0
print_info: ssm_d_inner = 0
print_info: ssm_d_state = 0
print_info: ssm_dt_rank = 0
print_info: ssm_dt_b_c_rms = 0
print_info: model type = 3B
print_info: model params = 3.84 B
print_info: general.name = Phi 4 Mini Instruct
print_info: vocab type = BPE
print_info: n_vocab = 200064
print_info: n_merges = 199742
print_info: BOS token = 199999 '<|endoftext|>'
print_info: EOS token = 199999 '<|endoftext|>'
print_info: EOT token = 199999 '<|endoftext|>'
print_info: UNK token = 199999 '<|endoftext|>'
print_info: PAD token = 199999 '<|endoftext|>'
print_info: LF token = 198 'Ċ'
print_info: EOG token = 199999 '<|endoftext|>'
print_info: EOG token = 200020 '<|end|>'
print_info: max token length = 256
load_tensors: loading model tensors, this can take a while... (mmap = true)
load_tensors: offloading 32 repeating layers to GPU
load_tensors: offloading output layer to GPU
load_tensors: offloaded 33/33 layers to GPU
load_tensors: CPU_Mapped model buffer size = 480.81 MiB
load_tensors: ROCm0 model buffer size = 2368.57 MiB
llama_init_from_model: n_seq_max = 4
llama_init_from_model: n_ctx = 8192
llama_init_from_model: n_ctx_per_seq = 2048
llama_init_from_model: n_batch = 2048
llama_init_from_model: n_ubatch = 512
llama_init_from_model: flash_attn = 0
llama_init_from_model: freq_base = 10000.0
llama_init_from_model: freq_scale = 1
llama_init_from_model: n_ctx_per_seq (2048) < n_ctx_train (131072) -- the full capacity of the model will not be utilized
llama_kv_cache_init: kv_size = 8192, offload = 1, type_k = 'f16', type_v = 'f16', n_layer = 32, can_shift = 1
llama_kv_cache_init: ROCm0 KV buffer size = 1024.00 MiB
llama_init_from_model: KV self size = 1024.00 MiB, K (f16): 512.00 MiB, V (f16): 512.00 MiB
llama_init_from_model: ROCm_Host output buffer size = 3.10 MiB
llama_init_from_model: ROCm0 compute buffer size = 428.00 MiB
llama_init_from_model: ROCm_Host compute buffer size = 22.01 MiB
llama_init_from_model: graph nodes = 1286
llama_init_from_model: graph splits = 2
time=2025-03-01T07:09:55.298Z level=INFO source=server.go:596 msg="llama runner started in 2.76 seconds"
[GIN] 2025/03/01 - 07:09:55 | 200 | 2.832161663s | 127.0.0.1 | POST "/api/generate"
[GIN] 2025/03/01 - 07:09:58 | 200 | 25.49µs | 127.0.0.1 | HEAD "/"
[GIN] 2025/03/01 - 07:09:58 | 200 | 26.983µs | 127.0.0.1 | GET "/api/ps"
[GIN] 2025/03/01 - 07:10:00 | 200 | 34.017µs | 127.0.0.1 | HEAD "/"
[GIN] 2025/03/01 - 07:10:00 | 200 | 29.648µs | 127.0.0.1 | GET "/api/ps"
//ml/backend/ggml/ggml/src/ggml-cuda/ggml-cuda.cu:449: HipVMM Failure: out of memory

Memory critical error by agent node-0 (Agent handle: 0x63996d6c1680) on address 0x70d978500000. Reason: Memory in use.
SIGABRT: abort
PC=0x70d9e8efe00b m=0 sigcode=18446744073709551610
signal arrived during cgo execution

goroutine 16 gp=0xc000504a80 m=0 mp=0x6399654f96c0 [syscall]:
runtime.cgocall(0x639964647600, 0xc000093bc8)
runtime/cgocall.go:167 +0x4b fp=0xc000093ba0 sp=0xc000093b68 pc=0x6399639f76ab
github.com/ollama/ollama/llama._Cfunc_llama_decode(0x70d7cc836840, {0x4, 0x70d7cc857970, 0x0, 0x0, 0x70d7cc8610d0, 0x70d7cc859e30, 0x70d7cc78bbc0, 0x70d97c6252d0})
_cgo_gotypes.go:557 +0x4a fp=0xc000093bc8 sp=0xc000093ba0 pc=0x639963d7d46a
github.com/ollama/ollama/llama.(*Context).Decode.func1(...)
github.com/ollama/ollama/llama/llama.go:157
github.com/ollama/ollama/llama.(*Context).Decode(0xc00011c5d0?, 0x0?)
github.com/ollama/ollama/llama/llama.go:157 +0xf6 fp=0xc000093cc8 sp=0xc000093bc8 pc=0x639963d80076
github.com/ollama/ollama/runner/llamarunner.(*Server).processBatch(0xc0004ba000, 0xc0004264e0, 0xc00011c720)
github.com/ollama/ollama/runner/llamarunner/runner.go:435 +0x23e fp=0xc000093ee0 sp=0xc000093cc8 pc=0x639963d990be
github.com/ollama/ollama/runner/llamarunner.(*Server).run(0xc0004ba000, {0x639964ca7e60, 0xc0001280a0})
github.com/ollama/ollama/runner/llamarunner/runner.go:343 +0x1d5 fp=0xc000093fb8 sp=0xc000093ee0 pc=0x639963d98d15
github.com/ollama/ollama/runner/llamarunner.Execute.gowrap2()
github.com/ollama/ollama/runner/llamarunner/runner.go:973 +0x28 fp=0xc000093fe0 sp=0xc000093fb8 pc=0x639963d9d6a8
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc000093fe8 sp=0xc000093fe0 pc=0x639963a020c1
created by github.com/ollama/ollama/runner/llamarunner.Execute in goroutine 1
github.com/ollama/ollama/runner/llamarunner/runner.go:973 +0xd97

goroutine 1 gp=0xc000002380 m=nil [IO wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc0005875b8 sp=0xc000587598 pc=0x6399639fa98e
runtime.netpollblock(0xc000587608?, 0x639942c6?, 0x99?)
runtime/netpoll.go:575 +0xf7 fp=0xc0005875f0 sp=0xc0005875b8 pc=0x6399639bf797
internal/poll.runtime_pollWait(0x70d9a23c7eb0, 0x72)
runtime/netpoll.go:351 +0x85 fp=0xc000587610 sp=0xc0005875f0 pc=0x6399639f9ba5
internal/poll.(*pollDesc).wait(0xc00004e100?, 0x900000036?, 0x0)
internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc000587638 sp=0xc000587610 pc=0x639963a81027
internal/poll.(*pollDesc).waitRead(...)
internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Accept(0xc00004e100)
internal/poll/fd_unix.go:620 +0x295 fp=0xc0005876e0 sp=0xc000587638 pc=0x639963a863f5
net.(*netFD).accept(0xc00004e100)
net/fd_unix.go:172 +0x29 fp=0xc000587798 sp=0xc0005876e0 pc=0x639963af8869
net.(*TCPListener).accept(0xc00071a080)
net/tcpsock_posix.go:159 +0x1b fp=0xc0005877e8 sp=0xc000587798 pc=0x639963b0e21b
net.(*TCPListener).Accept(0xc00071a080)
net/tcpsock.go:380 +0x30 fp=0xc000587818 sp=0xc0005877e8 pc=0x639963b0d0d0
net/http.(*onceCloseListener).Accept(0xc0004ba120?)
<autogenerated>:1 +0x24 fp=0xc000587830 sp=0xc000587818 pc=0x639963d23f84
net/http.(*Server).Serve(0xc000534200, {0x639964ca5be8, 0xc00071a080})
net/http/server.go:3424 +0x30c fp=0xc000587960 sp=0xc000587830 pc=0x639963cfb84c
github.com/ollama/ollama/runner/llamarunner.Execute({0xc000034120, 0xe, 0xe})
github.com/ollama/ollama/runner/llamarunner/runner.go:993 +0x116a fp=0xc000587d08 sp=0xc000587960 pc=0x639963d9d3ea
github.com/ollama/ollama/runner.Execute({0xc000034110?, 0x0?, 0x0?})
github.com/ollama/ollama/runner/runner.go:22 +0xd4 fp=0xc000587d30 sp=0xc000587d08 pc=0x639963fc6514
github.com/ollama/ollama/cmd.NewCLI.func2(0xc000035300?, {0x639964825055?, 0x4?, 0x639964825059?})
github.com/ollama/ollama/cmd/cmd.go:1280 +0x45 fp=0xc000587d58 sp=0xc000587d30 pc=0x6399645daae5
github.com/spf13/cobra.(*Command).execute(0xc00013ef08, {0xc0005149a0, 0xe, 0xe})
github.com/spf13/cobra@v1.7.0/command.go:940 +0x85c fp=0xc000587e78 sp=0xc000587d58 pc=0x639963b71afc
github.com/spf13/cobra.(*Command).ExecuteC(0xc00054e908)
github.com/spf13/cobra@v1.7.0/command.go:1068 +0x3a5 fp=0xc000587f30 sp=0xc000587e78 pc=0x639963b72345
github.com/spf13/cobra.(*Command).Execute(...)
github.com/spf13/cobra@v1.7.0/command.go:992
github.com/spf13/cobra.(*Command).ExecuteContext(...)
github.com/spf13/cobra@v1.7.0/command.go:985
main.main()
github.com/ollama/ollama/main.go:12 +0x4d fp=0xc000587f50 sp=0xc000587f30 pc=0x6399645dae4d
runtime.main()
runtime/proc.go:283 +0x29d fp=0xc000587fe0 sp=0xc000587f50 pc=0x6399639c6d9d
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc000587fe8 sp=0xc000587fe0 pc=0x639963a020c1

goroutine 2 gp=0xc000002e00 m=nil [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc000084fa8 sp=0xc000084f88 pc=0x6399639fa98e
runtime.goparkunlock(...)
runtime/proc.go:441
runtime.forcegchelper()
runtime/proc.go:348 +0xb8 fp=0xc000084fe0 sp=0xc000084fa8 pc=0x6399639c70d8
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc000084fe8 sp=0xc000084fe0 pc=0x639963a020c1
created by runtime.init.7 in goroutine 1
runtime/proc.go:336 +0x1a

goroutine 3 gp=0xc000003340 m=nil [GC sweep wait]:
runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc000085780 sp=0xc000085760 pc=0x6399639fa98e
runtime.goparkunlock(...)
runtime/proc.go:441
runtime.bgsweep(0xc00003e080)
runtime/mgcsweep.go:316 +0xdf fp=0xc0000857c8 sp=0xc000085780 pc=0x6399639b18ff
runtime.gcenable.gowrap1()
runtime/mgc.go:204 +0x25 fp=0xc0000857e0 sp=0xc0000857c8 pc=0x6399639a5ce5
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0000857e8 sp=0xc0000857e0 pc=0x639963a020c1
created by runtime.gcenable in goroutine 1
runtime/mgc.go:204 +0x66

goroutine 4 gp=0xc000003500 m=nil [GC scavenge wait]:
runtime.gopark(0x10000?, 0x6399649d76b8?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc000085f78 sp=0xc000085f58 pc=0x6399639fa98e
runtime.goparkunlock(...)
runtime/proc.go:441
runtime.(*scavengerState).park(0x6399654f68a0)
runtime/mgcscavenge.go:425 +0x49 fp=0xc000085fa8 sp=0xc000085f78 pc=0x6399639af349
runtime.bgscavenge(0xc00003e080)
runtime/mgcscavenge.go:658 +0x59 fp=0xc000085fc8 sp=0xc000085fa8 pc=0x6399639af8d9
runtime.gcenable.gowrap2()
runtime/mgc.go:205 +0x25 fp=0xc000085fe0 sp=0xc000085fc8 pc=0x6399639a5c85
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc000085fe8 sp=0xc000085fe0 pc=0x639963a020c1
created by runtime.gcenable in goroutine 1
runtime/mgc.go:205 +0xa5

goroutine 5 gp=0xc000003dc0 m=nil [finalizer wait]:
runtime.gopark(0x1b8?, 0xc000002380?, 0x1?, 0x23?, 0xc000084688?)
runtime/proc.go:435 +0xce fp=0xc000084630 sp=0xc000084610 pc=0x6399639fa98e
runtime.runfinq()
runtime/mfinal.go:196 +0x107 fp=0xc0000847e0 sp=0xc000084630 pc=0x6399639a4ca7
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0000847e8 sp=0xc0000847e0 pc=0x639963a020c1
created by runtime.createfing in goroutine 1
runtime/mfinal.go:166 +0x3d

goroutine 6 gp=0xc0001e08c0 m=nil [chan receive]:
runtime.gopark(0xc0000ff900?, 0xc000588018?, 0x60?, 0x67?, 0x639963adf5a8?)
runtime/proc.go:435 +0xce fp=0xc000086718 sp=0xc0000866f8 pc=0x6399639fa98e
runtime.chanrecv(0xc0000b6380, 0x0, 0x1)
runtime/chan.go:664 +0x445 fp=0xc000086790 sp=0xc000086718 pc=0x639963996ea5
runtime.chanrecv1(0x0?, 0x0?)
runtime/chan.go:506 +0x12 fp=0xc0000867b8 sp=0xc000086790 pc=0x639963996a32
runtime.unique_runtime_registerUniqueMapCleanup.func2(...)
runtime/mgc.go:1796
runtime.unique_runtime_registerUniqueMapCleanup.gowrap1()
runtime/mgc.go:1799 +0x2f fp=0xc0000867e0 sp=0xc0000867b8 pc=0x6399639a8e8f
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0000867e8 sp=0xc0000867e0 pc=0x639963a020c1
created by unique.runtime_registerUniqueMapCleanup in goroutine 1
runtime/mgc.go:1794 +0x85

goroutine 7 gp=0xc0001e1180 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc000086f38 sp=0xc000086f18 pc=0x6399639fa98e
runtime.gcBgMarkWorker(0xc0000b7b20)
runtime/mgc.go:1423 +0xe9 fp=0xc000086fc8 sp=0xc000086f38 pc=0x6399639a81a9
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc000086fe0 sp=0xc000086fc8 pc=0x6399639a8085
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc000086fe8 sp=0xc000086fe0 pc=0x639963a020c1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105

goroutine 8 gp=0xc0001e1340 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc000087738 sp=0xc000087718 pc=0x6399639fa98e
runtime.gcBgMarkWorker(0xc0000b7b20)
runtime/mgc.go:1423 +0xe9 fp=0xc0000877c8 sp=0xc000087738 pc=0x6399639a81a9
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc0000877e0 sp=0xc0000877c8 pc=0x6399639a8085
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0000877e8 sp=0xc0000877e0 pc=0x639963a020c1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105

goroutine 18 gp=0xc000504000 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc000080738 sp=0xc000080718 pc=0x6399639fa98e
runtime.gcBgMarkWorker(0xc0000b7b20)
runtime/mgc.go:1423 +0xe9 fp=0xc0000807c8 sp=0xc000080738 pc=0x6399639a81a9
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc0000807e0 sp=0xc0000807c8 pc=0x6399639a8085
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0000807e8 sp=0xc0000807e0 pc=0x639963a020c1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105

goroutine 34 gp=0xc000102380 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc00011a738 sp=0xc00011a718 pc=0x6399639fa98e
runtime.gcBgMarkWorker(0xc0000b7b20)
runtime/mgc.go:1423 +0xe9 fp=0xc00011a7c8 sp=0xc00011a738 pc=0x6399639a81a9
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc00011a7e0 sp=0xc00011a7c8 pc=0x6399639a8085
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc00011a7e8 sp=0xc00011a7e0 pc=0x639963a020c1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105

goroutine 9 gp=0xc0001e1500 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc000087f38 sp=0xc000087f18 pc=0x6399639fa98e
runtime.gcBgMarkWorker(0xc0000b7b20)
runtime/mgc.go:1423 +0xe9 fp=0xc000087fc8 sp=0xc000087f38 pc=0x6399639a81a9
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc000087fe0 sp=0xc000087fc8 pc=0x6399639a8085
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc000087fe8 sp=0xc000087fe0 pc=0x639963a020c1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105

goroutine 10 gp=0xc0001e16c0 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc000116738 sp=0xc000116718 pc=0x6399639fa98e
runtime.gcBgMarkWorker(0xc0000b7b20)
runtime/mgc.go:1423 +0xe9 fp=0xc0001167c8 sp=0xc000116738 pc=0x6399639a81a9
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc0001167e0 sp=0xc0001167c8 pc=0x6399639a8085
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0001167e8 sp=0xc0001167e0 pc=0x639963a020c1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105

goroutine 11 gp=0xc0001e1880 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc000116f38 sp=0xc000116f18 pc=0x6399639fa98e
runtime.gcBgMarkWorker(0xc0000b7b20)
runtime/mgc.go:1423 +0xe9 fp=0xc000116fc8 sp=0xc000116f38 pc=0x6399639a81a9
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc000116fe0 sp=0xc000116fc8 pc=0x6399639a8085
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc000116fe8 sp=0xc000116fe0 pc=0x639963a020c1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105

goroutine 19 gp=0xc0005041c0 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc000080f38 sp=0xc000080f18 pc=0x6399639fa98e
runtime.gcBgMarkWorker(0xc0000b7b20)
runtime/mgc.go:1423 +0xe9 fp=0xc000080fc8 sp=0xc000080f38 pc=0x6399639a81a9
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc000080fe0 sp=0xc000080fc8 pc=0x6399639a8085
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc000080fe8 sp=0xc000080fe0 pc=0x639963a020c1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105

goroutine 20 gp=0xc000504380 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc000081738 sp=0xc000081718 pc=0x6399639fa98e
runtime.gcBgMarkWorker(0xc0000b7b20)
runtime/mgc.go:1423 +0xe9 fp=0xc0000817c8 sp=0xc000081738 pc=0x6399639a81a9
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc0000817e0 sp=0xc0000817c8 pc=0x6399639a8085
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0000817e8 sp=0xc0000817e0 pc=0x639963a020c1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105

goroutine 35 gp=0xc000102540 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc00011af38 sp=0xc00011af18 pc=0x6399639fa98e
runtime.gcBgMarkWorker(0xc0000b7b20)
runtime/mgc.go:1423 +0xe9 fp=0xc00011afc8 sp=0xc00011af38 pc=0x6399639a81a9
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc00011afe0 sp=0xc00011afc8 pc=0x6399639a8085
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc00011afe8 sp=0xc00011afe0 pc=0x639963a020c1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105

goroutine 12 gp=0xc0001e1a40 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc000117738 sp=0xc000117718 pc=0x6399639fa98e
runtime.gcBgMarkWorker(0xc0000b7b20)
runtime/mgc.go:1423 +0xe9 fp=0xc0001177c8 sp=0xc000117738 pc=0x6399639a81a9
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc0001177e0 sp=0xc0001177c8 pc=0x6399639a8085
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0001177e8 sp=0xc0001177e0 pc=0x639963a020c1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105

goroutine 21 gp=0xc000504540 m=nil [GC worker (idle)]:
runtime.gopark(0x21a0c049ba1f?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc000081f38 sp=0xc000081f18 pc=0x6399639fa98e
runtime.gcBgMarkWorker(0xc0000b7b20)
runtime/mgc.go:1423 +0xe9 fp=0xc000081fc8 sp=0xc000081f38 pc=0x6399639a81a9
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc000081fe0 sp=0xc000081fc8 pc=0x6399639a8085
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc000081fe8 sp=0xc000081fe0 pc=0x639963a020c1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105

goroutine 13 gp=0xc0001e1c00 m=nil [GC worker (idle)]:
runtime.gopark(0x21a0c049ac80?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc000117f38 sp=0xc000117f18 pc=0x6399639fa98e
runtime.gcBgMarkWorker(0xc0000b7b20)
runtime/mgc.go:1423 +0xe9 fp=0xc000117fc8 sp=0xc000117f38 pc=0x6399639a81a9
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc000117fe0 sp=0xc000117fc8 pc=0x6399639a8085
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc000117fe8 sp=0xc000117fe0 pc=0x639963a020c1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105

goroutine 22 gp=0xc000504700 m=nil [GC worker (idle)]:
runtime.gopark(0x21a0c049b309?, 0x3?, 0xe9?, 0x5?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc000082738 sp=0xc000082718 pc=0x6399639fa98e
runtime.gcBgMarkWorker(0xc0000b7b20)
runtime/mgc.go:1423 +0xe9 fp=0xc0000827c8 sp=0xc000082738 pc=0x6399639a81a9
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc0000827e0 sp=0xc0000827c8 pc=0x6399639a8085
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0000827e8 sp=0xc0000827e0 pc=0x639963a020c1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105

goroutine 14 gp=0xc0001e1dc0 m=nil [GC worker (idle)]:
runtime.gopark(0x21a0c049b3b3?, 0x3?, 0x29?, 0x9?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc000118738 sp=0xc000118718 pc=0x6399639fa98e
runtime.gcBgMarkWorker(0xc0000b7b20)
runtime/mgc.go:1423 +0xe9 fp=0xc0001187c8 sp=0xc000118738 pc=0x6399639a81a9
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc0001187e0 sp=0xc0001187c8 pc=0x6399639a8085
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0001187e8 sp=0xc0001187e0 pc=0x639963a020c1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105

goroutine 36 gp=0xc000102700 m=nil [GC worker (idle)]:
runtime.gopark(0x21a0c049bbce?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc00011b738 sp=0xc00011b718 pc=0x6399639fa98e
runtime.gcBgMarkWorker(0xc0000b7b20)
runtime/mgc.go:1423 +0xe9 fp=0xc00011b7c8 sp=0xc00011b738 pc=0x6399639a81a9
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc00011b7e0 sp=0xc00011b7c8 pc=0x6399639a8085
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc00011b7e8 sp=0xc00011b7e0 pc=0x639963a020c1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105

goroutine 50 gp=0xc000504c40 m=nil [select]:
runtime.gopark(0xc000047a58?, 0x2?, 0x40?, 0x68?, 0xc000047834?)
runtime/proc.go:435 +0xce fp=0xc000047648 sp=0xc000047628 pc=0x6399639fa98e
runtime.selectgo(0xc000047a58, 0xc000047830, 0x4?, 0x0, 0x1?, 0x1)
runtime/select.go:351 +0x837 fp=0xc000047780 sp=0xc000047648 pc=0x6399639d9297
github.com/ollama/ollama/runner/llamarunner.(*Server).completion(0xc0004ba000, {0x639964ca5dc8, 0xc000514d20}, 0xc000154280)
github.com/ollama/ollama/runner/llamarunner/runner.go:688 +0xa25 fp=0xc000047ac0 sp=0xc000047780 pc=0x639963d9aac5
github.com/ollama/ollama/runner/llamarunner.(*Server).completion-fm({0x639964ca5dc8?, 0xc000514d20?}, 0xc0004e3b40?)
:1 +0x36 fp=0xc000047af0 sp=0xc000047ac0 pc=0x639963d9dad6
net/http.HandlerFunc.ServeHTTP(0xc0000ea240?, {0x639964ca5dc8?, 0xc000514d20?}, 0xc0004e3b60?)
net/http/server.go:2294 +0x29 fp=0xc000047b18 sp=0xc000047af0 pc=0x639963cf7e89
net/http.(*ServeMux).ServeHTTP(0x63996399f1c5?, {0x639964ca5dc8, 0xc000514d20}, 0xc000154280)
net/http/server.go:2822 +0x1c4 fp=0xc000047b68 sp=0xc000047b18 pc=0x639963cf9d84
net/http.serverHandler.ServeHTTP({0x639964ca2370?}, {0x639964ca5dc8?, 0xc000514d20?}, 0x1?)
net/http/server.go:3301 +0x8e fp=0xc000047b98 sp=0xc000047b68 pc=0x639963d1780e
net/http.(*conn).serve(0xc0004ba120, {0x639964ca7e28, 0xc00071c3f0})
net/http/server.go:2102 +0x625 fp=0xc000047fb8 sp=0xc000047b98 pc=0x639963cf6385
net/http.(*Server).Serve.gowrap3()
net/http/server.go:3454 +0x28 fp=0xc000047fe0 sp=0xc000047fb8 pc=0x639963cfbc48
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc000047fe8 sp=0xc000047fe0 pc=0x639963a020c1
created by net/http.(*Server).Serve in goroutine 1
net/http/server.go:3454 +0x485

goroutine 39 gp=0xc000102a80 m=nil [IO wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0xb?)
runtime/proc.go:435 +0xce fp=0xc0002445d8 sp=0xc0002445b8 pc=0x6399639fa98e
runtime.netpollblock(0x639963a1de18?, 0x639942c6?, 0x99?)
runtime/netpoll.go:575 +0xf7 fp=0xc000244610 sp=0xc0002445d8 pc=0x6399639bf797
internal/poll.runtime_pollWait(0x70d9a23c7d98, 0x72)
runtime/netpoll.go:351 +0x85 fp=0xc000244630 sp=0xc000244610 pc=0x6399639f9ba5
internal/poll.(*pollDesc).wait(0xc00004eb80?, 0xc00071c521?, 0x0)
internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc000244658 sp=0xc000244630 pc=0x639963a81027
internal/poll.(*pollDesc).waitRead(...)
internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0xc00004eb80, {0xc00071c521, 0x1, 0x1})
internal/poll/fd_unix.go:165 +0x27a fp=0xc0002446f0 sp=0xc000244658 pc=0x639963a8231a
net.(*netFD).Read(0xc00004eb80, {0xc00071c521?, 0xc00071a158?, 0xc000244770?})
net/fd_posix.go:55 +0x25 fp=0xc000244738 sp=0xc0002446f0 pc=0x639963af68c5
net.(*conn).Read(0xc00051e078, {0xc00071c521?, 0x0?, 0x0?})
net/net.go:194 +0x45 fp=0xc000244780 sp=0xc000244738 pc=0x639963b04c85
net/http.(*connReader).backgroundRead(0xc00071c510)
net/http/server.go:690 +0x37 fp=0xc0002447c8 sp=0xc000244780 pc=0x639963cf0257
net/http.(*connReader).startBackgroundRead.gowrap2()
net/http/server.go:686 +0x25 fp=0xc0002447e0 sp=0xc0002447c8 pc=0x639963cf0185
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0002447e8 sp=0xc0002447e0 pc=0x639963a020c1
created by net/http.(*connReader).startBackgroundRead in goroutine 50
net/http/server.go:686 +0xb6

rax 0x0
rbx 0x70d9e8eb9fc0
rcx 0x70d9e8efe00b
rdx 0x0
rdi 0x2
rsi 0x7ffec1e71dd0
rbp 0x70d978500000
rsp 0x7ffec1e71dd0
r8 0x0
r9 0x7ffec1e71dd0
r10 0x8
r11 0x246
r12 0x7ffec1e72050
r13 0x0
r14 0x1000
r15 0x0
rip 0x70d9e8efe00b
rflags 0x246
cs 0x33
fs 0x0
gs 0x0
SIGABRT: abort
PC=0x70d9e8efe00b m=0 sigcode=18446744073709551610
signal arrived during cgo execution

goroutine 16 gp=0xc000504a80 m=0 mp=0x6399654f96c0 [syscall]:
runtime.cgocall(0x639964647600, 0xc000093bc8)
runtime/cgocall.go:167 +0x4b fp=0xc000093ba0 sp=0xc000093b68 pc=0x6399639f76ab
github.com/ollama/ollama/llama._Cfunc_llama_decode(0x70d7cc836840, {0x4, 0x70d7cc857970, 0x0, 0x0, 0x70d7cc8610d0, 0x70d7cc859e30, 0x70d7cc78bbc0, 0x70d97c6252d0})
_cgo_gotypes.go:557 +0x4a fp=0xc000093bc8 sp=0xc000093ba0 pc=0x639963d7d46a
github.com/ollama/ollama/llama.(*Context).Decode.func1(...)
github.com/ollama/ollama/llama/llama.go:157
github.com/ollama/ollama/llama.(*Context).Decode(0xc00011c5d0?, 0x0?)
github.com/ollama/ollama/llama/llama.go:157 +0xf6 fp=0xc000093cc8 sp=0xc000093bc8 pc=0x639963d80076
github.com/ollama/ollama/runner/llamarunner.(*Server).processBatch(0xc0004ba000, 0xc0004264e0, 0xc00011c720)
github.com/ollama/ollama/runner/llamarunner/runner.go:435 +0x23e fp=0xc000093ee0 sp=0xc000093cc8 pc=0x639963d990be
github.com/ollama/ollama/runner/llamarunner.(*Server).run(0xc0004ba000, {0x639964ca7e60, 0xc0001280a0})
github.com/ollama/ollama/runner/llamarunner/runner.go:343 +0x1d5 fp=0xc000093fb8 sp=0xc000093ee0 pc=0x639963d98d15
github.com/ollama/ollama/runner/llamarunner.Execute.gowrap2()
github.com/ollama/ollama/runner/llamarunner/runner.go:973 +0x28 fp=0xc000093fe0 sp=0xc000093fb8 pc=0x639963d9d6a8
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc000093fe8 sp=0xc000093fe0 pc=0x639963a020c1
created by github.com/ollama/ollama/runner/llamarunner.Execute in goroutine 1
github.com/ollama/ollama/runner/llamarunner/runner.go:973 +0xd97

goroutine 1 gp=0xc000002380 m=nil [IO wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc0005875b8 sp=0xc000587598 pc=0x6399639fa98e
runtime.netpollblock(0xc000587608?, 0x639942c6?, 0x99?)
runtime/netpoll.go:575 +0xf7 fp=0xc0005875f0 sp=0xc0005875b8 pc=0x6399639bf797
internal/poll.runtime_pollWait(0x70d9a23c7eb0, 0x72)
runtime/netpoll.go:351 +0x85 fp=0xc000587610 sp=0xc0005875f0 pc=0x6399639f9ba5
internal/poll.(*pollDesc).wait(0xc00004e100?, 0x900000036?, 0x0)
internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc000587638 sp=0xc000587610 pc=0x639963a81027
internal/poll.(*pollDesc).waitRead(...)
internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Accept(0xc00004e100)
internal/poll/fd_unix.go:620 +0x295 fp=0xc0005876e0 sp=0xc000587638 pc=0x639963a863f5
net.(*netFD).accept(0xc00004e100)
net/fd_unix.go:172 +0x29 fp=0xc000587798 sp=0xc0005876e0 pc=0x639963af8869
net.(*TCPListener).accept(0xc00071a080)
net/tcpsock_posix.go:159 +0x1b fp=0xc0005877e8 sp=0xc000587798 pc=0x639963b0e21b
net.(*TCPListener).Accept(0xc00071a080)
net/tcpsock.go:380 +0x30 fp=0xc000587818 sp=0xc0005877e8 pc=0x639963b0d0d0
net/http.(*onceCloseListener).Accept(0xc0004ba120?)
:1 +0x24 fp=0xc000587830 sp=0xc000587818 pc=0x639963d23f84
net/http.(*Server).Serve(0xc000534200, {0x639964ca5be8, 0xc00071a080})
net/http/server.go:3424 +0x30c fp=0xc000587960 sp=0xc000587830 pc=0x639963cfb84c
github.com/ollama/ollama/runner/llamarunner.Execute({0xc000034120, 0xe, 0xe})
github.com/ollama/ollama/runner/llamarunner/runner.go:993 +0x116a fp=0xc000587d08 sp=0xc000587960 pc=0x639963d9d3ea
github.com/ollama/ollama/runner.Execute({0xc000034110?, 0x0?, 0x0?})
github.com/ollama/ollama/runner/runner.go:22 +0xd4 fp=0xc000587d30 sp=0xc000587d08 pc=0x639963fc6514
github.com/ollama/ollama/cmd.NewCLI.func2(0xc000035300?, {0x639964825055?, 0x4?, 0x639964825059?})
github.com/ollama/ollama/cmd/cmd.go:1280 +0x45 fp=0xc000587d58 sp=0xc000587d30 pc=0x6399645daae5
github.com/spf13/cobra.(*Command).execute(0xc00013ef08, {0xc0005149a0, 0xe, 0xe})
github.com/spf13/cobra@v1.7.0/command.go:940 +0x85c fp=0xc000587e78 sp=0xc000587d58 pc=0x639963b71afc
github.com/spf13/cobra.(*Command).ExecuteC(0xc00054e908)
github.com/spf13/cobra@v1.7.0/command.go:1068 +0x3a5 fp=0xc000587f30 sp=0xc000587e78 pc=0x639963b72345
github.com/spf13/cobra.(*Command).Execute(...)
github.com/spf13/cobra@v1.7.0/command.go:992
github.com/spf13/cobra.(*Command).ExecuteContext(...)
github.com/spf13/cobra@v1.7.0/command.go:985
main.main()
github.com/ollama/ollama/main.go:12 +0x4d fp=0xc000587f50 sp=0xc000587f30 pc=0x6399645dae4d
runtime.main()
runtime/proc.go:283 +0x29d fp=0xc000587fe0 sp=0xc000587f50 pc=0x6399639c6d9d
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc000587fe8 sp=0xc000587fe0 pc=0x639963a020c1

goroutine 2 gp=0xc000002e00 m=nil [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc000084fa8 sp=0xc000084f88 pc=0x6399639fa98e
runtime.goparkunlock(...)
runtime/proc.go:441
runtime.forcegchelper()
runtime/proc.go:348 +0xb8 fp=0xc000084fe0 sp=0xc000084fa8 pc=0x6399639c70d8
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc000084fe8 sp=0xc000084fe0 pc=0x639963a020c1
created by runtime.init.7 in goroutine 1
runtime/proc.go:336 +0x1a

goroutine 3 gp=0xc000003340 m=nil [GC sweep wait]:
runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc000085780 sp=0xc000085760 pc=0x6399639fa98e
runtime.goparkunlock(...)
runtime/proc.go:441
runtime.bgsweep(0xc00003e080)
runtime/mgcsweep.go:316 +0xdf fp=0xc0000857c8 sp=0xc000085780 pc=0x6399639b18ff
runtime.gcenable.gowrap1()
runtime/mgc.go:204 +0x25 fp=0xc0000857e0 sp=0xc0000857c8 pc=0x6399639a5ce5
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0000857e8 sp=0xc0000857e0 pc=0x639963a020c1
created by runtime.gcenable in goroutine 1
runtime/mgc.go:204 +0x66

goroutine 4 gp=0xc000003500 m=nil [GC scavenge wait]:
runtime.gopark(0x10000?, 0x6399649d76b8?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc000085f78 sp=0xc000085f58 pc=0x6399639fa98e
runtime.goparkunlock(...)
runtime/proc.go:441
runtime.(*scavengerState).park(0x6399654f68a0)
runtime/mgcscavenge.go:425 +0x49 fp=0xc000085fa8 sp=0xc000085f78 pc=0x6399639af349
runtime.bgscavenge(0xc00003e080)
runtime/mgcscavenge.go:658 +0x59 fp=0xc000085fc8 sp=0xc000085fa8 pc=0x6399639af8d9
runtime.gcenable.gowrap2()
runtime/mgc.go:205 +0x25 fp=0xc000085fe0 sp=0xc000085fc8 pc=0x6399639a5c85
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc000085fe8 sp=0xc000085fe0 pc=0x639963a020c1
created by runtime.gcenable in goroutine 1
runtime/mgc.go:205 +0xa5

goroutine 5 gp=0xc000003dc0 m=nil [finalizer wait]:
runtime.gopark(0x1b8?, 0xc000002380?, 0x1?, 0x23?, 0xc000084688?)
runtime/proc.go:435 +0xce fp=0xc000084630 sp=0xc000084610 pc=0x6399639fa98e
runtime.runfinq()
runtime/mfinal.go:196 +0x107 fp=0xc0000847e0 sp=0xc000084630 pc=0x6399639a4ca7
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0000847e8 sp=0xc0000847e0 pc=0x639963a020c1
created by runtime.createfing in goroutine 1
runtime/mfinal.go:166 +0x3d

goroutine 6 gp=0xc0001e08c0 m=nil [chan receive]:
runtime.gopark(0xc0000ff900?, 0xc000588018?, 0x60?, 0x67?, 0x639963adf5a8?)
runtime/proc.go:435 +0xce fp=0xc000086718 sp=0xc0000866f8 pc=0x6399639fa98e
runtime.chanrecv(0xc0000b6380, 0x0, 0x1)
runtime/chan.go:664 +0x445 fp=0xc000086790 sp=0xc000086718 pc=0x639963996ea5
runtime.chanrecv1(0x0?, 0x0?)
runtime/chan.go:506 +0x12 fp=0xc0000867b8 sp=0xc000086790 pc=0x639963996a32
runtime.unique_runtime_registerUniqueMapCleanup.func2(...)
runtime/mgc.go:1796
runtime.unique_runtime_registerUniqueMapCleanup.gowrap1()
runtime/mgc.go:1799 +0x2f fp=0xc0000867e0 sp=0xc0000867b8 pc=0x6399639a8e8f
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0000867e8 sp=0xc0000867e0 pc=0x639963a020c1
created by unique.runtime_registerUniqueMapCleanup in goroutine 1
runtime/mgc.go:1794 +0x85

goroutine 7 gp=0xc0001e1180 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc000086f38 sp=0xc000086f18 pc=0x6399639fa98e
runtime.gcBgMarkWorker(0xc0000b7b20)
runtime/mgc.go:1423 +0xe9 fp=0xc000086fc8 sp=0xc000086f38 pc=0x6399639a81a9
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc000086fe0 sp=0xc000086fc8 pc=0x6399639a8085
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc000086fe8 sp=0xc000086fe0 pc=0x639963a020c1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105

goroutine 8 gp=0xc0001e1340 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc000087738 sp=0xc000087718 pc=0x6399639fa98e
runtime.gcBgMarkWorker(0xc0000b7b20)
runtime/mgc.go:1423 +0xe9 fp=0xc0000877c8 sp=0xc000087738 pc=0x6399639a81a9
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc0000877e0 sp=0xc0000877c8 pc=0x6399639a8085
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0000877e8 sp=0xc0000877e0 pc=0x639963a020c1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105

goroutine 18 gp=0xc000504000 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc000080738 sp=0xc000080718 pc=0x6399639fa98e
runtime.gcBgMarkWorker(0xc0000b7b20)
runtime/mgc.go:1423 +0xe9 fp=0xc0000807c8 sp=0xc000080738 pc=0x6399639a81a9
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc0000807e0 sp=0xc0000807c8 pc=0x6399639a8085
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0000807e8 sp=0xc0000807e0 pc=0x639963a020c1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105

goroutine 34 gp=0xc000102380 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc00011a738 sp=0xc00011a718 pc=0x6399639fa98e
runtime.gcBgMarkWorker(0xc0000b7b20)
runtime/mgc.go:1423 +0xe9 fp=0xc00011a7c8 sp=0xc00011a738 pc=0x6399639a81a9
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc00011a7e0 sp=0xc00011a7c8 pc=0x6399639a8085
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc00011a7e8 sp=0xc00011a7e0 pc=0x639963a020c1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105

goroutine 9 gp=0xc0001e1500 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc000087f38 sp=0xc000087f18 pc=0x6399639fa98e
runtime.gcBgMarkWorker(0xc0000b7b20)
runtime/mgc.go:1423 +0xe9 fp=0xc000087fc8 sp=0xc000087f38 pc=0x6399639a81a9
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc000087fe0 sp=0xc000087fc8 pc=0x6399639a8085
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc000087fe8 sp=0xc000087fe0 pc=0x639963a020c1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105

goroutine 10 gp=0xc0001e16c0 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc000116738 sp=0xc000116718 pc=0x6399639fa98e
runtime.gcBgMarkWorker(0xc0000b7b20)
runtime/mgc.go:1423 +0xe9 fp=0xc0001167c8 sp=0xc000116738 pc=0x6399639a81a9
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc0001167e0 sp=0xc0001167c8 pc=0x6399639a8085
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0001167e8 sp=0xc0001167e0 pc=0x639963a020c1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105

goroutine 11 gp=0xc0001e1880 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc000116f38 sp=0xc000116f18 pc=0x6399639fa98e
runtime.gcBgMarkWorker(0xc0000b7b20)
runtime/mgc.go:1423 +0xe9 fp=0xc000116fc8 sp=0xc000116f38 pc=0x6399639a81a9
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc000116fe0 sp=0xc000116fc8 pc=0x6399639a8085
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc000116fe8 sp=0xc000116fe0 pc=0x639963a020c1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105

goroutine 19 gp=0xc0005041c0 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc000080f38 sp=0xc000080f18 pc=0x6399639fa98e
runtime.gcBgMarkWorker(0xc0000b7b20)
runtime/mgc.go:1423 +0xe9 fp=0xc000080fc8 sp=0xc000080f38 pc=0x6399639a81a9
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc000080fe0 sp=0xc000080fc8 pc=0x6399639a8085
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc000080fe8 sp=0xc000080fe0 pc=0x639963a020c1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105

goroutine 20 gp=0xc000504380 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc000081738 sp=0xc000081718 pc=0x6399639fa98e
runtime.gcBgMarkWorker(0xc0000b7b20)
runtime/mgc.go:1423 +0xe9 fp=0xc0000817c8 sp=0xc000081738 pc=0x6399639a81a9
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc0000817e0 sp=0xc0000817c8 pc=0x6399639a8085
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0000817e8 sp=0xc0000817e0 pc=0x639963a020c1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105

goroutine 35 gp=0xc000102540 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc00011af38 sp=0xc00011af18 pc=0x6399639fa98e
runtime.gcBgMarkWorker(0xc0000b7b20)
runtime/mgc.go:1423 +0xe9 fp=0xc00011afc8 sp=0xc00011af38 pc=0x6399639a81a9
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc00011afe0 sp=0xc00011afc8 pc=0x6399639a8085
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc00011afe8 sp=0xc00011afe0 pc=0x639963a020c1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105

goroutine 12 gp=0xc0001e1a40 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc000117738 sp=0xc000117718 pc=0x6399639fa98e
runtime.gcBgMarkWorker(0xc0000b7b20)
runtime/mgc.go:1423 +0xe9 fp=0xc0001177c8 sp=0xc000117738 pc=0x6399639a81a9
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc0001177e0 sp=0xc0001177c8 pc=0x6399639a8085
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0001177e8 sp=0xc0001177e0 pc=0x639963a020c1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105

goroutine 21 gp=0xc000504540 m=nil [GC worker (idle)]:
runtime.gopark(0x21a0c049ba1f?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc000081f38 sp=0xc000081f18 pc=0x6399639fa98e
runtime.gcBgMarkWorker(0xc0000b7b20)
runtime/mgc.go:1423 +0xe9 fp=0xc000081fc8 sp=0xc000081f38 pc=0x6399639a81a9
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc000081fe0 sp=0xc000081fc8 pc=0x6399639a8085
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc000081fe8 sp=0xc000081fe0 pc=0x639963a020c1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105

goroutine 13 gp=0xc0001e1c00 m=nil [GC worker (idle)]:
runtime.gopark(0x21a0c049ac80?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc000117f38 sp=0xc000117f18 pc=0x6399639fa98e
runtime.gcBgMarkWorker(0xc0000b7b20)
runtime/mgc.go:1423 +0xe9 fp=0xc000117fc8 sp=0xc000117f38 pc=0x6399639a81a9
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc000117fe0 sp=0xc000117fc8 pc=0x6399639a8085
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc000117fe8 sp=0xc000117fe0 pc=0x639963a020c1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105

goroutine 22 gp=0xc000504700 m=nil [GC worker (idle)]:
runtime.gopark(0x21a0c049b309?, 0x3?, 0xe9?, 0x5?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc000082738 sp=0xc000082718 pc=0x6399639fa98e
runtime.gcBgMarkWorker(0xc0000b7b20)
runtime/mgc.go:1423 +0xe9 fp=0xc0000827c8 sp=0xc000082738 pc=0x6399639a81a9
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc0000827e0 sp=0xc0000827c8 pc=0x6399639a8085
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0000827e8 sp=0xc0000827e0 pc=0x639963a020c1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105

goroutine 14 gp=0xc0001e1dc0 m=nil [GC worker (idle)]:
runtime.gopark(0x21a0c049b3b3?, 0x3?, 0x29?, 0x9?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc000118738 sp=0xc000118718 pc=0x6399639fa98e
runtime.gcBgMarkWorker(0xc0000b7b20)
runtime/mgc.go:1423 +0xe9 fp=0xc0001187c8 sp=0xc000118738 pc=0x6399639a81a9
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc0001187e0 sp=0xc0001187c8 pc=0x6399639a8085
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0001187e8 sp=0xc0001187e0 pc=0x639963a020c1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105

goroutine 36 gp=0xc000102700 m=nil [GC worker (idle)]:
runtime.gopark(0x21a0c049bbce?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:435 +0xce fp=0xc00011b738 sp=0xc00011b718 pc=0x6399639fa98e
runtime.gcBgMarkWorker(0xc0000b7b20)
runtime/mgc.go:1423 +0xe9 fp=0xc00011b7c8 sp=0xc00011b738 pc=0x6399639a81a9
runtime.gcBgMarkStartWorkers.gowrap1()
runtime/mgc.go:1339 +0x25 fp=0xc00011b7e0 sp=0xc00011b7c8 pc=0x6399639a8085
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc00011b7e8 sp=0xc00011b7e0 pc=0x639963a020c1
created by runtime.gcBgMarkStartWorkers in goroutine 1
runtime/mgc.go:1339 +0x105

goroutine 50 gp=0xc000504c40 m=nil [select]:
runtime.gopark(0xc000047a58?, 0x2?, 0x40?, 0x68?, 0xc000047834?)
runtime/proc.go:435 +0xce fp=0xc000047648 sp=0xc000047628 pc=0x6399639fa98e
runtime.selectgo(0xc000047a58, 0xc000047830, 0x4?, 0x0, 0x1?, 0x1)
runtime/select.go:351 +0x837 fp=0xc000047780 sp=0xc000047648 pc=0x6399639d9297
github.com/ollama/ollama/runner/llamarunner.(*Server).completion(0xc0004ba000, {0x639964ca5dc8, 0xc000514d20}, 0xc000154280)
github.com/ollama/ollama/runner/llamarunner/runner.go:688 +0xa25 fp=0xc000047ac0 sp=0xc000047780 pc=0x639963d9aac5
github.com/ollama/ollama/runner/llamarunner.(*Server).completion-fm({0x639964ca5dc8?, 0xc000514d20?}, 0xc0004e3b40?)
:1 +0x36 fp=0xc000047af0 sp=0xc000047ac0 pc=0x639963d9dad6
net/http.HandlerFunc.ServeHTTP(0xc0000ea240?, {0x639964ca5dc8?, 0xc000514d20?}, 0xc0004e3b60?)
net/http/server.go:2294 +0x29 fp=0xc000047b18 sp=0xc000047af0 pc=0x639963cf7e89
net/http.(*ServeMux).ServeHTTP(0x63996399f1c5?, {0x639964ca5dc8, 0xc000514d20}, 0xc000154280)
net/http/server.go:2822 +0x1c4 fp=0xc000047b68 sp=0xc000047b18 pc=0x639963cf9d84
net/http.serverHandler.ServeHTTP({0x639964ca2370?}, {0x639964ca5dc8?, 0xc000514d20?}, 0x1?)
net/http/server.go:3301 +0x8e fp=0xc000047b98 sp=0xc000047b68 pc=0x639963d1780e
net/http.(*conn).serve(0xc0004ba120, {0x639964ca7e28, 0xc00071c3f0})
net/http/server.go:2102 +0x625 fp=0xc000047fb8 sp=0xc000047b98 pc=0x639963cf6385
net/http.(*Server).Serve.gowrap3()
net/http/server.go:3454 +0x28 fp=0xc000047fe0 sp=0xc000047fb8 pc=0x639963cfbc48
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc000047fe8 sp=0xc000047fe0 pc=0x639963a020c1
created by net/http.(*Server).Serve in goroutine 1
net/http/server.go:3454 +0x485

goroutine 39 gp=0xc000102a80 m=nil [IO wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0xb?)
runtime/proc.go:435 +0xce fp=0xc0002445d8 sp=0xc0002445b8 pc=0x6399639fa98e
runtime.netpollblock(0x639963a1de18?, 0x639942c6?, 0x99?)
runtime/netpoll.go:575 +0xf7 fp=0xc000244610 sp=0xc0002445d8 pc=0x6399639bf797
internal/poll.runtime_pollWait(0x70d9a23c7d98, 0x72)
runtime/netpoll.go:351 +0x85 fp=0xc000244630 sp=0xc000244610 pc=0x6399639f9ba5
internal/poll.(*pollDesc).wait(0xc00004eb80?, 0xc00071c521?, 0x0)
internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc000244658 sp=0xc000244630 pc=0x639963a81027
internal/poll.(*pollDesc).waitRead(...)
internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0xc00004eb80, {0xc00071c521, 0x1, 0x1})
internal/poll/fd_unix.go:165 +0x27a fp=0xc0002446f0 sp=0xc000244658 pc=0x639963a8231a
net.(*netFD).Read(0xc00004eb80, {0xc00071c521?, 0xc00071a158?, 0xc000244770?})
net/fd_posix.go:55 +0x25 fp=0xc000244738 sp=0xc0002446f0 pc=0x639963af68c5
net.(*conn).Read(0xc00051e078, {0xc00071c521?, 0x0?, 0x0?})
net/net.go:194 +0x45 fp=0xc000244780 sp=0xc000244738 pc=0x639963b04c85
net/http.(*connReader).backgroundRead(0xc00071c510)
net/http/server.go:690 +0x37 fp=0xc0002447c8 sp=0xc000244780 pc=0x639963cf0257
net/http.(*connReader).startBackgroundRead.gowrap2()
net/http/server.go:686 +0x25 fp=0xc0002447e0 sp=0xc0002447c8 pc=0x639963cf0185
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0002447e8 sp=0xc0002447e0 pc=0x639963a020c1
created by net/http.(*connReader).startBackgroundRead in goroutine 50
net/http/server.go:686 +0xb6

rax 0x0
rbx 0x70d9e8eb9fc0
rcx 0x70d9e8efe00b
rdx 0x0
rdi 0x2
rsi 0x7ffec1e72040
rbp 0x70d94eeb6303
rsp 0x7ffec1e72040
r8 0x0
r9 0x7ffec1e72040
r10 0x8
r11 0x246
r12 0x70d94eebee13
r13 0x1c1
r14 0x63996e7b6fd0
r15 0x63996e7b6fc0
rip 0x70d9e8efe00b
rflags 0x246
cs 0x33
fs 0x0
gs 0x0
[GIN] 2025/03/01 - 07:10:04 | 200 | 117.845349ms | 127.0.0.1 | POST "/api/chat"
^C% ➜ ~

This is from another terminal, where I am running the model:

➜ ~ docker exec -it ollama ollama run phi4-mini

hi
Error: POST predict: Post "http://127.0.0.1:44845/completion": EOF
➜ ~
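The CLI error only shows the client side of the crash. The server's request to the runner (the process listening on 127.0.0.1:44845 in the logs) ended in EOF because the runner aborted. To rule out the interactive CLI, the same prompt can be sent straight to the server's /api/chat endpoint. Below is a minimal Python sketch. It assumes the server is reachable on the default port 11434 published by the docker run command and that a non-streaming request is acceptable. `build_chat_request` and `chat_once` are hypothetical helper names, not part of Ollama.

```python
import json
import urllib.error
import urllib.request

OLLAMA_HOST = "http://127.0.0.1:11434"  # assumption: default port published by the container


def build_chat_request(model: str, prompt: str) -> dict:
    """Body for POST /api/chat: a single user message, no streaming."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask for one JSON reply instead of a stream of chunks
    }


def chat_once(model: str, prompt: str, host: str = OLLAMA_HOST, timeout: float = 300.0) -> str:
    """Send one chat turn and return the assistant's text; raise on any server-side error."""
    req = urllib.request.Request(
        f"{host}/api/chat",
        data=json.dumps(build_chat_request(model, prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            reply = json.load(resp)
    except urllib.error.HTTPError as err:
        # Error statuses normally carry a JSON body with an "error" field.
        raise RuntimeError(f"HTTP {err.code}: {err.read().decode('utf-8', 'replace')}") from err
    if "error" in reply:
        raise RuntimeError(reply["error"])
    return reply["message"]["content"]
```

If the crash does not depend on the CLI, `chat_once("phi4-mini", "hi")` would be expected to raise with an error about the failed request to the runner rather than return a reply. Disabling streaming keeps the failure in a single response, which is easier to capture than a partially streamed answer.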

As you can see, it drops out of the prompt and the server prints the log dump above. Since phi4-mini requires 0.5.13, I could never run it on 0.5.12; however, with the same commands/steps I was able to run the deepseek-r1:latest distilled models on 0.5.12.
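Most of the dump consists of idle runtime goroutines, such as GC workers, the finalizer and the scavenger. The runtime notes that the SIGABRT "arrived during cgo execution", so the stack worth reading is the one that contains a cgo call. Here that is goroutine 16, stopped in `llama._Cfunc_llama_decode` under `(*Context).Decode` and `(*Server).processBatch`. The Python sketch below pulls such goroutines out of a pasted dump. It assumes the standard Go traceback layout: a `goroutine N ... [state]:` header, followed by function and file:line lines, with blank lines between blocks. `parse_goroutines` and `cgo_goroutines` are hypothetical helpers.

```python
import re

# Header line, e.g. "goroutine 16 gp=0x... m=0 mp=0x... [syscall]:"
HEADER = re.compile(r"^goroutine (\d+)\b.*?\[(.+)\]:$")


def parse_goroutines(dump: str) -> dict[int, tuple[str, list[str]]]:
    """Map goroutine id -> (wait state, function names from innermost to outermost frame).

    If the same id appears in several dumps, the last occurrence wins.
    """
    goroutines: dict[int, tuple[str, list[str]]] = {}
    current = None  # id of the goroutine whose frames are being read
    for raw in dump.splitlines():
        line = raw.strip()
        header = HEADER.match(line)
        if header:
            current = int(header.group(1))
            goroutines[current] = (header.group(2), [])
        elif not line:
            current = None  # a blank line ends the current goroutine block
        elif current is not None and line.endswith(")") and not line.startswith("created by"):
            # Function lines end with their argument list; file:line lines do not.
            goroutines[current][1].append(line[: line.rfind("(")])
    return goroutines


def cgo_goroutines(dump: str) -> dict[int, tuple[str, list[str]]]:
    """Keep only goroutines that were inside a cgo call (a frame name containing '._Cfunc_')."""
    return {
        gid: info
        for gid, info in parse_goroutines(dump).items()
        if any("._Cfunc_" in frame for frame in info[1])
    }
```

Run over the whole log, for example `cgo_goroutines(open("ollama.log").read())` with a hypothetical file name, this should reduce the dump to the one goroutine that was executing `llama_decode` when the abort hit.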

OS: Docker container running on Linux Mint with ROCm support
CPU/iGPU: 7735HS, hence passing HSA_OVERRIDE_GFX_VERSION=10.3.0 and the device driver locations (/dev/kfd, /dev/dri) in the docker run command
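HSA_OVERRIDE_GFX_VERSION=10.3.0 makes ROCm treat the gfx1035 iGPU as gfx1030. The server log reports compute=gfx1035, while the runner reports "gfx1030 (0x1030)". It is therefore worth confirming that the variable actually reached the server process. At startup the server prints its effective settings in a `server config env="map[...]"` line. The rough Python sketch below turns that line into a dict. It assumes the KEY:VALUE layout seen in these logs, with bracketed values such as OLLAMA_ORIGINS kept as a single string. `parse_server_config` is a hypothetical helper.

```python
import re

# Body of env="map[...]" in the server's startup line.
ENV_MAP = re.compile(r'env="map\[(.*)\]"')
# KEY:VALUE pairs; a value is either a [bracketed list] or a run of non-space characters (possibly empty).
PAIR = re.compile(r"([A-Za-z_][A-Za-z0-9_]*):(\[[^\]]*\]|\S*)")


def parse_server_config(line: str) -> dict[str, str]:
    """Return the server's effective settings from its 'server config' log line ({} if absent)."""
    match = ENV_MAP.search(line)
    if not match:
        return {}
    # Each match consumes its whole value, so URLs such as http://... are never read as keys.
    return dict(PAIR.findall(match.group(1)))
```

Applied to the startup line from `docker logs ollama`, `parse_server_config(line)["HSA_OVERRIDE_GFX_VERSION"]` should give "10.3.0" when the override was passed through. An empty string means the variable was not set inside the container.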

PS: If the original issue and the one I am facing turn out to be different, I will create a new issue; please let me know. In my case Ollama runs in Docker and the model does load onto the GPU (I can see that with the ollama ps command). I added my report here because both are ROCm issues.
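The PROCESSOR column of ollama ps appears to be derived from how much of each loaded model sits in VRAM. The same numbers are available from the server's /api/ps endpoint, which lists each loaded model with `size` and `size_vram` in bytes. A script can therefore check whether a model is really on the GPU before and after a prompt is sent, which is the CPU-versus-GPU question in this issue. The minimal sketch below assumes those two fields and the default port. `gpu_share` and `fetch_ps` are hypothetical helpers.

```python
import json
import urllib.request


def gpu_share(ps_response: dict) -> dict[str, float]:
    """Map each loaded model to the fraction of its memory held in VRAM (1.0 means fully on the GPU)."""
    shares = {}
    for model in ps_response.get("models", []):
        size = model.get("size", 0)
        shares[model["name"]] = model.get("size_vram", 0) / size if size else 0.0
    return shares


def fetch_ps(host: str = "http://127.0.0.1:11434") -> dict:
    """GET /api/ps from a running server (assumed to listen on the default port)."""
    with urllib.request.urlopen(f"{host}/api/ps", timeout=10) as resp:
        return json.load(resp)
```

`gpu_share(fetch_ps())` returns 1.0 for a model that ollama ps would show as "100% GPU" and 0.0 for one that runs entirely on the CPU. Polling it while a chat request is in flight would show whether the model gets reloaded elsewhere.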

@githubdebugger commented on GitHub (Mar 1, 2025): Maybe these issues are related, so I am adding my logs here: I am hitting a problem with 0.5.13-rc2-rocm, running in a Docker container. The model loads onto the GPU, but as soon as I send a message, it exits:

<details> <summary>Click to view logs</summary>

➜ ~ docker stop ollama
ollama
➜ ~ docker rm ollama
ollama
➜ ~ docker run -d --device /dev/kfd --device /dev/dri -v ollama:/root/.ollama -e OLLAMA_ORIGINS="" -e HSA_OVERRIDE_GFX_VERSION=10.3.0 -p 11434:11434 --name ollama ollama/ollama:0.5.13-rc2-rocm
993105481ed3bcd488bd82b5477a9749248437e15aee10d99de73a104a37e090
➜ ~
➜ ~
➜ ~ docker logs -f ollama
2025/03/01 07:09:11 routes.go:1215: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION:10.3.0 HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:2048 OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[ http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2025-03-01T07:09:11.854Z level=INFO source=images.go:432 msg="total blobs: 23"
time=2025-03-01T07:09:11.855Z level=INFO source=images.go:439 msg="total unused blobs removed: 0"
time=2025-03-01T07:09:11.856Z level=INFO source=routes.go:1281 msg="Listening on [::]:11434 (version 0.5.13-rc2)"
time=2025-03-01T07:09:11.856Z level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
time=2025-03-01T07:09:11.859Z level=INFO source=amd_linux.go:389 msg="skipping rocm gfx compatibility check" HSA_OVERRIDE_GFX_VERSION=10.3.0
time=2025-03-01T07:09:11.864Z level=INFO source=types.go:130 msg="inference compute" id=0 library=rocm variant="" compute=gfx1035 driver=6.8 name=1002:1681 total="16.0 GiB" available="16.0 GiB"
^C% ➜ ~ docker exec -it ollama ollama ls
NAME ID SIZE MODIFIED
phi4-mini:latest 78fad5d182a7 2.5 GB 7 minutes ago
deepseek-r1:1.5b a42b25d8c10a 1.1 GB 3 hours ago
qwen2.5-coder:latest 2b0496514337 4.7 GB 12 days ago
nomic-embed-text:latest 0a109f422b47 274 MB 12 days ago
deepseek-r1:latest 0a8c26691023 4.7 GB 12 days ago
bge-m3:latest 790764642607 1.2 GB 12 days ago
➜ ~ docker exec -it ollama ollama ps
NAME ID SIZE PROCESSOR UNTIL
➜ ~ docker exec -it ollama ollama ps
NAME ID SIZE PROCESSOR UNTIL
phi4-mini:latest 78fad5d182a7 4.7 GB 100% GPU 4 minutes from now
➜ ~
➜ ~ docker exec -it ollama ollama ps
NAME ID SIZE PROCESSOR UNTIL
phi4-mini:latest 78fad5d182a7 4.7 GB 100% GPU 4 minutes from now
➜ ~ docker logs -f ollama
2025/03/01 07:09:11 routes.go:1215: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION:10.3.0 HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:2048 OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[ http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2025-03-01T07:09:11.854Z level=INFO source=images.go:432 msg="total blobs: 23"
time=2025-03-01T07:09:11.855Z level=INFO source=images.go:439 msg="total unused blobs removed: 0"
time=2025-03-01T07:09:11.856Z level=INFO source=routes.go:1281 msg="Listening on [::]:11434 (version 0.5.13-rc2)"
time=2025-03-01T07:09:11.856Z level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
time=2025-03-01T07:09:11.859Z level=INFO source=amd_linux.go:389 msg="skipping rocm gfx compatibility check" HSA_OVERRIDE_GFX_VERSION=10.3.0
time=2025-03-01T07:09:11.864Z level=INFO source=types.go:130 msg="inference compute" id=0 library=rocm variant="" compute=gfx1035 driver=6.8 name=1002:1681 total="16.0 GiB" available="16.0 GiB"
[GIN] 2025/03/01 - 07:09:44 | 200 | 106.108µs | 127.0.0.1 | HEAD "/"
[GIN] 2025/03/01 - 07:09:44 | 200 | 1.294668ms | 127.0.0.1 | GET "/api/tags"
[GIN] 2025/03/01 - 07:09:47 | 200 | 20.921µs | 127.0.0.1 | HEAD "/"
[GIN] 2025/03/01 - 07:09:47 | 200 | 118.183µs | 127.0.0.1 | GET "/api/ps"
[GIN] 2025/03/01 - 07:09:52 | 200 | 29.288µs | 127.0.0.1 | HEAD "/"
[GIN] 2025/03/01 - 07:09:52 | 200 | 23.668722ms | 127.0.0.1 | POST "/api/show"
time=2025-03-01T07:09:52.534Z level=WARN source=ggml.go:136 msg="key not found" key=phi3.attention.key_length default=128
time=2025-03-01T07:09:52.534Z level=WARN source=ggml.go:136 msg="key not found" key=phi3.attention.value_length default=128
time=2025-03-01T07:09:52.534Z level=INFO source=sched.go:715 msg="new model will fit in available VRAM in single GPU, loading" model=/root/.ollama/models/blobs/sha256-3c168af1dea0a414299c7d9077e100ac763370e5a98b3c53801a958a47f0a5db gpu=0 parallel=4 available=17163341824 required="4.4 GiB"
time=2025-03-01T07:09:52.534Z level=INFO source=server.go:97 msg="system memory" total="15.4 GiB" free="12.2 GiB" free_swap="0 B"
time=2025-03-01T07:09:52.534Z level=WARN source=ggml.go:136 msg="key not found" key=phi3.attention.key_length default=128
time=2025-03-01T07:09:52.534Z level=WARN source=ggml.go:136 msg="key not found" key=phi3.attention.value_length default=128
time=2025-03-01T07:09:52.534Z level=INFO source=server.go:130 msg=offload library=rocm layers.requested=-1 layers.model=33 layers.offload=33 layers.split="" memory.available="[16.0 GiB]" memory.gpu_overhead="0 B" memory.required.full="4.4 GiB" memory.required.partial="4.4 GiB" memory.required.kv="1.0 GiB" memory.required.allocations="[4.4 GiB]" memory.weights.total="2.8 GiB" memory.weights.repeating="2.4 GiB" memory.weights.nonrepeating="480.8 MiB" memory.graph.full="512.0 MiB" memory.graph.partial="512.0 MiB"
time=2025-03-01T07:09:52.535Z level=INFO source=server.go:380 msg="starting llama server" cmd="/usr/bin/ollama runner --model /root/.ollama/models/blobs/sha256-3c168af1dea0a414299c7d9077e100ac763370e5a98b3c53801a958a47f0a5db --ctx-size 8192 --batch-size 512 --n-gpu-layers 33 --threads 8 --parallel 4 --port 44845"
time=2025-03-01T07:09:52.535Z level=INFO source=sched.go:450 msg="loaded runners" count=1
time=2025-03-01T07:09:52.535Z level=INFO source=server.go:557 msg="waiting for llama runner to start responding"
time=2025-03-01T07:09:52.536Z level=INFO source=server.go:591 msg="waiting for server to become available" status="llm server error"
time=2025-03-01T07:09:52.552Z level=INFO source=runner.go:931 msg="starting go runner"
/opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Radeon Graphics, gfx1030 (0x1030), VMM: yes, Wave Size: 32
load_backend: loaded ROCm backend from /usr/lib/ollama/rocm/libggml-hip.so
load_backend: loaded CPU backend from /usr/lib/ollama/libggml-cpu-haswell.so
time=2025-03-01T07:09:54.075Z level=INFO source=runner.go:934 msg=system info="CPU : LLAMAFILE = 1 | ROCm : PEER_MAX_BATCH_SIZE = 128 | CPU : LLAMAFILE = 1 | cgo(gcc)" threads=8
llama_model_load_from_file_impl: using device ROCm0 (AMD Radeon Graphics) - 7860 MiB free
time=2025-03-01T07:09:54.076Z level=INFO source=runner.go:992 msg="Server listening on 127.0.0.1:44845"
llama_model_loader: loaded meta data with 36 key-value pairs and 196 tensors from /root/.ollama/models/blobs/sha256-3c168af1dea0a414299c7d9077e100ac763370e5a98b3c53801a958a47f0a5db (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = phi3
llama_model_loader: - kv 1: phi3.rope.scaling.attn_factor f32 = 1.190238
llama_model_loader: - kv 2: general.type str = model
llama_model_loader: - kv 3: general.name str = Phi 4 Mini Instruct
llama_model_loader: - kv 4: general.finetune str = instruct
llama_model_loader: - kv 5: general.basename str = Phi-4
llama_model_loader: - kv 6: general.size_label str = mini
llama_model_loader: - kv 7: general.license str = mit
llama_model_loader: - kv 8: general.license.link str = https://huggingface.co/microsoft/Phi-...
llama_model_loader: - kv 9: general.tags arr[str,3] = ["nlp", "code", "text-generation"]
llama_model_loader: - kv 10: general.languages arr[str,24] = ["multilingual", "ar", "zh", "cs", "d...
llama_model_loader: - kv 11: phi3.context_length u32 = 131072
llama_model_loader: - kv 12: phi3.rope.scaling.original_context_length u32 = 4096
llama_model_loader: - kv 13: phi3.embedding_length u32 = 3072
llama_model_loader: - kv 14: phi3.feed_forward_length u32 = 8192
llama_model_loader: - kv 15: phi3.block_count u32 = 32
llama_model_loader: - kv 16: phi3.attention.head_count u32 = 24
llama_model_loader: - kv 17: phi3.attention.head_count_kv u32 = 8
llama_model_loader: - kv 18: phi3.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 19: phi3.rope.dimension_count u32 = 96
llama_model_loader: - kv 20: phi3.rope.freq_base f32 = 10000.000000
llama_model_loader: - kv 21: phi3.attention.sliding_window u32 = 262144
llama_model_loader: - kv 22: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 23: tokenizer.ggml.pre str = gpt-4o
llama_model_loader: - kv 24: tokenizer.ggml.tokens arr[str,200064] = ["!", """, "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 25: tokenizer.ggml.token_type arr[i32,200064] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 26: tokenizer.ggml.merges arr[str,199742] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "e r", ...
llama_model_loader: - kv 27: tokenizer.ggml.bos_token_id u32 = 199999
llama_model_loader: - kv 28: tokenizer.ggml.eos_token_id u32 = 199999
llama_model_loader: - kv 29: tokenizer.ggml.unknown_token_id u32 = 199999
llama_model_loader: - kv 30: tokenizer.ggml.padding_token_id u32 = 199999
llama_model_loader: - kv 31: tokenizer.ggml.add_bos_token bool = false
llama_model_loader: - kv 32: tokenizer.ggml.add_eos_token bool = false
llama_model_loader: - kv 33: tokenizer.chat_template str = {% for message in messages %}{% if me...
llama_model_loader: - kv 34: general.quantization_version u32 = 2 llama_model_loader: - kv 35: general.file_type u32 = 15 llama_model_loader: - type f32: 67 tensors llama_model_loader: - type q4_K: 80 tensors llama_model_loader: - type q5_K: 32 tensors llama_model_loader: - type q6_K: 17 tensors print_info: file format = GGUF V3 (latest) print_info: file type = Q4_K - Medium print_info: file size = 2.31 GiB (5.18 BPW) load: special tokens cache size = 12 time=2025-03-01T07:09:54.293Z level=INFO source=server.go:591 msg="waiting for server to become available" status="llm server loading model" load: token to piece cache size = 1.3333 MB print_info: arch = phi3 print_info: vocab_only = 0 print_info: n_ctx_train = 131072 print_info: n_embd = 3072 print_info: n_layer = 32 print_info: n_head = 24 print_info: n_head_kv = 8 print_info: n_rot = 96 print_info: n_swa = 262144 print_info: n_embd_head_k = 128 print_info: n_embd_head_v = 128 print_info: n_gqa = 3 print_info: n_embd_k_gqa = 1024 print_info: n_embd_v_gqa = 1024 print_info: f_norm_eps = 0.0e+00 print_info: f_norm_rms_eps = 1.0e-05 print_info: f_clamp_kqv = 0.0e+00 print_info: f_max_alibi_bias = 0.0e+00 print_info: f_logit_scale = 0.0e+00 print_info: n_ff = 8192 print_info: n_expert = 0 print_info: n_expert_used = 0 print_info: causal attn = 1 print_info: pooling type = 0 print_info: rope type = 2 print_info: rope scaling = linear print_info: freq_base_train = 10000.0 print_info: freq_scale_train = 1 print_info: n_ctx_orig_yarn = 4096 print_info: rope_finetuned = unknown print_info: ssm_d_conv = 0 print_info: ssm_d_inner = 0 print_info: ssm_d_state = 0 print_info: ssm_dt_rank = 0 print_info: ssm_dt_b_c_rms = 0 print_info: model type = 3B print_info: model params = 3.84 B print_info: general.name = Phi 4 Mini Instruct print_info: vocab type = BPE print_info: n_vocab = 200064 print_info: n_merges = 199742 print_info: BOS token = 199999 '<|endoftext|>' print_info: EOS token = 199999 '<|endoftext|>' print_info: EOT 
token = 199999 '<|endoftext|>' print_info: UNK token = 199999 '<|endoftext|>' print_info: PAD token = 199999 '<|endoftext|>' print_info: LF token = 198 'Ċ' print_info: EOG token = 199999 '<|endoftext|>' print_info: EOG token = 200020 '<|end|>' print_info: max token length = 256 load_tensors: loading model tensors, this can take a while... (mmap = true) load_tensors: offloading 32 repeating layers to GPU load_tensors: offloading output layer to GPU load_tensors: offloaded 33/33 layers to GPU load_tensors: CPU_Mapped model buffer size = 480.81 MiB load_tensors: ROCm0 model buffer size = 2368.57 MiB llama_init_from_model: n_seq_max = 4 llama_init_from_model: n_ctx = 8192 llama_init_from_model: n_ctx_per_seq = 2048 llama_init_from_model: n_batch = 2048 llama_init_from_model: n_ubatch = 512 llama_init_from_model: flash_attn = 0 llama_init_from_model: freq_base = 10000.0 llama_init_from_model: freq_scale = 1 llama_init_from_model: n_ctx_per_seq (2048) < n_ctx_train (131072) -- the full capacity of the model will not be utilized llama_kv_cache_init: kv_size = 8192, offload = 1, type_k = 'f16', type_v = 'f16', n_layer = 32, can_shift = 1 llama_kv_cache_init: ROCm0 KV buffer size = 1024.00 MiB llama_init_from_model: KV self size = 1024.00 MiB, K (f16): 512.00 MiB, V (f16): 512.00 MiB llama_init_from_model: ROCm_Host output buffer size = 3.10 MiB llama_init_from_model: ROCm0 compute buffer size = 428.00 MiB llama_init_from_model: ROCm_Host compute buffer size = 22.01 MiB llama_init_from_model: graph nodes = 1286 llama_init_from_model: graph splits = 2 time=2025-03-01T07:09:55.298Z level=INFO source=server.go:596 msg="llama runner started in 2.76 seconds" [GIN] 2025/03/01 - 07:09:55 | 200 | 2.832161663s | 127.0.0.1 | POST "/api/generate" [GIN] 2025/03/01 - 07:09:58 | 200 | 25.49µs | 127.0.0.1 | HEAD "/" [GIN] 2025/03/01 - 07:09:58 | 200 | 26.983µs | 127.0.0.1 | GET "/api/ps" [GIN] 2025/03/01 - 07:10:00 | 200 | 34.017µs | 127.0.0.1 | HEAD "/" [GIN] 2025/03/01 - 07:10:00 | 200 
| 29.648µs | 127.0.0.1 | GET "/api/ps" //ml/backend/ggml/ggml/src/ggml-cuda/ggml-cuda.cu:449: HipVMM Failure: out of memory Memory critical error by agent node-0 (Agent handle: 0x63996d6c1680) on address 0x70d978500000. Reason: Memory in use. SIGABRT: abort PC=0x70d9e8efe00b m=0 sigcode=18446744073709551610 signal arrived during cgo execution goroutine 16 gp=0xc000504a80 m=0 mp=0x6399654f96c0 [syscall]: runtime.cgocall(0x639964647600, 0xc000093bc8) runtime/cgocall.go:167 +0x4b fp=0xc000093ba0 sp=0xc000093b68 pc=0x6399639f76ab github.com/ollama/ollama/llama._Cfunc_llama_decode(0x70d7cc836840, {0x4, 0x70d7cc857970, 0x0, 0x0, 0x70d7cc8610d0, 0x70d7cc859e30, 0x70d7cc78bbc0, 0x70d97c6252d0}) _cgo_gotypes.go:557 +0x4a fp=0xc000093bc8 sp=0xc000093ba0 pc=0x639963d7d46a github.com/ollama/ollama/llama.(*Context).Decode.func1(...) github.com/ollama/ollama/llama/llama.go:157 github.com/ollama/ollama/llama.(*Context).Decode(0xc00011c5d0?, 0x0?) github.com/ollama/ollama/llama/llama.go:157 +0xf6 fp=0xc000093cc8 sp=0xc000093bc8 pc=0x639963d80076 github.com/ollama/ollama/runner/llamarunner.(*Server).processBatch(0xc0004ba000, 0xc0004264e0, 0xc00011c720) github.com/ollama/ollama/runner/llamarunner/runner.go:435 +0x23e fp=0xc000093ee0 sp=0xc000093cc8 pc=0x639963d990be github.com/ollama/ollama/runner/llamarunner.(*Server).run(0xc0004ba000, {0x639964ca7e60, 0xc0001280a0}) github.com/ollama/ollama/runner/llamarunner/runner.go:343 +0x1d5 fp=0xc000093fb8 sp=0xc000093ee0 pc=0x639963d98d15 github.com/ollama/ollama/runner/llamarunner.Execute.gowrap2() github.com/ollama/ollama/runner/llamarunner/runner.go:973 +0x28 fp=0xc000093fe0 sp=0xc000093fb8 pc=0x639963d9d6a8 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000093fe8 sp=0xc000093fe0 pc=0x639963a020c1 created by github.com/ollama/ollama/runner/llamarunner.Execute in goroutine 1 github.com/ollama/ollama/runner/llamarunner/runner.go:973 +0xd97 goroutine 1 gp=0xc000002380 m=nil [IO wait]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) 
runtime/proc.go:435 +0xce fp=0xc0005875b8 sp=0xc000587598 pc=0x6399639fa98e runtime.netpollblock(0xc000587608?, 0x639942c6?, 0x99?) runtime/netpoll.go:575 +0xf7 fp=0xc0005875f0 sp=0xc0005875b8 pc=0x6399639bf797 internal/poll.runtime_pollWait(0x70d9a23c7eb0, 0x72) runtime/netpoll.go:351 +0x85 fp=0xc000587610 sp=0xc0005875f0 pc=0x6399639f9ba5 internal/poll.(*pollDesc).wait(0xc00004e100?, 0x900000036?, 0x0) internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc000587638 sp=0xc000587610 pc=0x639963a81027 internal/poll.(*pollDesc).waitRead(...) internal/poll/fd_poll_runtime.go:89 internal/poll.(*FD).Accept(0xc00004e100) internal/poll/fd_unix.go:620 +0x295 fp=0xc0005876e0 sp=0xc000587638 pc=0x639963a863f5 net.(*netFD).accept(0xc00004e100) net/fd_unix.go:172 +0x29 fp=0xc000587798 sp=0xc0005876e0 pc=0x639963af8869 net.(*TCPListener).accept(0xc00071a080) net/tcpsock_posix.go:159 +0x1b fp=0xc0005877e8 sp=0xc000587798 pc=0x639963b0e21b net.(*TCPListener).Accept(0xc00071a080) net/tcpsock.go:380 +0x30 fp=0xc000587818 sp=0xc0005877e8 pc=0x639963b0d0d0 net/http.(*onceCloseListener).Accept(0xc0004ba120?) 
:1 +0x24 fp=0xc000587830 sp=0xc000587818 pc=0x639963d23f84 net/http.(*Server).Serve(0xc000534200, {0x639964ca5be8, 0xc00071a080}) net/http/server.go:3424 +0x30c fp=0xc000587960 sp=0xc000587830 pc=0x639963cfb84c github.com/ollama/ollama/runner/llamarunner.Execute({0xc000034120, 0xe, 0xe}) github.com/ollama/ollama/runner/llamarunner/runner.go:993 +0x116a fp=0xc000587d08 sp=0xc000587960 pc=0x639963d9d3ea github.com/ollama/ollama/runner.Execute({0xc000034110?, 0x0?, 0x0?}) github.com/ollama/ollama/runner/runner.go:22 +0xd4 fp=0xc000587d30 sp=0xc000587d08 pc=0x639963fc6514 github.com/ollama/ollama/cmd.NewCLI.func2(0xc000035300?, {0x639964825055?, 0x4?, 0x639964825059?}) github.com/ollama/ollama/cmd/cmd.go:1280 +0x45 fp=0xc000587d58 sp=0xc000587d30 pc=0x6399645daae5 github.com/spf13/cobra.(*Command).execute(0xc00013ef08, {0xc0005149a0, 0xe, 0xe}) github.com/spf13/cobra@v1.7.0/command.go:940 +0x85c fp=0xc000587e78 sp=0xc000587d58 pc=0x639963b71afc github.com/spf13/cobra.(*Command).ExecuteC(0xc00054e908) github.com/spf13/cobra@v1.7.0/command.go:1068 +0x3a5 fp=0xc000587f30 sp=0xc000587e78 pc=0x639963b72345 github.com/spf13/cobra.(*Command).Execute(...) github.com/spf13/cobra@v1.7.0/command.go:992 github.com/spf13/cobra.(*Command).ExecuteContext(...) github.com/spf13/cobra@v1.7.0/command.go:985 main.main() github.com/ollama/ollama/main.go:12 +0x4d fp=0xc000587f50 sp=0xc000587f30 pc=0x6399645dae4d runtime.main() runtime/proc.go:283 +0x29d fp=0xc000587fe0 sp=0xc000587f50 pc=0x6399639c6d9d runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000587fe8 sp=0xc000587fe0 pc=0x639963a020c1 goroutine 2 gp=0xc000002e00 m=nil [force gc (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000084fa8 sp=0xc000084f88 pc=0x6399639fa98e runtime.goparkunlock(...) 
runtime/proc.go:441 runtime.forcegchelper() runtime/proc.go:348 +0xb8 fp=0xc000084fe0 sp=0xc000084fa8 pc=0x6399639c70d8 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000084fe8 sp=0xc000084fe0 pc=0x639963a020c1 created by runtime.init.7 in goroutine 1 runtime/proc.go:336 +0x1a goroutine 3 gp=0xc000003340 m=nil [GC sweep wait]: runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000085780 sp=0xc000085760 pc=0x6399639fa98e runtime.goparkunlock(...) runtime/proc.go:441 runtime.bgsweep(0xc00003e080) runtime/mgcsweep.go:316 +0xdf fp=0xc0000857c8 sp=0xc000085780 pc=0x6399639b18ff runtime.gcenable.gowrap1() runtime/mgc.go:204 +0x25 fp=0xc0000857e0 sp=0xc0000857c8 pc=0x6399639a5ce5 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc0000857e8 sp=0xc0000857e0 pc=0x639963a020c1 created by runtime.gcenable in goroutine 1 runtime/mgc.go:204 +0x66 goroutine 4 gp=0xc000003500 m=nil [GC scavenge wait]: runtime.gopark(0x10000?, 0x6399649d76b8?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000085f78 sp=0xc000085f58 pc=0x6399639fa98e runtime.goparkunlock(...) runtime/proc.go:441 runtime.(*scavengerState).park(0x6399654f68a0) runtime/mgcscavenge.go:425 +0x49 fp=0xc000085fa8 sp=0xc000085f78 pc=0x6399639af349 runtime.bgscavenge(0xc00003e080) runtime/mgcscavenge.go:658 +0x59 fp=0xc000085fc8 sp=0xc000085fa8 pc=0x6399639af8d9 runtime.gcenable.gowrap2() runtime/mgc.go:205 +0x25 fp=0xc000085fe0 sp=0xc000085fc8 pc=0x6399639a5c85 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000085fe8 sp=0xc000085fe0 pc=0x639963a020c1 created by runtime.gcenable in goroutine 1 runtime/mgc.go:205 +0xa5 goroutine 5 gp=0xc000003dc0 m=nil [finalizer wait]: runtime.gopark(0x1b8?, 0xc000002380?, 0x1?, 0x23?, 0xc000084688?) 
runtime/proc.go:435 +0xce fp=0xc000084630 sp=0xc000084610 pc=0x6399639fa98e runtime.runfinq() runtime/mfinal.go:196 +0x107 fp=0xc0000847e0 sp=0xc000084630 pc=0x6399639a4ca7 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc0000847e8 sp=0xc0000847e0 pc=0x639963a020c1 created by runtime.createfing in goroutine 1 runtime/mfinal.go:166 +0x3d goroutine 6 gp=0xc0001e08c0 m=nil [chan receive]: runtime.gopark(0xc0000ff900?, 0xc000588018?, 0x60?, 0x67?, 0x639963adf5a8?) runtime/proc.go:435 +0xce fp=0xc000086718 sp=0xc0000866f8 pc=0x6399639fa98e runtime.chanrecv(0xc0000b6380, 0x0, 0x1) runtime/chan.go:664 +0x445 fp=0xc000086790 sp=0xc000086718 pc=0x639963996ea5 runtime.chanrecv1(0x0?, 0x0?) runtime/chan.go:506 +0x12 fp=0xc0000867b8 sp=0xc000086790 pc=0x639963996a32 runtime.unique_runtime_registerUniqueMapCleanup.func2(...) runtime/mgc.go:1796 runtime.unique_runtime_registerUniqueMapCleanup.gowrap1() runtime/mgc.go:1799 +0x2f fp=0xc0000867e0 sp=0xc0000867b8 pc=0x6399639a8e8f runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc0000867e8 sp=0xc0000867e0 pc=0x639963a020c1 created by unique.runtime_registerUniqueMapCleanup in goroutine 1 runtime/mgc.go:1794 +0x85 goroutine 7 gp=0xc0001e1180 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000086f38 sp=0xc000086f18 pc=0x6399639fa98e runtime.gcBgMarkWorker(0xc0000b7b20) runtime/mgc.go:1423 +0xe9 fp=0xc000086fc8 sp=0xc000086f38 pc=0x6399639a81a9 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc000086fe0 sp=0xc000086fc8 pc=0x6399639a8085 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000086fe8 sp=0xc000086fe0 pc=0x639963a020c1 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 8 gp=0xc0001e1340 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) 
runtime/proc.go:435 +0xce fp=0xc000087738 sp=0xc000087718 pc=0x6399639fa98e runtime.gcBgMarkWorker(0xc0000b7b20) runtime/mgc.go:1423 +0xe9 fp=0xc0000877c8 sp=0xc000087738 pc=0x6399639a81a9 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc0000877e0 sp=0xc0000877c8 pc=0x6399639a8085 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc0000877e8 sp=0xc0000877e0 pc=0x639963a020c1 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 18 gp=0xc000504000 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000080738 sp=0xc000080718 pc=0x6399639fa98e runtime.gcBgMarkWorker(0xc0000b7b20) runtime/mgc.go:1423 +0xe9 fp=0xc0000807c8 sp=0xc000080738 pc=0x6399639a81a9 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc0000807e0 sp=0xc0000807c8 pc=0x6399639a8085 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc0000807e8 sp=0xc0000807e0 pc=0x639963a020c1 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 34 gp=0xc000102380 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc00011a738 sp=0xc00011a718 pc=0x6399639fa98e runtime.gcBgMarkWorker(0xc0000b7b20) runtime/mgc.go:1423 +0xe9 fp=0xc00011a7c8 sp=0xc00011a738 pc=0x6399639a81a9 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc00011a7e0 sp=0xc00011a7c8 pc=0x6399639a8085 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc00011a7e8 sp=0xc00011a7e0 pc=0x639963a020c1 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 9 gp=0xc0001e1500 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) 
runtime/proc.go:435 +0xce fp=0xc000087f38 sp=0xc000087f18 pc=0x6399639fa98e runtime.gcBgMarkWorker(0xc0000b7b20) runtime/mgc.go:1423 +0xe9 fp=0xc000087fc8 sp=0xc000087f38 pc=0x6399639a81a9 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc000087fe0 sp=0xc000087fc8 pc=0x6399639a8085 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000087fe8 sp=0xc000087fe0 pc=0x639963a020c1 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 10 gp=0xc0001e16c0 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000116738 sp=0xc000116718 pc=0x6399639fa98e runtime.gcBgMarkWorker(0xc0000b7b20) runtime/mgc.go:1423 +0xe9 fp=0xc0001167c8 sp=0xc000116738 pc=0x6399639a81a9 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc0001167e0 sp=0xc0001167c8 pc=0x6399639a8085 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc0001167e8 sp=0xc0001167e0 pc=0x639963a020c1 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 11 gp=0xc0001e1880 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000116f38 sp=0xc000116f18 pc=0x6399639fa98e runtime.gcBgMarkWorker(0xc0000b7b20) runtime/mgc.go:1423 +0xe9 fp=0xc000116fc8 sp=0xc000116f38 pc=0x6399639a81a9 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc000116fe0 sp=0xc000116fc8 pc=0x6399639a8085 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000116fe8 sp=0xc000116fe0 pc=0x639963a020c1 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 19 gp=0xc0005041c0 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) 
runtime/proc.go:435 +0xce fp=0xc000080f38 sp=0xc000080f18 pc=0x6399639fa98e runtime.gcBgMarkWorker(0xc0000b7b20) runtime/mgc.go:1423 +0xe9 fp=0xc000080fc8 sp=0xc000080f38 pc=0x6399639a81a9 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc000080fe0 sp=0xc000080fc8 pc=0x6399639a8085 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000080fe8 sp=0xc000080fe0 pc=0x639963a020c1 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 20 gp=0xc000504380 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000081738 sp=0xc000081718 pc=0x6399639fa98e runtime.gcBgMarkWorker(0xc0000b7b20) runtime/mgc.go:1423 +0xe9 fp=0xc0000817c8 sp=0xc000081738 pc=0x6399639a81a9 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc0000817e0 sp=0xc0000817c8 pc=0x6399639a8085 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc0000817e8 sp=0xc0000817e0 pc=0x639963a020c1 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 35 gp=0xc000102540 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc00011af38 sp=0xc00011af18 pc=0x6399639fa98e runtime.gcBgMarkWorker(0xc0000b7b20) runtime/mgc.go:1423 +0xe9 fp=0xc00011afc8 sp=0xc00011af38 pc=0x6399639a81a9 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc00011afe0 sp=0xc00011afc8 pc=0x6399639a8085 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc00011afe8 sp=0xc00011afe0 pc=0x639963a020c1 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 12 gp=0xc0001e1a40 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) 
runtime/proc.go:435 +0xce fp=0xc000117738 sp=0xc000117718 pc=0x6399639fa98e runtime.gcBgMarkWorker(0xc0000b7b20) runtime/mgc.go:1423 +0xe9 fp=0xc0001177c8 sp=0xc000117738 pc=0x6399639a81a9 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc0001177e0 sp=0xc0001177c8 pc=0x6399639a8085 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc0001177e8 sp=0xc0001177e0 pc=0x639963a020c1 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 21 gp=0xc000504540 m=nil [GC worker (idle)]: runtime.gopark(0x21a0c049ba1f?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000081f38 sp=0xc000081f18 pc=0x6399639fa98e runtime.gcBgMarkWorker(0xc0000b7b20) runtime/mgc.go:1423 +0xe9 fp=0xc000081fc8 sp=0xc000081f38 pc=0x6399639a81a9 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc000081fe0 sp=0xc000081fc8 pc=0x6399639a8085 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000081fe8 sp=0xc000081fe0 pc=0x639963a020c1 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 13 gp=0xc0001e1c00 m=nil [GC worker (idle)]: runtime.gopark(0x21a0c049ac80?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000117f38 sp=0xc000117f18 pc=0x6399639fa98e runtime.gcBgMarkWorker(0xc0000b7b20) runtime/mgc.go:1423 +0xe9 fp=0xc000117fc8 sp=0xc000117f38 pc=0x6399639a81a9 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc000117fe0 sp=0xc000117fc8 pc=0x6399639a8085 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000117fe8 sp=0xc000117fe0 pc=0x639963a020c1 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 22 gp=0xc000504700 m=nil [GC worker (idle)]: runtime.gopark(0x21a0c049b309?, 0x3?, 0xe9?, 0x5?, 0x0?) 
runtime/proc.go:435 +0xce fp=0xc000082738 sp=0xc000082718 pc=0x6399639fa98e runtime.gcBgMarkWorker(0xc0000b7b20) runtime/mgc.go:1423 +0xe9 fp=0xc0000827c8 sp=0xc000082738 pc=0x6399639a81a9 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc0000827e0 sp=0xc0000827c8 pc=0x6399639a8085 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc0000827e8 sp=0xc0000827e0 pc=0x639963a020c1 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 14 gp=0xc0001e1dc0 m=nil [GC worker (idle)]: runtime.gopark(0x21a0c049b3b3?, 0x3?, 0x29?, 0x9?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000118738 sp=0xc000118718 pc=0x6399639fa98e runtime.gcBgMarkWorker(0xc0000b7b20) runtime/mgc.go:1423 +0xe9 fp=0xc0001187c8 sp=0xc000118738 pc=0x6399639a81a9 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc0001187e0 sp=0xc0001187c8 pc=0x6399639a8085 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc0001187e8 sp=0xc0001187e0 pc=0x639963a020c1 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 36 gp=0xc000102700 m=nil [GC worker (idle)]: runtime.gopark(0x21a0c049bbce?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc00011b738 sp=0xc00011b718 pc=0x6399639fa98e runtime.gcBgMarkWorker(0xc0000b7b20) runtime/mgc.go:1423 +0xe9 fp=0xc00011b7c8 sp=0xc00011b738 pc=0x6399639a81a9 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc00011b7e0 sp=0xc00011b7c8 pc=0x6399639a8085 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc00011b7e8 sp=0xc00011b7e0 pc=0x639963a020c1 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 50 gp=0xc000504c40 m=nil [select]: runtime.gopark(0xc000047a58?, 0x2?, 0x40?, 0x68?, 0xc000047834?) 
runtime/proc.go:435 +0xce fp=0xc000047648 sp=0xc000047628 pc=0x6399639fa98e runtime.selectgo(0xc000047a58, 0xc000047830, 0x4?, 0x0, 0x1?, 0x1) runtime/select.go:351 +0x837 fp=0xc000047780 sp=0xc000047648 pc=0x6399639d9297 github.com/ollama/ollama/runner/llamarunner.(*Server).completion(0xc0004ba000, {0x639964ca5dc8, 0xc000514d20}, 0xc000154280) github.com/ollama/ollama/runner/llamarunner/runner.go:688 +0xa25 fp=0xc000047ac0 sp=0xc000047780 pc=0x639963d9aac5 github.com/ollama/ollama/runner/llamarunner.(*Server).completion-fm({0x639964ca5dc8?, 0xc000514d20?}, 0xc0004e3b40?) :1 +0x36 fp=0xc000047af0 sp=0xc000047ac0 pc=0x639963d9dad6 net/http.HandlerFunc.ServeHTTP(0xc0000ea240?, {0x639964ca5dc8?, 0xc000514d20?}, 0xc0004e3b60?) net/http/server.go:2294 +0x29 fp=0xc000047b18 sp=0xc000047af0 pc=0x639963cf7e89 net/http.(*ServeMux).ServeHTTP(0x63996399f1c5?, {0x639964ca5dc8, 0xc000514d20}, 0xc000154280) net/http/server.go:2822 +0x1c4 fp=0xc000047b68 sp=0xc000047b18 pc=0x639963cf9d84 net/http.serverHandler.ServeHTTP({0x639964ca2370?}, {0x639964ca5dc8?, 0xc000514d20?}, 0x1?) net/http/server.go:3301 +0x8e fp=0xc000047b98 sp=0xc000047b68 pc=0x639963d1780e net/http.(*conn).serve(0xc0004ba120, {0x639964ca7e28, 0xc00071c3f0}) net/http/server.go:2102 +0x625 fp=0xc000047fb8 sp=0xc000047b98 pc=0x639963cf6385 net/http.(*Server).Serve.gowrap3() net/http/server.go:3454 +0x28 fp=0xc000047fe0 sp=0xc000047fb8 pc=0x639963cfbc48 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000047fe8 sp=0xc000047fe0 pc=0x639963a020c1 created by net/http.(*Server).Serve in goroutine 1 net/http/server.go:3454 +0x485 goroutine 39 gp=0xc000102a80 m=nil [IO wait]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0xb?) runtime/proc.go:435 +0xce fp=0xc0002445d8 sp=0xc0002445b8 pc=0x6399639fa98e runtime.netpollblock(0x639963a1de18?, 0x639942c6?, 0x99?) 
runtime/netpoll.go:575 +0xf7 fp=0xc000244610 sp=0xc0002445d8 pc=0x6399639bf797 internal/poll.runtime_pollWait(0x70d9a23c7d98, 0x72) runtime/netpoll.go:351 +0x85 fp=0xc000244630 sp=0xc000244610 pc=0x6399639f9ba5 internal/poll.(*pollDesc).wait(0xc00004eb80?, 0xc00071c521?, 0x0) internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc000244658 sp=0xc000244630 pc=0x639963a81027 internal/poll.(*pollDesc).waitRead(...) internal/poll/fd_poll_runtime.go:89 internal/poll.(*FD).Read(0xc00004eb80, {0xc00071c521, 0x1, 0x1}) internal/poll/fd_unix.go:165 +0x27a fp=0xc0002446f0 sp=0xc000244658 pc=0x639963a8231a net.(*netFD).Read(0xc00004eb80, {0xc00071c521?, 0xc00071a158?, 0xc000244770?}) net/fd_posix.go:55 +0x25 fp=0xc000244738 sp=0xc0002446f0 pc=0x639963af68c5 net.(*conn).Read(0xc00051e078, {0xc00071c521?, 0x0?, 0x0?}) net/net.go:194 +0x45 fp=0xc000244780 sp=0xc000244738 pc=0x639963b04c85 net/http.(*connReader).backgroundRead(0xc00071c510) net/http/server.go:690 +0x37 fp=0xc0002447c8 sp=0xc000244780 pc=0x639963cf0257 net/http.(*connReader).startBackgroundRead.gowrap2() net/http/server.go:686 +0x25 fp=0xc0002447e0 sp=0xc0002447c8 pc=0x639963cf0185 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc0002447e8 sp=0xc0002447e0 pc=0x639963a020c1 created by net/http.(*connReader).startBackgroundRead in goroutine 50 net/http/server.go:686 +0xb6 rax 0x0 rbx 0x70d9e8eb9fc0 rcx 0x70d9e8efe00b rdx 0x0 rdi 0x2 rsi 0x7ffec1e71dd0 rbp 0x70d978500000 rsp 0x7ffec1e71dd0 r8 0x0 r9 0x7ffec1e71dd0 r10 0x8 r11 0x246 r12 0x7ffec1e72050 r13 0x0 r14 0x1000 r15 0x0 rip 0x70d9e8efe00b rflags 0x246 cs 0x33 fs 0x0 gs 0x0 SIGABRT: abort PC=0x70d9e8efe00b m=0 sigcode=18446744073709551610 signal arrived during cgo execution goroutine 16 gp=0xc000504a80 m=0 mp=0x6399654f96c0 [syscall]: runtime.cgocall(0x639964647600, 0xc000093bc8) runtime/cgocall.go:167 +0x4b fp=0xc000093ba0 sp=0xc000093b68 pc=0x6399639f76ab github.com/ollama/ollama/llama._Cfunc_llama_decode(0x70d7cc836840, {0x4, 0x70d7cc857970, 0x0, 0x0, 
0x70d7cc8610d0, 0x70d7cc859e30, 0x70d7cc78bbc0, 0x70d97c6252d0}) _cgo_gotypes.go:557 +0x4a fp=0xc000093bc8 sp=0xc000093ba0 pc=0x639963d7d46a github.com/ollama/ollama/llama.(*Context).Decode.func1(...) github.com/ollama/ollama/llama/llama.go:157 github.com/ollama/ollama/llama.(*Context).Decode(0xc00011c5d0?, 0x0?) github.com/ollama/ollama/llama/llama.go:157 +0xf6 fp=0xc000093cc8 sp=0xc000093bc8 pc=0x639963d80076 github.com/ollama/ollama/runner/llamarunner.(*Server).processBatch(0xc0004ba000, 0xc0004264e0, 0xc00011c720) github.com/ollama/ollama/runner/llamarunner/runner.go:435 +0x23e fp=0xc000093ee0 sp=0xc000093cc8 pc=0x639963d990be github.com/ollama/ollama/runner/llamarunner.(*Server).run(0xc0004ba000, {0x639964ca7e60, 0xc0001280a0}) github.com/ollama/ollama/runner/llamarunner/runner.go:343 +0x1d5 fp=0xc000093fb8 sp=0xc000093ee0 pc=0x639963d98d15 github.com/ollama/ollama/runner/llamarunner.Execute.gowrap2() github.com/ollama/ollama/runner/llamarunner/runner.go:973 +0x28 fp=0xc000093fe0 sp=0xc000093fb8 pc=0x639963d9d6a8 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000093fe8 sp=0xc000093fe0 pc=0x639963a020c1 created by github.com/ollama/ollama/runner/llamarunner.Execute in goroutine 1 github.com/ollama/ollama/runner/llamarunner/runner.go:973 +0xd97 goroutine 1 gp=0xc000002380 m=nil [IO wait]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc0005875b8 sp=0xc000587598 pc=0x6399639fa98e runtime.netpollblock(0xc000587608?, 0x639942c6?, 0x99?) runtime/netpoll.go:575 +0xf7 fp=0xc0005875f0 sp=0xc0005875b8 pc=0x6399639bf797 internal/poll.runtime_pollWait(0x70d9a23c7eb0, 0x72) runtime/netpoll.go:351 +0x85 fp=0xc000587610 sp=0xc0005875f0 pc=0x6399639f9ba5 internal/poll.(*pollDesc).wait(0xc00004e100?, 0x900000036?, 0x0) internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc000587638 sp=0xc000587610 pc=0x639963a81027 internal/poll.(*pollDesc).waitRead(...) 
internal/poll/fd_poll_runtime.go:89 internal/poll.(*FD).Accept(0xc00004e100) internal/poll/fd_unix.go:620 +0x295 fp=0xc0005876e0 sp=0xc000587638 pc=0x639963a863f5 net.(*netFD).accept(0xc00004e100) net/fd_unix.go:172 +0x29 fp=0xc000587798 sp=0xc0005876e0 pc=0x639963af8869 net.(*TCPListener).accept(0xc00071a080) net/tcpsock_posix.go:159 +0x1b fp=0xc0005877e8 sp=0xc000587798 pc=0x639963b0e21b net.(*TCPListener).Accept(0xc00071a080) net/tcpsock.go:380 +0x30 fp=0xc000587818 sp=0xc0005877e8 pc=0x639963b0d0d0 net/http.(*onceCloseListener).Accept(0xc0004ba120?) :1 +0x24 fp=0xc000587830 sp=0xc000587818 pc=0x639963d23f84 net/http.(*Server).Serve(0xc000534200, {0x639964ca5be8, 0xc00071a080}) net/http/server.go:3424 +0x30c fp=0xc000587960 sp=0xc000587830 pc=0x639963cfb84c github.com/ollama/ollama/runner/llamarunner.Execute({0xc000034120, 0xe, 0xe}) github.com/ollama/ollama/runner/llamarunner/runner.go:993 +0x116a fp=0xc000587d08 sp=0xc000587960 pc=0x639963d9d3ea github.com/ollama/ollama/runner.Execute({0xc000034110?, 0x0?, 0x0?}) github.com/ollama/ollama/runner/runner.go:22 +0xd4 fp=0xc000587d30 sp=0xc000587d08 pc=0x639963fc6514 github.com/ollama/ollama/cmd.NewCLI.func2(0xc000035300?, {0x639964825055?, 0x4?, 0x639964825059?}) github.com/ollama/ollama/cmd/cmd.go:1280 +0x45 fp=0xc000587d58 sp=0xc000587d30 pc=0x6399645daae5 github.com/spf13/cobra.(*Command).execute(0xc00013ef08, {0xc0005149a0, 0xe, 0xe}) github.com/spf13/cobra@v1.7.0/command.go:940 +0x85c fp=0xc000587e78 sp=0xc000587d58 pc=0x639963b71afc github.com/spf13/cobra.(*Command).ExecuteC(0xc00054e908) github.com/spf13/cobra@v1.7.0/command.go:1068 +0x3a5 fp=0xc000587f30 sp=0xc000587e78 pc=0x639963b72345 github.com/spf13/cobra.(*Command).Execute(...) github.com/spf13/cobra@v1.7.0/command.go:992 github.com/spf13/cobra.(*Command).ExecuteContext(...) 
github.com/spf13/cobra@v1.7.0/command.go:985 main.main() github.com/ollama/ollama/main.go:12 +0x4d fp=0xc000587f50 sp=0xc000587f30 pc=0x6399645dae4d runtime.main() runtime/proc.go:283 +0x29d fp=0xc000587fe0 sp=0xc000587f50 pc=0x6399639c6d9d runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000587fe8 sp=0xc000587fe0 pc=0x639963a020c1 goroutine 2 gp=0xc000002e00 m=nil [force gc (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000084fa8 sp=0xc000084f88 pc=0x6399639fa98e runtime.goparkunlock(...) runtime/proc.go:441 runtime.forcegchelper() runtime/proc.go:348 +0xb8 fp=0xc000084fe0 sp=0xc000084fa8 pc=0x6399639c70d8 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000084fe8 sp=0xc000084fe0 pc=0x639963a020c1 created by runtime.init.7 in goroutine 1 runtime/proc.go:336 +0x1a goroutine 3 gp=0xc000003340 m=nil [GC sweep wait]: runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000085780 sp=0xc000085760 pc=0x6399639fa98e runtime.goparkunlock(...) runtime/proc.go:441 runtime.bgsweep(0xc00003e080) runtime/mgcsweep.go:316 +0xdf fp=0xc0000857c8 sp=0xc000085780 pc=0x6399639b18ff runtime.gcenable.gowrap1() runtime/mgc.go:204 +0x25 fp=0xc0000857e0 sp=0xc0000857c8 pc=0x6399639a5ce5 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc0000857e8 sp=0xc0000857e0 pc=0x639963a020c1 created by runtime.gcenable in goroutine 1 runtime/mgc.go:204 +0x66 goroutine 4 gp=0xc000003500 m=nil [GC scavenge wait]: runtime.gopark(0x10000?, 0x6399649d76b8?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000085f78 sp=0xc000085f58 pc=0x6399639fa98e runtime.goparkunlock(...) 
runtime/proc.go:441 runtime.(*scavengerState).park(0x6399654f68a0) runtime/mgcscavenge.go:425 +0x49 fp=0xc000085fa8 sp=0xc000085f78 pc=0x6399639af349 runtime.bgscavenge(0xc00003e080) runtime/mgcscavenge.go:658 +0x59 fp=0xc000085fc8 sp=0xc000085fa8 pc=0x6399639af8d9 runtime.gcenable.gowrap2() runtime/mgc.go:205 +0x25 fp=0xc000085fe0 sp=0xc000085fc8 pc=0x6399639a5c85 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000085fe8 sp=0xc000085fe0 pc=0x639963a020c1 created by runtime.gcenable in goroutine 1 runtime/mgc.go:205 +0xa5 goroutine 5 gp=0xc000003dc0 m=nil [finalizer wait]: runtime.gopark(0x1b8?, 0xc000002380?, 0x1?, 0x23?, 0xc000084688?) runtime/proc.go:435 +0xce fp=0xc000084630 sp=0xc000084610 pc=0x6399639fa98e runtime.runfinq() runtime/mfinal.go:196 +0x107 fp=0xc0000847e0 sp=0xc000084630 pc=0x6399639a4ca7 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc0000847e8 sp=0xc0000847e0 pc=0x639963a020c1 created by runtime.createfing in goroutine 1 runtime/mfinal.go:166 +0x3d goroutine 6 gp=0xc0001e08c0 m=nil [chan receive]: runtime.gopark(0xc0000ff900?, 0xc000588018?, 0x60?, 0x67?, 0x639963adf5a8?) runtime/proc.go:435 +0xce fp=0xc000086718 sp=0xc0000866f8 pc=0x6399639fa98e runtime.chanrecv(0xc0000b6380, 0x0, 0x1) runtime/chan.go:664 +0x445 fp=0xc000086790 sp=0xc000086718 pc=0x639963996ea5 runtime.chanrecv1(0x0?, 0x0?) runtime/chan.go:506 +0x12 fp=0xc0000867b8 sp=0xc000086790 pc=0x639963996a32 runtime.unique_runtime_registerUniqueMapCleanup.func2(...) runtime/mgc.go:1796 runtime.unique_runtime_registerUniqueMapCleanup.gowrap1() runtime/mgc.go:1799 +0x2f fp=0xc0000867e0 sp=0xc0000867b8 pc=0x6399639a8e8f runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc0000867e8 sp=0xc0000867e0 pc=0x639963a020c1 created by unique.runtime_registerUniqueMapCleanup in goroutine 1 runtime/mgc.go:1794 +0x85 goroutine 7 gp=0xc0001e1180 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) 
runtime/proc.go:435 +0xce fp=0xc000086f38 sp=0xc000086f18 pc=0x6399639fa98e runtime.gcBgMarkWorker(0xc0000b7b20) runtime/mgc.go:1423 +0xe9 fp=0xc000086fc8 sp=0xc000086f38 pc=0x6399639a81a9 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc000086fe0 sp=0xc000086fc8 pc=0x6399639a8085 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000086fe8 sp=0xc000086fe0 pc=0x639963a020c1 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 8 gp=0xc0001e1340 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000087738 sp=0xc000087718 pc=0x6399639fa98e runtime.gcBgMarkWorker(0xc0000b7b20) runtime/mgc.go:1423 +0xe9 fp=0xc0000877c8 sp=0xc000087738 pc=0x6399639a81a9 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc0000877e0 sp=0xc0000877c8 pc=0x6399639a8085 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc0000877e8 sp=0xc0000877e0 pc=0x639963a020c1 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 18 gp=0xc000504000 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000080738 sp=0xc000080718 pc=0x6399639fa98e runtime.gcBgMarkWorker(0xc0000b7b20) runtime/mgc.go:1423 +0xe9 fp=0xc0000807c8 sp=0xc000080738 pc=0x6399639a81a9 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc0000807e0 sp=0xc0000807c8 pc=0x6399639a8085 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc0000807e8 sp=0xc0000807e0 pc=0x639963a020c1 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 34 gp=0xc000102380 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) 
runtime/proc.go:435 +0xce fp=0xc00011a738 sp=0xc00011a718 pc=0x6399639fa98e runtime.gcBgMarkWorker(0xc0000b7b20) runtime/mgc.go:1423 +0xe9 fp=0xc00011a7c8 sp=0xc00011a738 pc=0x6399639a81a9 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc00011a7e0 sp=0xc00011a7c8 pc=0x6399639a8085 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc00011a7e8 sp=0xc00011a7e0 pc=0x639963a020c1 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 9 gp=0xc0001e1500 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000087f38 sp=0xc000087f18 pc=0x6399639fa98e runtime.gcBgMarkWorker(0xc0000b7b20) runtime/mgc.go:1423 +0xe9 fp=0xc000087fc8 sp=0xc000087f38 pc=0x6399639a81a9 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc000087fe0 sp=0xc000087fc8 pc=0x6399639a8085 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000087fe8 sp=0xc000087fe0 pc=0x639963a020c1 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 10 gp=0xc0001e16c0 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000116738 sp=0xc000116718 pc=0x6399639fa98e runtime.gcBgMarkWorker(0xc0000b7b20) runtime/mgc.go:1423 +0xe9 fp=0xc0001167c8 sp=0xc000116738 pc=0x6399639a81a9 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc0001167e0 sp=0xc0001167c8 pc=0x6399639a8085 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc0001167e8 sp=0xc0001167e0 pc=0x639963a020c1 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 11 gp=0xc0001e1880 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) 
runtime/proc.go:435 +0xce fp=0xc000116f38 sp=0xc000116f18 pc=0x6399639fa98e runtime.gcBgMarkWorker(0xc0000b7b20) runtime/mgc.go:1423 +0xe9 fp=0xc000116fc8 sp=0xc000116f38 pc=0x6399639a81a9 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc000116fe0 sp=0xc000116fc8 pc=0x6399639a8085 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000116fe8 sp=0xc000116fe0 pc=0x639963a020c1 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 19 gp=0xc0005041c0 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000080f38 sp=0xc000080f18 pc=0x6399639fa98e runtime.gcBgMarkWorker(0xc0000b7b20) runtime/mgc.go:1423 +0xe9 fp=0xc000080fc8 sp=0xc000080f38 pc=0x6399639a81a9 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc000080fe0 sp=0xc000080fc8 pc=0x6399639a8085 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000080fe8 sp=0xc000080fe0 pc=0x639963a020c1 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 20 gp=0xc000504380 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000081738 sp=0xc000081718 pc=0x6399639fa98e runtime.gcBgMarkWorker(0xc0000b7b20) runtime/mgc.go:1423 +0xe9 fp=0xc0000817c8 sp=0xc000081738 pc=0x6399639a81a9 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc0000817e0 sp=0xc0000817c8 pc=0x6399639a8085 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc0000817e8 sp=0xc0000817e0 pc=0x639963a020c1 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 35 gp=0xc000102540 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) 
runtime/proc.go:435 +0xce fp=0xc00011af38 sp=0xc00011af18 pc=0x6399639fa98e runtime.gcBgMarkWorker(0xc0000b7b20) runtime/mgc.go:1423 +0xe9 fp=0xc00011afc8 sp=0xc00011af38 pc=0x6399639a81a9 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc00011afe0 sp=0xc00011afc8 pc=0x6399639a8085 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc00011afe8 sp=0xc00011afe0 pc=0x639963a020c1 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 12 gp=0xc0001e1a40 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000117738 sp=0xc000117718 pc=0x6399639fa98e runtime.gcBgMarkWorker(0xc0000b7b20) runtime/mgc.go:1423 +0xe9 fp=0xc0001177c8 sp=0xc000117738 pc=0x6399639a81a9 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc0001177e0 sp=0xc0001177c8 pc=0x6399639a8085 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc0001177e8 sp=0xc0001177e0 pc=0x639963a020c1 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 21 gp=0xc000504540 m=nil [GC worker (idle)]: runtime.gopark(0x21a0c049ba1f?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000081f38 sp=0xc000081f18 pc=0x6399639fa98e runtime.gcBgMarkWorker(0xc0000b7b20) runtime/mgc.go:1423 +0xe9 fp=0xc000081fc8 sp=0xc000081f38 pc=0x6399639a81a9 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc000081fe0 sp=0xc000081fc8 pc=0x6399639a8085 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000081fe8 sp=0xc000081fe0 pc=0x639963a020c1 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 13 gp=0xc0001e1c00 m=nil [GC worker (idle)]: runtime.gopark(0x21a0c049ac80?, 0x0?, 0x0?, 0x0?, 0x0?) 
runtime/proc.go:435 +0xce fp=0xc000117f38 sp=0xc000117f18 pc=0x6399639fa98e runtime.gcBgMarkWorker(0xc0000b7b20) runtime/mgc.go:1423 +0xe9 fp=0xc000117fc8 sp=0xc000117f38 pc=0x6399639a81a9 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc000117fe0 sp=0xc000117fc8 pc=0x6399639a8085 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000117fe8 sp=0xc000117fe0 pc=0x639963a020c1 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 22 gp=0xc000504700 m=nil [GC worker (idle)]: runtime.gopark(0x21a0c049b309?, 0x3?, 0xe9?, 0x5?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000082738 sp=0xc000082718 pc=0x6399639fa98e runtime.gcBgMarkWorker(0xc0000b7b20) runtime/mgc.go:1423 +0xe9 fp=0xc0000827c8 sp=0xc000082738 pc=0x6399639a81a9 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc0000827e0 sp=0xc0000827c8 pc=0x6399639a8085 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc0000827e8 sp=0xc0000827e0 pc=0x639963a020c1 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 14 gp=0xc0001e1dc0 m=nil [GC worker (idle)]: runtime.gopark(0x21a0c049b3b3?, 0x3?, 0x29?, 0x9?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000118738 sp=0xc000118718 pc=0x6399639fa98e runtime.gcBgMarkWorker(0xc0000b7b20) runtime/mgc.go:1423 +0xe9 fp=0xc0001187c8 sp=0xc000118738 pc=0x6399639a81a9 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc0001187e0 sp=0xc0001187c8 pc=0x6399639a8085 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc0001187e8 sp=0xc0001187e0 pc=0x639963a020c1 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 36 gp=0xc000102700 m=nil [GC worker (idle)]: runtime.gopark(0x21a0c049bbce?, 0x0?, 0x0?, 0x0?, 0x0?) 
runtime/proc.go:435 +0xce fp=0xc00011b738 sp=0xc00011b718 pc=0x6399639fa98e runtime.gcBgMarkWorker(0xc0000b7b20) runtime/mgc.go:1423 +0xe9 fp=0xc00011b7c8 sp=0xc00011b738 pc=0x6399639a81a9 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc00011b7e0 sp=0xc00011b7c8 pc=0x6399639a8085 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc00011b7e8 sp=0xc00011b7e0 pc=0x639963a020c1 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105 goroutine 50 gp=0xc000504c40 m=nil [select]: runtime.gopark(0xc000047a58?, 0x2?, 0x40?, 0x68?, 0xc000047834?) runtime/proc.go:435 +0xce fp=0xc000047648 sp=0xc000047628 pc=0x6399639fa98e runtime.selectgo(0xc000047a58, 0xc000047830, 0x4?, 0x0, 0x1?, 0x1) runtime/select.go:351 +0x837 fp=0xc000047780 sp=0xc000047648 pc=0x6399639d9297 github.com/ollama/ollama/runner/llamarunner.(*Server).completion(0xc0004ba000, {0x639964ca5dc8, 0xc000514d20}, 0xc000154280) github.com/ollama/ollama/runner/llamarunner/runner.go:688 +0xa25 fp=0xc000047ac0 sp=0xc000047780 pc=0x639963d9aac5 github.com/ollama/ollama/runner/llamarunner.(*Server).completion-fm({0x639964ca5dc8?, 0xc000514d20?}, 0xc0004e3b40?) :1 +0x36 fp=0xc000047af0 sp=0xc000047ac0 pc=0x639963d9dad6 net/http.HandlerFunc.ServeHTTP(0xc0000ea240?, {0x639964ca5dc8?, 0xc000514d20?}, 0xc0004e3b60?) net/http/server.go:2294 +0x29 fp=0xc000047b18 sp=0xc000047af0 pc=0x639963cf7e89 net/http.(*ServeMux).ServeHTTP(0x63996399f1c5?, {0x639964ca5dc8, 0xc000514d20}, 0xc000154280) net/http/server.go:2822 +0x1c4 fp=0xc000047b68 sp=0xc000047b18 pc=0x639963cf9d84 net/http.serverHandler.ServeHTTP({0x639964ca2370?}, {0x639964ca5dc8?, 0xc000514d20?}, 0x1?) 
net/http/server.go:3301 +0x8e fp=0xc000047b98 sp=0xc000047b68 pc=0x639963d1780e net/http.(*conn).serve(0xc0004ba120, {0x639964ca7e28, 0xc00071c3f0}) net/http/server.go:2102 +0x625 fp=0xc000047fb8 sp=0xc000047b98 pc=0x639963cf6385 net/http.(*Server).Serve.gowrap3() net/http/server.go:3454 +0x28 fp=0xc000047fe0 sp=0xc000047fb8 pc=0x639963cfbc48 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000047fe8 sp=0xc000047fe0 pc=0x639963a020c1 created by net/http.(*Server).Serve in goroutine 1 net/http/server.go:3454 +0x485 goroutine 39 gp=0xc000102a80 m=nil [IO wait]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0xb?) runtime/proc.go:435 +0xce fp=0xc0002445d8 sp=0xc0002445b8 pc=0x6399639fa98e runtime.netpollblock(0x639963a1de18?, 0x639942c6?, 0x99?) runtime/netpoll.go:575 +0xf7 fp=0xc000244610 sp=0xc0002445d8 pc=0x6399639bf797 internal/poll.runtime_pollWait(0x70d9a23c7d98, 0x72) runtime/netpoll.go:351 +0x85 fp=0xc000244630 sp=0xc000244610 pc=0x6399639f9ba5 internal/poll.(*pollDesc).wait(0xc00004eb80?, 0xc00071c521?, 0x0) internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc000244658 sp=0xc000244630 pc=0x639963a81027 internal/poll.(*pollDesc).waitRead(...) 
internal/poll/fd_poll_runtime.go:89 internal/poll.(*FD).Read(0xc00004eb80, {0xc00071c521, 0x1, 0x1}) internal/poll/fd_unix.go:165 +0x27a fp=0xc0002446f0 sp=0xc000244658 pc=0x639963a8231a net.(*netFD).Read(0xc00004eb80, {0xc00071c521?, 0xc00071a158?, 0xc000244770?}) net/fd_posix.go:55 +0x25 fp=0xc000244738 sp=0xc0002446f0 pc=0x639963af68c5 net.(*conn).Read(0xc00051e078, {0xc00071c521?, 0x0?, 0x0?}) net/net.go:194 +0x45 fp=0xc000244780 sp=0xc000244738 pc=0x639963b04c85 net/http.(*connReader).backgroundRead(0xc00071c510) net/http/server.go:690 +0x37 fp=0xc0002447c8 sp=0xc000244780 pc=0x639963cf0257 net/http.(*connReader).startBackgroundRead.gowrap2() net/http/server.go:686 +0x25 fp=0xc0002447e0 sp=0xc0002447c8 pc=0x639963cf0185 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc0002447e8 sp=0xc0002447e0 pc=0x639963a020c1 created by net/http.(*connReader).startBackgroundRead in goroutine 50 net/http/server.go:686 +0xb6 rax 0x0 rbx 0x70d9e8eb9fc0 rcx 0x70d9e8efe00b rdx 0x0 rdi 0x2 rsi 0x7ffec1e72040 rbp 0x70d94eeb6303 rsp 0x7ffec1e72040 r8 0x0 r9 0x7ffec1e72040 r10 0x8 r11 0x246 r12 0x70d94eebee13 r13 0x1c1 r14 0x63996e7b6fd0 r15 0x63996e7b6fc0 rip 0x70d9e8efe00b rflags 0x246 cs 0x33 fs 0x0 gs 0x0 [GIN] 2025/03/01 - 07:10:04 | 200 | 117.845349ms | 127.0.0.1 | POST "/api/chat" ^C% ➜ ~

@githubdebugger commented on GitHub (Mar 2, 2025):

These issues may be related, so I'm adding my logs here: I'm facing a problem with 0.5.13-rc2-rocm running in a Docker container. The model loads onto the GPU, but as soon as I send a message, it exits:

This is from another terminal, where I am running the model:

➜ ~ docker exec -it ollama ollama run phi4-mini
hi
Error: POST predict: Post "http://127.0.0.1:44845/completion": EOF
➜ ~

As you can see, it exits from the prompt and produces the log dump above. Because phi4-mini requires 0.5.13, I could never run it on 0.5.12; however, with the same commands and steps I was able to run the deepseek-r1:latest distilled models on 0.5.12.

  • OS: Docker container running on Linux Mint with ROCm support
  • CPU/iGPU: 7735HS, hence passing HSA_OVERRIDE_GFX_VERSION=10.3.0 and the device driver location in the docker run command

PS: If the OP's issue and mine are different (in my case it's Docker, and the model does load onto the GPU, as the ollama ps command shows; I added it here because both are ROCm issues), I will create a new issue. Please let me know.

FYI: After upgrading to the latest 0.5.13-rc4-rocm, I no longer see this; the model loads fine on the iGPU and everything works.


@ProjectMoon commented on GitHub (Mar 3, 2025):

OK, latest updates from 0.5.13 RC5:

  • Models seem to run fine on the GPU again. They are loaded on the GPU, and responses are instant.
  • Llama3.2 vision runs on the GPU, and processes images on the GPU.
  • Granite3.2 vision loads on the GPU and runs on the GPU when given only text. But as soon as an image is given (via the ollama CLI chat), ollama seems to use only the CPU? The logs are inconclusive.
  • Ollama isn't locking up the entire machine, at least!

Edit: With Granite, it also just seems to get stuck on "Image added!" with the little spinning progress indicator. I don't know if I should wait longer, but I let it spin for about 10 minutes before killing the ollama runner.

Here are some debug logs. When the image is added, it just gets stuck at this point:

time=2025-03-03T10:14:12.392+01:00 level=DEBUG source=sched.go:576 msg="evaluating already loaded" model=/ollama/blobs/sha256-532c9368ec97a9b9299f38e5e871d245a5075a3eefddb364c26054b920cc7f83
time=2025-03-03T10:14:12.395+01:00 level=DEBUG source=routes.go:1505 msg="chat request" images=1 prompt="<|system|>\nA chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.\n<|user|>\nblammo\n<|assistant|>\n\nHello! How can I assist you today? Let's have a friendly conversation. 😊\n<|end_of_text|>\n\n<|user|>\n[img-0]\n<|assistant|>"
time=2025-03-03T10:14:13.957+01:00 level=DEBUG source=image.go:179 msg="storing image embeddings in cache" entry=0 used=0001-01-01T00:00:00.000Z
time=2025-03-03T10:14:13.958+01:00 level=DEBUG source=cache.go:104 msg="loading cache slot" id=0 cache=69 prompt=7374 used=51 remaining=7323
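The last debug line is where the runner stalls. Assuming the fields mean what their names suggest (cache = tokens already held in slot 0, used = tokens reused from that cache, remaining = tokens still to evaluate), the counts are consistent: 7374 - 51 = 7323, so after the image was added roughly 7.3k prompt tokens were still waiting to be processed. The hypothetical Go helpers below (parseFields, cacheSlotStats) pull those numbers out of such a line and check the arithmetic; the parser is a sketch that handles quoted values containing spaces but not escaped quotes.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseFields splits a key=value log line into a map.
// Sketch only: quoted values may contain spaces, escaped quotes are not handled.
func parseFields(line string) map[string]string {
	fields := make(map[string]string)
	for {
		line = strings.TrimLeft(line, " ")
		eq := strings.IndexByte(line, '=')
		if eq < 0 {
			return fields
		}
		key, rest := line[:eq], line[eq+1:]
		var val string
		if strings.HasPrefix(rest, `"`) {
			// Quoted value: runs to the next double quote.
			end := strings.IndexByte(rest[1:], '"')
			if end < 0 {
				val, rest = rest[1:], ""
			} else {
				val, rest = rest[1:1+end], rest[2+end:]
			}
		} else {
			// Bare value: runs to the next space.
			sp := strings.IndexByte(rest, ' ')
			if sp < 0 {
				val, rest = rest, ""
			} else {
				val, rest = rest[:sp], rest[sp:]
			}
		}
		fields[key] = val
		line = rest
	}
}

// cacheSlotStats extracts the token counts from a "loading cache slot" line.
// Field meanings (prompt, used, remaining) are assumed from their names.
func cacheSlotStats(line string) (prompt, used, remaining int, err error) {
	f := parseFields(line)
	if f["msg"] != "loading cache slot" {
		return 0, 0, 0, fmt.Errorf("not a cache-slot line (msg=%q)", f["msg"])
	}
	var n [3]int
	for i, k := range []string{"prompt", "used", "remaining"} {
		if n[i], err = strconv.Atoi(f[k]); err != nil {
			return 0, 0, 0, fmt.Errorf("field %s: %w", k, err)
		}
	}
	return n[0], n[1], n[2], nil
}

func main() {
	line := `time=2025-03-03T10:14:13.958+01:00 level=DEBUG source=cache.go:104 msg="loading cache slot" id=0 cache=69 prompt=7374 used=51 remaining=7323`
	prompt, used, remaining, err := cacheSlotStats(line)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("prompt=%d reused=%d remaining=%d consistent=%v\n",
		prompt, used, remaining, prompt-used == remaining)
	// prints: prompt=7374 reused=51 remaining=7323 consistent=true
}
```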

@ProjectMoon commented on GitHub (Mar 4, 2025):

This still seems to be an issue with the final version of 0.5.13.

Edit: I'm also not sure it has anything to do with ROCm, as the same thing seems to happen on my machine with a small NVIDIA GPU. It just gets locked up; I would expect at least SOME response after a few minutes, even in CPU mode.


@ProjectMoon commented on GitHub (Mar 5, 2025):

Closing this in favor of #9514 because I'm pretty sure that's the problem (I have flash attention enabled and the KV cache type set to q8_0).

Reference: github-starred/ollama#68196