AMD RX 6900 XT - ggml_cuda_init: failed to initialize ROCm: no ROCm-capable device is detected #6855

Open
opened 2025-11-12 13:47:09 -06:00 by GiteaMirror · 4 comments

Originally created by @moophlo on GitHub (Apr 26, 2025).

What is the issue?

I'm running the ollama:rocm container in a Kubernetes cluster.
The worker node of the cluster has 2 GPUs:

GPU0: RX7900XT
GPU1: RX6900XT

In the deployment I'm setting all the relevant variables as per the documentation (see the sketch after the list):

HIP_VISIBLE_DEVICES:1
HSA_OVERRIDE_GFX_VERSION:10.3.0
HCC_AMDGPU_TARGET:gfx1030
ROCR_VISIBLE_DEVICES:1
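
For clarity, this is roughly how those variables are wired into the pod template (a minimal sketch; the container name and surrounding structure are illustrative, not my exact manifest):

```yaml
# Illustrative fragment of the Deployment's pod template (not the exact manifest)
containers:
  - name: ollama
    image: ollama/ollama:0.6.6-rocm
    env:
      - name: HIP_VISIBLE_DEVICES
        value: "1"
      - name: HSA_OVERRIDE_GFX_VERSION
        value: "10.3.0"
      - name: HCC_AMDGPU_TARGET
        value: "gfx1030"
      - name: ROCR_VISIBLE_DEVICES
        value: "1"
```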

During startup I can see the GPU is properly detected and the other one is filtered out as expected, but then at a certain point:

time=2025-04-26T15:49:31.376Z level=DEBUG source=memory.go:108 msg=evaluating library=rocm gpu_count=1 available="[16.0 GiB]"
time=2025-04-26T15:49:31.376Z level=WARN source=ggml.go:152 msg="key not found" key=bert.vision.block_count default=0
time=2025-04-26T15:49:31.376Z level=WARN source=ggml.go:152 msg="key not found" key=bert.attention.head_count_kv default=1
time=2025-04-26T15:49:31.376Z level=WARN source=ggml.go:152 msg="key not found" key=bert.attention.key_length default=64
time=2025-04-26T15:49:31.376Z level=WARN source=ggml.go:152 msg="key not found" key=bert.attention.value_length default=64
time=2025-04-26T15:49:31.376Z level=WARN source=ggml.go:152 msg="key not found" key=bert.attention.head_count_kv default=1
time=2025-04-26T15:49:31.376Z level=INFO source=sched.go:722 msg="new model will fit in available VRAM in single GPU, loading" model=/home/ollama/models/.ollama/models/blobs/sha256-819c2adf5ce6df2b6bd2ae4ca90d2a69f060afeb438d0c171db57daa02e39c3d gpu=GPU-7685bd29c5086f3d parallel=1 available=17145991168 required="1.1 GiB"
time=2025-04-26T15:49:31.376Z level=DEBUG source=gpu.go:391 msg="updating system memory data" before.total="62.5 GiB" before.free="31.5 GiB" before.free_swap="0 B" now.total="62.5 GiB" now.free="31.5 GiB" now.free_swap="0 B"
time=2025-04-26T15:49:31.376Z level=DEBUG source=amd_linux.go:488 msg="updating rocm free memory" gpu=GPU-7685bd29c5086f3d name=1002:73bf before="16.0 GiB" now="16.0 GiB"
time=2025-04-26T15:49:31.376Z level=INFO source=server.go:105 msg="system memory" total="62.5 GiB" free="31.5 GiB" free_swap="0 B"
time=2025-04-26T15:49:31.376Z level=DEBUG source=memory.go:108 msg=evaluating library=rocm gpu_count=1 available="[16.0 GiB]"
time=2025-04-26T15:49:31.376Z level=WARN source=ggml.go:152 msg="key not found" key=bert.vision.block_count default=0
time=2025-04-26T15:49:31.376Z level=WARN source=ggml.go:152 msg="key not found" key=bert.attention.head_count_kv default=1
time=2025-04-26T15:49:31.376Z level=WARN source=ggml.go:152 msg="key not found" key=bert.attention.key_length default=64
time=2025-04-26T15:49:31.376Z level=WARN source=ggml.go:152 msg="key not found" key=bert.attention.value_length default=64
time=2025-04-26T15:49:31.376Z level=WARN source=ggml.go:152 msg="key not found" key=bert.attention.head_count_kv default=1
time=2025-04-26T15:49:31.376Z level=INFO source=server.go:138 msg=offload library=rocm layers.requested=-1 layers.model=25 layers.offload=25 layers.split="" memory.available="[16.0 GiB]" memory.gpu_overhead="0 B" memory.required.full="1.1 GiB" memory.required.partial="1.1 GiB" memory.required.kv="3.0 MiB" memory.required.allocations="[1.1 GiB]" memory.weights.total="636.8 MiB" memory.weights.repeating="577.2 MiB" memory.weights.nonrepeating="59.6 MiB" memory.graph.full="8.0 MiB" memory.graph.partial="8.0 MiB"
time=2025-04-26T15:49:31.376Z level=DEBUG source=server.go:262 msg="compatible gpu libraries" compatible=[rocm]
llama_model_loader: loaded meta data with 23 key-value pairs and 389 tensors from /home/ollama/models/.ollama/models/blobs/sha256-819c2adf5ce6df2b6bd2ae4ca90d2a69f060afeb438d0c171db57daa02e39c3d (version GGUF V3 (latest))

At the moment I'm running in debug mode to get more verbose output.

One more thing to add: if I instead switch to GPU0 and filter out the RX6900XT, everything works properly; the model loads on the GPU and does not fall back to the CPU backend.
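
For reference, the working GPU0 variant is just the device selection flipped (a hypothetical sketch of what "switch to GPU0" means above; whether the gfx override is kept for the 7900 XT is my assumption, since gfx1100 is supported natively and shouldn't need one):

```yaml
# Hypothetical env for the GPU0 (RX 7900 XT) case - assumed values, not verified
env:
  - name: HIP_VISIBLE_DEVICES
    value: "0"
  - name: ROCR_VISIBLE_DEVICES
    value: "0"
  # no HSA_OVERRIDE_GFX_VERSION here: gfx1100 (RDNA3) is supported out of the box
```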

Any help would be really appreciated.

Relevant log output

2025/04/26 15:48:54 routes.go:1232: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES:1 HSA_OVERRIDE_GFX_VERSION:10.3.0 HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:2048 OLLAMA_DEBUG:true OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/home/ollama/models/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[* http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES:1 http_proxy: https_proxy: no_proxy:]"
time=2025-04-26T15:48:54.977Z level=INFO source=images.go:458 msg="total blobs: 52"
time=2025-04-26T15:48:54.978Z level=INFO source=images.go:465 msg="total unused blobs removed: 0"
time=2025-04-26T15:48:54.978Z level=INFO source=routes.go:1299 msg="Listening on [::]:11434 (version 0.6.6)"
time=2025-04-26T15:48:54.978Z level=DEBUG source=sched.go:107 msg="starting llm scheduler"
time=2025-04-26T15:48:54.978Z level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
time=2025-04-26T15:48:54.979Z level=DEBUG source=gpu.go:98 msg="searching for GPU discovery libraries for NVIDIA"
time=2025-04-26T15:48:54.979Z level=DEBUG source=gpu.go:501 msg="Searching for GPU library" name=libcuda.so*
time=2025-04-26T15:48:54.979Z level=DEBUG source=gpu.go:525 msg="gpu library search" globs="[/usr/lib/ollama/libcuda.so* /usr/local/nvidia/lib/libcuda.so* /usr/local/nvidia/lib64/libcuda.so* /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]"
time=2025-04-26T15:48:55.048Z level=DEBUG source=gpu.go:558 msg="discovered GPU libraries" paths=[]
time=2025-04-26T15:48:55.048Z level=DEBUG source=gpu.go:501 msg="Searching for GPU library" name=libcudart.so*
time=2025-04-26T15:48:55.048Z level=DEBUG source=gpu.go:525 msg="gpu library search" globs="[/usr/lib/ollama/libcudart.so* /usr/local/nvidia/lib/libcudart.so* /usr/local/nvidia/lib64/libcudart.so* /usr/lib/ollama/cuda_v*/libcudart.so* /usr/local/cuda/lib64/libcudart.so* /usr/lib/x86_64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/x86_64-linux-gnu/libcudart.so* /usr/lib/wsl/lib/libcudart.so* /usr/lib/wsl/drivers/*/libcudart.so* /opt/cuda/lib64/libcudart.so* /usr/local/cuda*/targets/aarch64-linux/lib/libcudart.so* /usr/lib/aarch64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/aarch64-linux-gnu/libcudart.so* /usr/local/cuda/lib*/libcudart.so* /usr/lib*/libcudart.so* /usr/local/lib*/libcudart.so*]"
time=2025-04-26T15:48:55.049Z level=DEBUG source=gpu.go:558 msg="discovered GPU libraries" paths=[]
time=2025-04-26T15:48:55.049Z level=DEBUG source=amd_linux.go:101 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/0/properties"
time=2025-04-26T15:48:55.049Z level=DEBUG source=amd_linux.go:121 msg="detected CPU /sys/class/kfd/kfd/topology/nodes/0/properties"
time=2025-04-26T15:48:55.049Z level=DEBUG source=amd_linux.go:101 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/1/properties"
time=2025-04-26T15:48:55.049Z level=DEBUG source=amd_linux.go:206 msg="mapping amdgpu to drm sysfs nodes" amdgpu=/sys/class/kfd/kfd/topology/nodes/1/properties vendor=4098 device=29772 unique_id=7433242190493591676
time=2025-04-26T15:48:55.049Z level=DEBUG source=amd_linux.go:240 msg=matched amdgpu=/sys/class/kfd/kfd/topology/nodes/1/properties drm=/sys/class/drm/card1/device
time=2025-04-26T15:48:55.050Z level=DEBUG source=amd_linux.go:318 msg="amdgpu memory" gpu=0 total="20.0 GiB"
time=2025-04-26T15:48:55.050Z level=DEBUG source=amd_linux.go:319 msg="amdgpu memory" gpu=0 available="20.0 GiB"
time=2025-04-26T15:48:55.050Z level=INFO source=amd_linux.go:332 msg="filtering out device per user request" id=GPU-67282e63a5d1287c visible_devices=[1]
time=2025-04-26T15:48:55.050Z level=DEBUG source=amd_linux.go:101 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/2/properties"
time=2025-04-26T15:48:55.050Z level=DEBUG source=amd_linux.go:206 msg="mapping amdgpu to drm sysfs nodes" amdgpu=/sys/class/kfd/kfd/topology/nodes/2/properties vendor=4098 device=29631 unique_id=8540440255474986813
time=2025-04-26T15:48:55.050Z level=DEBUG source=amd_linux.go:219 msg="failed to read sysfs node" file=/sys/class/drm/card1-DP-1/device/vendor error="open /sys/class/drm/card1-DP-1/device/vendor: no such file or directory"
time=2025-04-26T15:48:55.050Z level=DEBUG source=amd_linux.go:219 msg="failed to read sysfs node" file=/sys/class/drm/card1-DP-2/device/vendor error="open /sys/class/drm/card1-DP-2/device/vendor: no such file or directory"
time=2025-04-26T15:48:55.050Z level=DEBUG source=amd_linux.go:219 msg="failed to read sysfs node" file=/sys/class/drm/card1-DP-3/device/vendor error="open /sys/class/drm/card1-DP-3/device/vendor: no such file or directory"
time=2025-04-26T15:48:55.050Z level=DEBUG source=amd_linux.go:219 msg="failed to read sysfs node" file=/sys/class/drm/card1-HDMI-A-1/device/vendor error="open /sys/class/drm/card1-HDMI-A-1/device/vendor: no such file or directory"
time=2025-04-26T15:48:55.050Z level=DEBUG source=amd_linux.go:219 msg="failed to read sysfs node" file=/sys/class/drm/card1-Writeback-1/device/vendor error="open /sys/class/drm/card1-Writeback-1/device/vendor: no such file or directory"
time=2025-04-26T15:48:55.050Z level=DEBUG source=amd_linux.go:240 msg=matched amdgpu=/sys/class/kfd/kfd/topology/nodes/2/properties drm=/sys/class/drm/card2/device
time=2025-04-26T15:48:55.050Z level=DEBUG source=amd_linux.go:318 msg="amdgpu memory" gpu=1 total="16.0 GiB"
time=2025-04-26T15:48:55.050Z level=DEBUG source=amd_linux.go:319 msg="amdgpu memory" gpu=1 available="16.0 GiB"
time=2025-04-26T15:48:55.050Z level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /usr/lib/ollama/rocm"
time=2025-04-26T15:48:55.050Z level=DEBUG source=amd_common.go:44 msg="detected ROCM next to ollama executable /usr/lib/ollama/rocm"
time=2025-04-26T15:48:55.050Z level=INFO source=amd_linux.go:389 msg="skipping rocm gfx compatibility check" HSA_OVERRIDE_GFX_VERSION=10.3.0
time=2025-04-26T15:48:55.054Z level=INFO source=types.go:130 msg="inference compute" id=GPU-7685bd29c5086f3d library=rocm variant="" compute=gfx1030 driver=6.12 name=1002:73bf total="16.0 GiB" available="16.0 GiB"
[GIN] 2025/04/26 - 15:48:55 | 200 |      45.608µs |    192.168.10.1 | GET      "/"
[GIN] 2025/04/26 - 15:48:56 | 200 |      27.346µs |    192.168.10.1 | GET      "/"
[GIN] 2025/04/26 - 15:48:57 | 200 |      18.863µs |    192.168.10.1 | GET      "/"
[GIN] 2025/04/26 - 15:48:58 | 200 |      20.339µs |    192.168.10.1 | GET      "/"
[GIN] 2025/04/26 - 15:48:59 | 200 |      27.412µs |    192.168.10.1 | GET      "/"
[GIN] 2025/04/26 - 15:49:00 | 200 |      24.791µs |    192.168.10.1 | GET      "/"
[GIN] 2025/04/26 - 15:49:01 | 200 |      15.701µs |    192.168.10.1 | GET      "/"
[GIN] 2025/04/26 - 15:49:02 | 200 |      14.232µs |    192.168.10.1 | GET      "/"
[GIN] 2025/04/26 - 15:49:03 | 200 |      17.825µs |    192.168.10.1 | GET      "/"
[GIN] 2025/04/26 - 15:49:04 | 200 |      17.923µs |    192.168.10.1 | GET      "/"
[GIN] 2025/04/26 - 15:49:05 | 200 |      24.265µs |    192.168.10.1 | GET      "/"
[GIN] 2025/04/26 - 15:49:06 | 200 |      25.764µs |    192.168.10.1 | GET      "/"
[GIN] 2025/04/26 - 15:49:07 | 200 |      14.055µs |    192.168.10.1 | GET      "/"
[GIN] 2025/04/26 - 15:49:08 | 200 |      12.819µs |    192.168.10.1 | GET      "/"
[GIN] 2025/04/26 - 15:49:09 | 200 |      13.869µs |    192.168.10.1 | GET      "/"
[GIN] 2025/04/26 - 15:49:10 | 200 |      13.224µs |    192.168.10.1 | GET      "/"
[GIN] 2025/04/26 - 15:49:11 | 200 |      13.465µs |    192.168.10.1 | GET      "/"
[GIN] 2025/04/26 - 15:49:12 | 200 |      33.276µs |    192.168.10.1 | GET      "/"
[GIN] 2025/04/26 - 15:49:13 | 200 |      14.354µs |    192.168.10.1 | GET      "/"
[GIN] 2025/04/26 - 15:49:14 | 200 |      14.805µs |    192.168.10.1 | GET      "/"
[GIN] 2025/04/26 - 15:49:15 | 200 |      13.669µs |    192.168.10.1 | GET      "/"
[GIN] 2025/04/26 - 15:49:16 | 200 |      14.005µs |    192.168.10.1 | GET      "/"
[GIN] 2025/04/26 - 15:49:17 | 200 |      17.338µs |    192.168.10.1 | GET      "/"
[GIN] 2025/04/26 - 15:49:18 | 200 |      18.535µs |    192.168.10.1 | GET      "/"
[GIN] 2025/04/26 - 15:49:19 | 200 |      23.488µs |    192.168.10.1 | GET      "/"
[GIN] 2025/04/26 - 15:49:20 | 200 |      17.602µs |    192.168.10.1 | GET      "/"
[GIN] 2025/04/26 - 15:49:21 | 200 |      13.956µs |    192.168.10.1 | GET      "/"
[GIN] 2025/04/26 - 15:49:22 | 200 |      13.766µs |    192.168.10.1 | GET      "/"
[GIN] 2025/04/26 - 15:49:23 | 200 |      21.209µs |    192.168.10.1 | GET      "/"
[GIN] 2025/04/26 - 15:49:24 | 200 |       17.51µs |    192.168.10.1 | GET      "/"
[GIN] 2025/04/26 - 15:49:25 | 200 |      18.154µs |    192.168.10.1 | GET      "/"
[GIN] 2025/04/26 - 15:49:26 | 200 |     179.984µs |    192.168.10.1 | GET      "/"
[GIN] 2025/04/26 - 15:49:27 | 200 |       22.29µs |    192.168.10.1 | GET      "/"
[GIN] 2025/04/26 - 15:49:28 | 200 |      14.871µs |    192.168.10.1 | GET      "/"
[GIN] 2025/04/26 - 15:49:29 | 200 |       22.09µs |    192.168.10.1 | GET      "/"
[GIN] 2025/04/26 - 15:49:31 | 200 |      14.426µs |    192.168.10.1 | GET      "/"
time=2025-04-26T15:49:31.370Z level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-04-26T15:49:31.371Z level=DEBUG source=gpu.go:391 msg="updating system memory data" before.total="62.5 GiB" before.free="31.7 GiB" before.free_swap="0 B" now.total="62.5 GiB" now.free="31.5 GiB" now.free_swap="0 B"
time=2025-04-26T15:49:31.371Z level=DEBUG source=amd_linux.go:488 msg="updating rocm free memory" gpu=GPU-7685bd29c5086f3d name=1002:73bf before="16.0 GiB" now="16.0 GiB"
time=2025-04-26T15:49:31.371Z level=DEBUG source=sched.go:183 msg="updating default concurrency" OLLAMA_MAX_LOADED_MODELS=3 gpu_count=1
time=2025-04-26T15:49:31.373Z level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-04-26T15:49:31.375Z level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-04-26T15:49:31.376Z level=DEBUG source=sched.go:226 msg="loading first model" model=/home/ollama/models/.ollama/models/blobs/sha256-819c2adf5ce6df2b6bd2ae4ca90d2a69f060afeb438d0c171db57daa02e39c3d
time=2025-04-26T15:49:31.376Z level=DEBUG source=memory.go:108 msg=evaluating library=rocm gpu_count=1 available="[16.0 GiB]"
time=2025-04-26T15:49:31.376Z level=WARN source=ggml.go:152 msg="key not found" key=bert.vision.block_count default=0
time=2025-04-26T15:49:31.376Z level=WARN source=ggml.go:152 msg="key not found" key=bert.attention.head_count_kv default=1
time=2025-04-26T15:49:31.376Z level=WARN source=ggml.go:152 msg="key not found" key=bert.attention.key_length default=64
time=2025-04-26T15:49:31.376Z level=WARN source=ggml.go:152 msg="key not found" key=bert.attention.value_length default=64
time=2025-04-26T15:49:31.376Z level=WARN source=ggml.go:152 msg="key not found" key=bert.attention.head_count_kv default=1
time=2025-04-26T15:49:31.376Z level=INFO source=sched.go:722 msg="new model will fit in available VRAM in single GPU, loading" model=/home/ollama/models/.ollama/models/blobs/sha256-819c2adf5ce6df2b6bd2ae4ca90d2a69f060afeb438d0c171db57daa02e39c3d gpu=GPU-7685bd29c5086f3d parallel=1 available=17145991168 required="1.1 GiB"
time=2025-04-26T15:49:31.376Z level=DEBUG source=gpu.go:391 msg="updating system memory data" before.total="62.5 GiB" before.free="31.5 GiB" before.free_swap="0 B" now.total="62.5 GiB" now.free="31.5 GiB" now.free_swap="0 B"
time=2025-04-26T15:49:31.376Z level=DEBUG source=amd_linux.go:488 msg="updating rocm free memory" gpu=GPU-7685bd29c5086f3d name=1002:73bf before="16.0 GiB" now="16.0 GiB"
time=2025-04-26T15:49:31.376Z level=INFO source=server.go:105 msg="system memory" total="62.5 GiB" free="31.5 GiB" free_swap="0 B"
time=2025-04-26T15:49:31.376Z level=DEBUG source=memory.go:108 msg=evaluating library=rocm gpu_count=1 available="[16.0 GiB]"
time=2025-04-26T15:49:31.376Z level=WARN source=ggml.go:152 msg="key not found" key=bert.vision.block_count default=0
time=2025-04-26T15:49:31.376Z level=WARN source=ggml.go:152 msg="key not found" key=bert.attention.head_count_kv default=1
time=2025-04-26T15:49:31.376Z level=WARN source=ggml.go:152 msg="key not found" key=bert.attention.key_length default=64
time=2025-04-26T15:49:31.376Z level=WARN source=ggml.go:152 msg="key not found" key=bert.attention.value_length default=64
time=2025-04-26T15:49:31.376Z level=WARN source=ggml.go:152 msg="key not found" key=bert.attention.head_count_kv default=1
time=2025-04-26T15:49:31.376Z level=INFO source=server.go:138 msg=offload library=rocm layers.requested=-1 layers.model=25 layers.offload=25 layers.split="" memory.available="[16.0 GiB]" memory.gpu_overhead="0 B" memory.required.full="1.1 GiB" memory.required.partial="1.1 GiB" memory.required.kv="3.0 MiB" memory.required.allocations="[1.1 GiB]" memory.weights.total="636.8 MiB" memory.weights.repeating="577.2 MiB" memory.weights.nonrepeating="59.6 MiB" memory.graph.full="8.0 MiB" memory.graph.partial="8.0 MiB"
time=2025-04-26T15:49:31.376Z level=DEBUG source=server.go:262 msg="compatible gpu libraries" compatible=[rocm]
llama_model_loader: loaded meta data with 23 key-value pairs and 389 tensors from /home/ollama/models/.ollama/models/blobs/sha256-819c2adf5ce6df2b6bd2ae4ca90d2a69f060afeb438d0c171db57daa02e39c3d (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = bert
llama_model_loader: - kv   1:                               general.name str              = mxbai-embed-large-v1
llama_model_loader: - kv   2:                           bert.block_count u32              = 24
llama_model_loader: - kv   3:                        bert.context_length u32              = 512
llama_model_loader: - kv   4:                      bert.embedding_length u32              = 1024
llama_model_loader: - kv   5:                   bert.feed_forward_length u32              = 4096
llama_model_loader: - kv   6:                  bert.attention.head_count u32              = 16
llama_model_loader: - kv   7:          bert.attention.layer_norm_epsilon f32              = 0.000000
llama_model_loader: - kv   8:                          general.file_type u32              = 1
llama_model_loader: - kv   9:                      bert.attention.causal bool             = false
llama_model_loader: - kv  10:                          bert.pooling_type u32              = 2
llama_model_loader: - kv  11:            tokenizer.ggml.token_type_count u32              = 2
llama_model_loader: - kv  12:                tokenizer.ggml.bos_token_id u32              = 101
llama_model_loader: - kv  13:                tokenizer.ggml.eos_token_id u32              = 102
llama_model_loader: - kv  14:                       tokenizer.ggml.model str              = bert
llama_model_loader: - kv  15:                      tokenizer.ggml.tokens arr[str,30522]   = ["[PAD]", "[unused0]", "[unused1]", "...
llama_model_loader: - kv  16:                      tokenizer.ggml.scores arr[f32,30522]   = [-1000.000000, -1000.000000, -1000.00...
llama_model_loader: - kv  17:                  tokenizer.ggml.token_type arr[i32,30522]   = [3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  18:            tokenizer.ggml.unknown_token_id u32              = 100
llama_model_loader: - kv  19:          tokenizer.ggml.seperator_token_id u32              = 102
llama_model_loader: - kv  20:            tokenizer.ggml.padding_token_id u32              = 0
llama_model_loader: - kv  21:                tokenizer.ggml.cls_token_id u32              = 101
llama_model_loader: - kv  22:               tokenizer.ggml.mask_token_id u32              = 103
llama_model_loader: - type  f32:  243 tensors
llama_model_loader: - type  f16:  146 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type   = F16
print_info: file size   = 637.85 MiB (16.02 BPW) 
init_tokenizer: initializing tokenizer for type 3
load: control token:    101 '[CLS]' is not marked as EOG
load: control token:    103 '[MASK]' is not marked as EOG
load: control token:      0 '[PAD]' is not marked as EOG
load: control token:    100 '[UNK]' is not marked as EOG
load: control token:    102 '[SEP]' is not marked as EOG
load: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
load: special tokens cache size = 5
load: token to piece cache size = 0.2032 MB
print_info: arch             = bert
print_info: vocab_only       = 1
print_info: model type       = ?B
print_info: model params     = 334.09 M
print_info: general.name     = mxbai-embed-large-v1
print_info: vocab type       = WPM
print_info: n_vocab          = 30522
print_info: n_merges         = 0
print_info: BOS token        = 101 '[CLS]'
print_info: EOS token        = 102 '[SEP]'
print_info: UNK token        = 100 '[UNK]'
print_info: SEP token        = 102 '[SEP]'
print_info: PAD token        = 0 '[PAD]'
print_info: MASK token       = 103 '[MASK]'
print_info: LF token         = 0 '[PAD]'
print_info: EOG token        = 102 '[SEP]'
print_info: max token length = 21
llama_model_load: vocab only - skipping tensors
[GIN] 2025/04/26 - 15:49:32 | 200 |      23.128µs |    192.168.10.1 | GET      "/"
time=2025-04-26T15:49:32.367Z level=DEBUG source=server.go:335 msg="adding gpu library" path=/usr/lib/ollama/rocm
time=2025-04-26T15:49:32.367Z level=DEBUG source=server.go:343 msg="adding gpu dependency paths" paths=[/usr/lib/ollama/rocm]
time=2025-04-26T15:49:32.368Z level=INFO source=server.go:405 msg="starting llama server" cmd="/usr/bin/ollama runner --model /home/ollama/models/.ollama/models/blobs/sha256-819c2adf5ce6df2b6bd2ae4ca90d2a69f060afeb438d0c171db57daa02e39c3d --ctx-size 512 --batch-size 512 --n-gpu-layers 25 --verbose --threads 8 --parallel 1 --port 21385"
time=2025-04-26T15:49:32.368Z level=DEBUG source=server.go:423 msg=subprocess environment="[PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin LD_LIBRARY_PATH=/usr/lib/ollama/rocm:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/lib/ollama/rocm:/usr/lib/ollama HIP_VISIBLE_DEVICES=1 ROCR_VISIBLE_DEVICES=GPU-7685bd29c5086f3d ROCM_VISIBLE_DEVICES=1 HSA_OVERRIDE_GFX_VERSION=10.3.0]"
time=2025-04-26T15:49:32.368Z level=INFO source=sched.go:451 msg="loaded runners" count=1
time=2025-04-26T15:49:32.368Z level=INFO source=server.go:580 msg="waiting for llama runner to start responding"
time=2025-04-26T15:49:32.368Z level=INFO source=server.go:614 msg="waiting for server to become available" status="llm server error"
time=2025-04-26T15:49:32.377Z level=INFO source=runner.go:853 msg="starting go runner"
time=2025-04-26T15:49:32.377Z level=DEBUG source=ggml.go:99 msg="ggml backend load all from path" path=/usr/lib/ollama/rocm
[GIN] 2025/04/26 - 15:49:33 | 200 |      19.852µs |    192.168.10.1 | GET      "/"
/opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory
/opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory
ggml_cuda_init: failed to initialize ROCm: no ROCm-capable device is detected
load_backend: loaded ROCm backend from /usr/lib/ollama/rocm/libggml-hip.so
time=2025-04-26T15:49:33.155Z level=DEBUG source=ggml.go:93 msg="skipping path which is not part of ollama" path=/usr/local/nvidia/lib
time=2025-04-26T15:49:33.155Z level=DEBUG source=ggml.go:93 msg="skipping path which is not part of ollama" path=/usr/local/nvidia/lib64
time=2025-04-26T15:49:33.155Z level=DEBUG source=ggml.go:99 msg="ggml backend load all from path" path=/usr/lib/ollama
load_backend: loaded CPU backend from /usr/lib/ollama/libggml-cpu-alderlake.so
time=2025-04-26T15:49:33.156Z level=INFO source=ggml.go:109 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX_VNNI=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
time=2025-04-26T15:49:33.156Z level=INFO source=runner.go:913 msg="Server listening on 127.0.0.1:21385"
llama_model_loader: loaded meta data with 23 key-value pairs and 389 tensors from /home/ollama/models/.ollama/models/blobs/sha256-819c2adf5ce6df2b6bd2ae4ca90d2a69f060afeb438d0c171db57daa02e39c3d (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = bert
llama_model_loader: - kv   1:                               general.name str              = mxbai-embed-large-v1
llama_model_loader: - kv   2:                           bert.block_count u32              = 24
llama_model_loader: - kv   3:                        bert.context_length u32              = 512
llama_model_loader: - kv   4:                      bert.embedding_length u32              = 1024
llama_model_loader: - kv   5:                   bert.feed_forward_length u32              = 4096
llama_model_loader: - kv   6:                  bert.attention.head_count u32              = 16
llama_model_loader: - kv   7:          bert.attention.layer_norm_epsilon f32              = 0.000000
llama_model_loader: - kv   8:                          general.file_type u32              = 1
llama_model_loader: - kv   9:                      bert.attention.causal bool             = false
llama_model_loader: - kv  10:                          bert.pooling_type u32              = 2
llama_model_loader: - kv  11:            tokenizer.ggml.token_type_count u32              = 2
llama_model_loader: - kv  12:                tokenizer.ggml.bos_token_id u32              = 101
llama_model_loader: - kv  13:                tokenizer.ggml.eos_token_id u32              = 102
llama_model_loader: - kv  14:                       tokenizer.ggml.model str              = bert
llama_model_loader: - kv  15:                      tokenizer.ggml.tokens arr[str,30522]   = ["[PAD]", "[unused0]", "[unused1]", "...
llama_model_loader: - kv  16:                      tokenizer.ggml.scores arr[f32,30522]   = [-1000.000000, -1000.000000, -1000.00...
llama_model_loader: - kv  17:                  tokenizer.ggml.token_type arr[i32,30522]   = [3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  18:            tokenizer.ggml.unknown_token_id u32              = 100
llama_model_loader: - kv  19:          tokenizer.ggml.seperator_token_id u32              = 102
llama_model_loader: - kv  20:            tokenizer.ggml.padding_token_id u32              = 0
llama_model_loader: - kv  21:                tokenizer.ggml.cls_token_id u32              = 101
llama_model_loader: - kv  22:               tokenizer.ggml.mask_token_id u32              = 103
llama_model_loader: - type  f32:  243 tensors
llama_model_loader: - type  f16:  146 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type   = F16
print_info: file size   = 637.85 MiB (16.02 BPW) 
init_tokenizer: initializing tokenizer for type 3
load: control token:    101 '[CLS]' is not marked as EOG
load: control token:    103 '[MASK]' is not marked as EOG
load: control token:      0 '[PAD]' is not marked as EOG
load: control token:    100 '[UNK]' is not marked as EOG
load: control token:    102 '[SEP]' is not marked as EOG
load: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
load: special tokens cache size = 5
load: token to piece cache size = 0.2032 MB
print_info: arch             = bert
print_info: vocab_only       = 0
print_info: n_ctx_train      = 512
print_info: n_embd           = 1024
print_info: n_layer          = 24
print_info: n_head           = 16
print_info: n_head_kv        = 16
print_info: n_rot            = 64
print_info: n_swa            = 0
print_info: n_swa_pattern    = 1
print_info: n_embd_head_k    = 64
print_info: n_embd_head_v    = 64
print_info: n_gqa            = 1
print_info: n_embd_k_gqa     = 1024
print_info: n_embd_v_gqa     = 1024
print_info: f_norm_eps       = 1.0e-12
print_info: f_norm_rms_eps   = 0.0e+00
print_info: f_clamp_kqv      = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale    = 0.0e+00
print_info: f_attn_scale     = 0.0e+00
print_info: n_ff             = 4096
print_info: n_expert         = 0
print_info: n_expert_used    = 0
print_info: causal attn      = 0
print_info: pooling type     = 2
print_info: rope type        = 2
print_info: rope scaling     = linear
print_info: freq_base_train  = 10000.0
print_info: freq_scale_train = 1
print_info: n_ctx_orig_yarn  = 512
print_info: rope_finetuned   = unknown
print_info: ssm_d_conv       = 0
print_info: ssm_d_inner      = 0
print_info: ssm_d_state      = 0
print_info: ssm_dt_rank      = 0
print_info: ssm_dt_b_c_rms   = 0
print_info: model type       = 335M
print_info: model params     = 334.09 M
print_info: general.name     = mxbai-embed-large-v1
print_info: vocab type       = WPM
print_info: n_vocab          = 30522
print_info: n_merges         = 0
print_info: BOS token        = 101 '[CLS]'
print_info: EOS token        = 102 '[SEP]'
print_info: UNK token        = 100 '[UNK]'
print_info: SEP token        = 102 '[SEP]'
print_info: PAD token        = 0 '[PAD]'
print_info: MASK token       = 103 '[MASK]'
print_info: LF token         = 0 '[PAD]'
print_info: EOG token        = 102 '[SEP]'
print_info: max token length = 21
load_tensors: loading model tensors, this can take a while... (mmap = true)
load_tensors: layer   0 assigned to device CPU, is_swa = 0
load_tensors: layer   1 assigned to device CPU, is_swa = 0
load_tensors: layer   2 assigned to device CPU, is_swa = 0
load_tensors: layer   3 assigned to device CPU, is_swa = 0
load_tensors: layer   4 assigned to device CPU, is_swa = 0
load_tensors: layer   5 assigned to device CPU, is_swa = 0
load_tensors: layer   6 assigned to device CPU, is_swa = 0
load_tensors: layer   7 assigned to device CPU, is_swa = 0
load_tensors: layer   8 assigned to device CPU, is_swa = 0
load_tensors: layer   9 assigned to device CPU, is_swa = 0
load_tensors: layer  10 assigned to device CPU, is_swa = 0
load_tensors: layer  11 assigned to device CPU, is_swa = 0
load_tensors: layer  12 assigned to device CPU, is_swa = 0
load_tensors: layer  13 assigned to device CPU, is_swa = 0
load_tensors: layer  14 assigned to device CPU, is_swa = 0
load_tensors: layer  15 assigned to device CPU, is_swa = 0
load_tensors: layer  16 assigned to device CPU, is_swa = 0
load_tensors: layer  17 assigned to device CPU, is_swa = 0
load_tensors: layer  18 assigned to device CPU, is_swa = 0
load_tensors: layer  19 assigned to device CPU, is_swa = 0
load_tensors: layer  20 assigned to device CPU, is_swa = 0
load_tensors: layer  21 assigned to device CPU, is_swa = 0
load_tensors: layer  22 assigned to device CPU, is_swa = 0
load_tensors: layer  23 assigned to device CPU, is_swa = 0
load_tensors: layer  24 assigned to device CPU, is_swa = 0
time=2025-04-26T15:49:33.372Z level=INFO source=server.go:614 msg="waiting for server to become available" status="llm server loading model"
[GIN] 2025/04/26 - 15:49:34 | 200 |       16.28µs |    192.168.10.1 | GET      "/"
[GIN] 2025/04/26 - 15:49:35 | 200 |      12.119µs |    192.168.10.1 | GET      "/"
[GIN] 2025/04/26 - 15:49:36 | 200 |      12.501µs |    192.168.10.1 | GET      "/"
[GIN] 2025/04/26 - 15:49:37 | 200 |      16.986µs |    192.168.10.1 | GET      "/"
[GIN] 2025/04/26 - 15:49:38 | 200 |      18.783µs |    192.168.10.1 | GET      "/"
[GIN] 2025/04/26 - 15:49:39 | 200 |      13.436µs |    192.168.10.1 | GET      "/"
[GIN] 2025/04/26 - 15:49:40 | 200 |      19.586µs |    192.168.10.1 | GET      "/"
[GIN] 2025/04/26 - 15:49:41 | 200 |      23.479µs |    192.168.10.1 | GET      "/"
[GIN] 2025/04/26 - 15:49:42 | 200 |      23.868µs |    192.168.10.1 | GET      "/"
[GIN] 2025/04/26 - 15:49:43 | 200 |      13.501µs |    192.168.10.1 | GET      "/"
[GIN] 2025/04/26 - 15:49:44 | 200 |      14.025µs |    192.168.10.1 | GET      "/"
[GIN] 2025/04/26 - 15:49:45 | 200 |       13.98µs |    192.168.10.1 | GET      "/"
[GIN] 2025/04/26 - 15:49:46 | 200 |      22.553µs |    192.168.10.1 | GET      "/"
[GIN] 2025/04/26 - 15:49:47 | 200 |      13.092µs |    192.168.10.1 | GET      "/"
[GIN] 2025/04/26 - 15:49:48 | 200 |      19.752µs |    192.168.10.1 | GET      "/"
[GIN] 2025/04/26 - 15:49:49 | 200 |      12.943µs |    192.168.10.1 | GET      "/"
[GIN] 2025/04/26 - 15:49:50 | 200 |      13.455µs |    192.168.10.1 | GET      "/"
[GIN] 2025/04/26 - 15:49:51 | 200 |      12.638µs |    192.168.10.1 | GET      "/"
[GIN] 2025/04/26 - 15:49:52 | 200 |      21.598µs |    192.168.10.1 | GET      "/"
load_tensors:   CPU_Mapped model buffer size =   637.85 MiB
llama_context: constructing llama_context
llama_context: n_seq_max     = 1
llama_context: n_ctx         = 512
llama_context: n_ctx_per_seq = 512
llama_context: n_batch       = 512
llama_context: n_ubatch      = 512
llama_context: causal_attn   = 0
llama_context: flash_attn    = 0
llama_context: freq_base     = 10000.0
llama_context: freq_scale    = 1
set_abort_callback: call
llama_context:        CPU  output buffer size =     0.00 MiB
llama_context: n_ctx = 512
llama_context: n_ctx = 512 (padded)
init: kv_size = 512, offload = 1, type_k = 'f16', type_v = 'f16', n_layer = 24, can_shift = 1
init: layer   0: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024, dev = CPU
init: layer   1: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024, dev = CPU
init: layer   2: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024, dev = CPU
init: layer   3: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024, dev = CPU
init: layer   4: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024, dev = CPU
init: layer   5: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024, dev = CPU
init: layer   6: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024, dev = CPU
init: layer   7: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024, dev = CPU
init: layer   8: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024, dev = CPU
init: layer   9: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024, dev = CPU
init: layer  10: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024, dev = CPU
init: layer  11: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024, dev = CPU
init: layer  12: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024, dev = CPU
init: layer  13: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024, dev = CPU
init: layer  14: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024, dev = CPU
init: layer  15: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024, dev = CPU
init: layer  16: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024, dev = CPU
init: layer  17: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024, dev = CPU
init: layer  18: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024, dev = CPU
init: layer  19: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024, dev = CPU
init: layer  20: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024, dev = CPU
init: layer  21: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024, dev = CPU
init: layer  22: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024, dev = CPU
init: layer  23: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024, dev = CPU
init:        CPU KV buffer size =    48.00 MiB
llama_context: KV self size  =   48.00 MiB, K (f16):   24.00 MiB, V (f16):   24.00 MiB
llama_context: enumerating backends
llama_context: backend_ptrs.size() = 1
llama_context: max_nodes = 65536
llama_context: worst-case: n_tokens = 512, n_seqs = 1, n_outputs = 0
llama_context: reserving graph for n_tokens = 512, n_seqs = 1
llama_context: reserving graph for n_tokens = 1, n_seqs = 1
llama_context: reserving graph for n_tokens = 512, n_seqs = 1
llama_context:        CPU compute buffer size =    27.01 MiB
llama_context: graph nodes  = 825
llama_context: graph splits = 1
time=2025-04-26T15:49:53.446Z level=INFO source=server.go:619 msg="llama runner started in 21.08 seconds"
time=2025-04-26T15:49:53.446Z level=DEBUG source=sched.go:464 msg="finished setting up runner" model=/home/ollama/models/.ollama/models/blobs/sha256-819c2adf5ce6df2b6bd2ae4ca90d2a69f060afeb438d0c171db57daa02e39c3d
time=2025-04-26T15:49:53.450Z level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32
time=2025-04-26T15:49:53.452Z level=DEBUG source=runner.go:686 msg="embedding request" content="tell me more"
time=2025-04-26T15:49:53.454Z level=DEBUG source=cache.go:104 msg="loading cache slot" id=0 cache=0 prompt=5 used=0 remaining=5
[GIN] 2025/04/26 - 15:49:53 | 200 | 22.671213222s |  173.249.47.211 | POST     "/api/embed"
time=2025-04-26T15:49:53.477Z level=DEBUG source=sched.go:468 msg="context for request finished"
time=2025-04-26T15:49:53.477Z level=DEBUG source=sched.go:341 msg="runner with non-zero duration has gone idle, adding timer" modelPath=/home/ollama/models/.ollama/models/blobs/sha256-819c2adf5ce6df2b6bd2ae4ca90d2a69f060afeb438d0c171db57daa02e39c3d duration=5m0s
time=2025-04-26T15:49:53.477Z level=DEBUG source=sched.go:359 msg="after processing request finished event" modelPath=/home/ollama/models/.ollama/models/blobs/sha256-819c2adf5ce6df2b6bd2ae4ca90d2a69f060afeb438d0c171db57daa02e39c3d refCount=0
[GIN] 2025/04/26 - 15:49:53 | 200 |      15.649µs |    192.168.10.1 | GET      "/"
[GIN] 2025/04/26 - 15:49:54 | 200 |      19.318µs |    192.168.10.1 | GET      "/"
[GIN] 2025/04/26 - 15:49:55 | 200 |      13.768µs |    192.168.10.1 | GET      "/"
[GIN] 2025/04/26 - 15:49:56 | 200 |      18.993µs |    192.168.10.1 | GET      "/"
[GIN] 2025/04/26 - 15:49:57 | 200 |      15.537µs |    192.168.10.1 | GET      "/"
[GIN] 2025/04/26 - 15:49:58 | 200 |      14.674µs |    192.168.10.1 | GET      "/"
[GIN] 2025/04/26 - 15:49:59 | 200 |       14.12µs |    192.168.10.1 | GET      "/"
[GIN] 2025/04/26 - 15:50:00 | 200 |       13.63µs |    192.168.10.1 | GET      "/"
[GIN] 2025/04/26 - 15:50:01 | 200 |      13.562µs |    192.168.10.1 | GET      "/"
[GIN] 2025/04/26 - 15:50:02 | 200 |      14.125µs |    192.168.10.1 | GET      "/"
[GIN] 2025/04/26 - 15:50:03 | 200 |      14.033µs |    192.168.10.1 | GET      "/"
[GIN] 2025/04/26 - 15:50:04 | 200 |      13.703µs |    192.168.10.1 | GET      "/"
[GIN] 2025/04/26 - 15:50:05 | 200 |       14.48µs |    192.168.10.1 | GET      "/"
[GIN] 2025/04/26 - 15:50:06 | 200 |      16.471µs |    192.168.10.1 | GET      "/"
[GIN] 2025/04/26 - 15:50:07 | 200 |      15.057µs |    192.168.10.1 | GET      "/"
[GIN] 2025/04/26 - 15:50:08 | 200 |      14.551µs |    192.168.10.1 | GET      "/"
[GIN] 2025/04/26 - 15:50:09 | 200 |      13.898µs |    192.168.10.1 | GET      "/"
[GIN] 2025/04/26 - 15:50:10 | 200 |      18.579µs |    192.168.10.1 | GET      "/"
[GIN] 2025/04/26 - 15:50:12 | 200 |      14.124µs |    192.168.10.1 | GET      "/"
[GIN] 2025/04/26 - 15:50:13 | 200 |      13.713µs |    192.168.10.1 | GET      "/"
[GIN] 2025/04/26 - 15:50:14 | 200 |      16.543µs |    192.168.10.1 | GET      "/"
[GIN] 2025/04/26 - 15:50:15 | 200 |      22.086µs |    192.168.10.1 | GET      "/"
[GIN] 2025/04/26 - 15:50:16 | 200 |      13.266µs |    192.168.10.1 | GET      "/"
[GIN] 2025/04/26 - 15:50:17 | 200 |      13.639µs |    192.168.10.1 | GET      "/"
[GIN] 2025/04/26 - 15:50:18 | 200 |      14.333µs |    192.168.10.1 | GET      "/"
[GIN] 2025/04/26 - 15:50:19 | 200 |      14.326µs |    192.168.10.1 | GET      "/"

OS

OS: Linux Mint 22.1 x86_64

GPU

GPU0: AMD ATI Radeon RX 7900 XT/7900 XTX/7900 GRE/7900M
GPU1: AMD ATI Radeon RX 6800/6800 XT / 6900 XT

CPU

CPU: Intel i9-14900K (32) @ 5.700GHz

Ollama version

ollama/ollama:0.6.6-rocm

512 --batch-size 512 --n-gpu-layers 25 --verbose --threads 8 --parallel 1 --port 21385" time=2025-04-26T15:49:32.368Z level=DEBUG source=server.go:423 msg=subprocess environment="[PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin LD_LIBRARY_PATH=/usr/lib/ollama/rocm:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/lib/ollama/rocm:/usr/lib/ollama HIP_VISIBLE_DEVICES=1 ROCR_VISIBLE_DEVICES=GPU-7685bd29c5086f3d ROCM_VISIBLE_DEVICES=1 HSA_OVERRIDE_GFX_VERSION=10.3.0]" time=2025-04-26T15:49:32.368Z level=INFO source=sched.go:451 msg="loaded runners" count=1 time=2025-04-26T15:49:32.368Z level=INFO source=server.go:580 msg="waiting for llama runner to start responding" time=2025-04-26T15:49:32.368Z level=INFO source=server.go:614 msg="waiting for server to become available" status="llm server error" time=2025-04-26T15:49:32.377Z level=INFO source=runner.go:853 msg="starting go runner" time=2025-04-26T15:49:32.377Z level=DEBUG source=ggml.go:99 msg="ggml backend load all from path" path=/usr/lib/ollama/rocm [GIN] 2025/04/26 - 15:49:33 | 200 | 19.852µs | 192.168.10.1 | GET  "/" /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory /opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory ggml_cuda_init: failed to initialize ROCm: no ROCm-capable device is detected load_backend: loaded ROCm backend from /usr/lib/ollama/rocm/libggml-hip.so time=2025-04-26T15:49:33.155Z level=DEBUG source=ggml.go:93 msg="skipping path which is not part of ollama" path=/usr/local/nvidia/lib time=2025-04-26T15:49:33.155Z level=DEBUG source=ggml.go:93 msg="skipping path which is not part of ollama" path=/usr/local/nvidia/lib64 time=2025-04-26T15:49:33.155Z level=DEBUG source=ggml.go:99 msg="ggml backend load all from path" path=/usr/lib/ollama load_backend: loaded CPU backend from /usr/lib/ollama/libggml-cpu-alderlake.so time=2025-04-26T15:49:33.156Z level=INFO source=ggml.go:109 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX_VNNI=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc) time=2025-04-26T15:49:33.156Z level=INFO source=runner.go:913 msg="Server listening on 127.0.0.1:21385" llama_model_loader: loaded meta data with 23 key-value pairs and 389 tensors from /home/ollama/models/.ollama/models/blobs/sha256-819c2adf5ce6df2b6bd2ae4ca90d2a69f060afeb438d0c171db57daa02e39c3d (version GGUF V3 (latest)) llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output. 
llama_model_loader: - kv 0: general.architecture str = bert llama_model_loader: - kv 1: general.name str = mxbai-embed-large-v1 llama_model_loader: - kv 2: bert.block_count u32 = 24 llama_model_loader: - kv 3: bert.context_length u32 = 512 llama_model_loader: - kv 4: bert.embedding_length u32 = 1024 llama_model_loader: - kv 5: bert.feed_forward_length u32 = 4096 llama_model_loader: - kv 6: bert.attention.head_count u32 = 16 llama_model_loader: - kv 7: bert.attention.layer_norm_epsilon f32 = 0.000000 llama_model_loader: - kv 8: general.file_type u32 = 1 llama_model_loader: - kv 9: bert.attention.causal bool = false llama_model_loader: - kv 10: bert.pooling_type u32 = 2 llama_model_loader: - kv 11: tokenizer.ggml.token_type_count u32 = 2 llama_model_loader: - kv 12: tokenizer.ggml.bos_token_id u32 = 101 llama_model_loader: - kv 13: tokenizer.ggml.eos_token_id u32 = 102 llama_model_loader: - kv 14: tokenizer.ggml.model str = bert llama_model_loader: - kv 15: tokenizer.ggml.tokens arr[str,30522] = ["[PAD]", "[unused0]", "[unused1]", "... llama_model_loader: - kv 16: tokenizer.ggml.scores arr[f32,30522] = [-1000.000000, -1000.000000, -1000.00... llama_model_loader: - kv 17: tokenizer.ggml.token_type arr[i32,30522] = [3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ... llama_model_loader: - kv 18: tokenizer.ggml.unknown_token_id u32 = 100 llama_model_loader: - kv 19: tokenizer.ggml.seperator_token_id u32 = 102 llama_model_loader: - kv 20: tokenizer.ggml.padding_token_id u32 = 0 llama_model_loader: - kv 21: tokenizer.ggml.cls_token_id u32 = 101 llama_model_loader: - kv 22: tokenizer.ggml.mask_token_id u32 = 103 llama_model_loader: - type f32: 243 tensors llama_model_loader: - type f16: 146 tensors print_info: file format = GGUF V3 (latest) print_info: file type = F16 print_info: file size = 637.85 MiB (16.02 BPW) init_tokenizer: initializing tokenizer for type 3 load: control token: 101 '[CLS]' is not marked as EOG load: control token: 103 '[MASK]' is not marked as EOG load: control token: 0 '[PAD]' is not marked as EOG load: control token: 100 '[UNK]' is not marked as EOG load: control token: 102 '[SEP]' is not marked as EOG load: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect load: special tokens cache size = 5 load: token to piece cache size = 0.2032 MB print_info: arch = bert print_info: vocab_only = 0 print_info: n_ctx_train = 512 print_info: n_embd = 1024 print_info: n_layer = 24 print_info: n_head = 16 print_info: n_head_kv = 16 print_info: n_rot = 64 print_info: n_swa = 0 print_info: n_swa_pattern = 1 print_info: n_embd_head_k = 64 print_info: n_embd_head_v = 64 print_info: n_gqa = 1 print_info: n_embd_k_gqa = 1024 print_info: n_embd_v_gqa = 1024 print_info: f_norm_eps = 1.0e-12 print_info: f_norm_rms_eps = 0.0e+00 print_info: f_clamp_kqv = 0.0e+00 print_info: f_max_alibi_bias = 0.0e+00 print_info: f_logit_scale = 0.0e+00 print_info: f_attn_scale = 0.0e+00 print_info: n_ff = 4096 print_info: n_expert = 0 print_info: n_expert_used = 0 print_info: causal attn = 0 print_info: pooling type = 2 print_info: rope type = 2 print_info: rope scaling = linear print_info: freq_base_train = 10000.0 print_info: freq_scale_train = 1 print_info: n_ctx_orig_yarn = 512 print_info: rope_finetuned = unknown print_info: ssm_d_conv = 0 print_info: ssm_d_inner = 0 print_info: ssm_d_state = 0 print_info: ssm_dt_rank = 0 print_info: ssm_dt_b_c_rms = 0 print_info: model type = 335M print_info: model params = 334.09 M print_info: general.name = mxbai-embed-large-v1 print_info: vocab type = WPM 
print_info: n_vocab = 30522 print_info: n_merges = 0 print_info: BOS token = 101 '[CLS]' print_info: EOS token = 102 '[SEP]' print_info: UNK token = 100 '[UNK]' print_info: SEP token = 102 '[SEP]' print_info: PAD token = 0 '[PAD]' print_info: MASK token = 103 '[MASK]' print_info: LF token = 0 '[PAD]' print_info: EOG token = 102 '[SEP]' print_info: max token length = 21 load_tensors: loading model tensors, this can take a while... (mmap = true) load_tensors: layer 0 assigned to device CPU, is_swa = 0 load_tensors: layer 1 assigned to device CPU, is_swa = 0 load_tensors: layer 2 assigned to device CPU, is_swa = 0 load_tensors: layer 3 assigned to device CPU, is_swa = 0 load_tensors: layer 4 assigned to device CPU, is_swa = 0 load_tensors: layer 5 assigned to device CPU, is_swa = 0 load_tensors: layer 6 assigned to device CPU, is_swa = 0 load_tensors: layer 7 assigned to device CPU, is_swa = 0 load_tensors: layer 8 assigned to device CPU, is_swa = 0 load_tensors: layer 9 assigned to device CPU, is_swa = 0 load_tensors: layer 10 assigned to device CPU, is_swa = 0 load_tensors: layer 11 assigned to device CPU, is_swa = 0 load_tensors: layer 12 assigned to device CPU, is_swa = 0 load_tensors: layer 13 assigned to device CPU, is_swa = 0 load_tensors: layer 14 assigned to device CPU, is_swa = 0 load_tensors: layer 15 assigned to device CPU, is_swa = 0 load_tensors: layer 16 assigned to device CPU, is_swa = 0 load_tensors: layer 17 assigned to device CPU, is_swa = 0 load_tensors: layer 18 assigned to device CPU, is_swa = 0 load_tensors: layer 19 assigned to device CPU, is_swa = 0 load_tensors: layer 20 assigned to device CPU, is_swa = 0 load_tensors: layer 21 assigned to device CPU, is_swa = 0 load_tensors: layer 22 assigned to device CPU, is_swa = 0 load_tensors: layer 23 assigned to device CPU, is_swa = 0 load_tensors: layer 24 assigned to device CPU, is_swa = 0 time=2025-04-26T15:49:33.372Z level=INFO source=server.go:614 msg="waiting for server to become available" status="llm server loading model" [GIN] 2025/04/26 - 15:49:34 | 200 | 16.28µs | 192.168.10.1 | GET  "/" [GIN] 2025/04/26 - 15:49:35 | 200 | 12.119µs | 192.168.10.1 | GET  "/" [GIN] 2025/04/26 - 15:49:36 | 200 | 12.501µs | 192.168.10.1 | GET  "/" [GIN] 2025/04/26 - 15:49:37 | 200 | 16.986µs | 192.168.10.1 | GET  "/" [GIN] 2025/04/26 - 15:49:38 | 200 | 18.783µs | 192.168.10.1 | GET  "/" [GIN] 2025/04/26 - 15:49:39 | 200 | 13.436µs | 192.168.10.1 | GET  "/" [GIN] 2025/04/26 - 15:49:40 | 200 | 19.586µs | 192.168.10.1 | GET  "/" [GIN] 2025/04/26 - 15:49:41 | 200 | 23.479µs | 192.168.10.1 | GET  "/" [GIN] 2025/04/26 - 15:49:42 | 200 | 23.868µs | 192.168.10.1 | GET  "/" [GIN] 2025/04/26 - 15:49:43 | 200 | 13.501µs | 192.168.10.1 | GET  "/" [GIN] 2025/04/26 - 15:49:44 | 200 | 14.025µs | 192.168.10.1 | GET  "/" [GIN] 2025/04/26 - 15:49:45 | 200 | 13.98µs | 192.168.10.1 | GET  "/" [GIN] 2025/04/26 - 15:49:46 | 200 | 22.553µs | 192.168.10.1 | GET  "/" [GIN] 2025/04/26 - 15:49:47 | 200 | 13.092µs | 192.168.10.1 | GET  "/" [GIN] 2025/04/26 - 15:49:48 | 200 | 19.752µs | 192.168.10.1 | GET  "/" [GIN] 2025/04/26 - 15:49:49 | 200 | 12.943µs | 192.168.10.1 | GET  "/" [GIN] 2025/04/26 - 15:49:50 | 200 | 13.455µs | 192.168.10.1 | GET  "/" [GIN] 2025/04/26 - 15:49:51 | 200 | 12.638µs | 192.168.10.1 | GET  "/" [GIN] 2025/04/26 - 15:49:52 | 200 | 21.598µs | 192.168.10.1 | GET  "/" load_tensors: CPU_Mapped model buffer size = 637.85 MiB llama_context: constructing llama_context llama_context: n_seq_max = 1 llama_context: n_ctx = 512 llama_context: 
n_ctx_per_seq = 512 llama_context: n_batch = 512 llama_context: n_ubatch = 512 llama_context: causal_attn = 0 llama_context: flash_attn = 0 llama_context: freq_base = 10000.0 llama_context: freq_scale = 1 set_abort_callback: call llama_context: CPU output buffer size = 0.00 MiB llama_context: n_ctx = 512 llama_context: n_ctx = 512 (padded) init: kv_size = 512, offload = 1, type_k = 'f16', type_v = 'f16', n_layer = 24, can_shift = 1 init: layer 0: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024, dev = CPU init: layer 1: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024, dev = CPU init: layer 2: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024, dev = CPU init: layer 3: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024, dev = CPU init: layer 4: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024, dev = CPU init: layer 5: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024, dev = CPU init: layer 6: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024, dev = CPU init: layer 7: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024, dev = CPU init: layer 8: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024, dev = CPU init: layer 9: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024, dev = CPU init: layer 10: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024, dev = CPU init: layer 11: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024, dev = CPU init: layer 12: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024, dev = CPU init: layer 13: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024, dev = CPU init: layer 14: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024, dev = CPU init: layer 15: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024, dev = CPU init: layer 16: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024, dev = CPU init: layer 17: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024, dev = CPU init: layer 18: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024, dev = CPU init: layer 19: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024, dev = CPU init: layer 20: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024, dev = CPU init: layer 21: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024, dev = CPU init: layer 22: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024, dev = CPU init: layer 23: n_embd_k_gqa = 1024, n_embd_v_gqa = 1024, dev = CPU init: CPU KV buffer size = 48.00 MiB llama_context: KV self size = 48.00 MiB, K (f16): 24.00 MiB, V (f16): 24.00 MiB llama_context: enumerating backends llama_context: backend_ptrs.size() = 1 llama_context: max_nodes = 65536 llama_context: worst-case: n_tokens = 512, n_seqs = 1, n_outputs = 0 llama_context: reserving graph for n_tokens = 512, n_seqs = 1 llama_context: reserving graph for n_tokens = 1, n_seqs = 1 llama_context: reserving graph for n_tokens = 512, n_seqs = 1 llama_context: CPU compute buffer size = 27.01 MiB llama_context: graph nodes = 825 llama_context: graph splits = 1 time=2025-04-26T15:49:53.446Z level=INFO source=server.go:619 msg="llama runner started in 21.08 seconds" time=2025-04-26T15:49:53.446Z level=DEBUG source=sched.go:464 msg="finished setting up runner" model=/home/ollama/models/.ollama/models/blobs/sha256-819c2adf5ce6df2b6bd2ae4ca90d2a69f060afeb438d0c171db57daa02e39c3d time=2025-04-26T15:49:53.450Z level=WARN source=ggml.go:152 msg="key not found" key=general.alignment default=32 time=2025-04-26T15:49:53.452Z level=DEBUG source=runner.go:686 msg="embedding request" content="tell me more" time=2025-04-26T15:49:53.454Z level=DEBUG source=cache.go:104 msg="loading cache slot" id=0 cache=0 prompt=5 used=0 remaining=5 [GIN] 2025/04/26 - 15:49:53 | 200 | 22.671213222s | 173.249.47.211 | POST  "/api/embed" time=2025-04-26T15:49:53.477Z level=DEBUG source=sched.go:468 msg="context for request finished" time=2025-04-26T15:49:53.477Z level=DEBUG 
source=sched.go:341 msg="runner with non-zero duration has gone idle, adding timer" modelPath=/home/ollama/models/.ollama/models/blobs/sha256-819c2adf5ce6df2b6bd2ae4ca90d2a69f060afeb438d0c171db57daa02e39c3d duration=5m0s time=2025-04-26T15:49:53.477Z level=DEBUG source=sched.go:359 msg="after processing request finished event" modelPath=/home/ollama/models/.ollama/models/blobs/sha256-819c2adf5ce6df2b6bd2ae4ca90d2a69f060afeb438d0c171db57daa02e39c3d refCount=0 [GIN] 2025/04/26 - 15:49:53 | 200 | 15.649µs | 192.168.10.1 | GET  "/" [GIN] 2025/04/26 - 15:49:54 | 200 | 19.318µs | 192.168.10.1 | GET  "/" [GIN] 2025/04/26 - 15:49:55 | 200 | 13.768µs | 192.168.10.1 | GET  "/" [GIN] 2025/04/26 - 15:49:56 | 200 | 18.993µs | 192.168.10.1 | GET  "/" [GIN] 2025/04/26 - 15:49:57 | 200 | 15.537µs | 192.168.10.1 | GET  "/" [GIN] 2025/04/26 - 15:49:58 | 200 | 14.674µs | 192.168.10.1 | GET  "/" [GIN] 2025/04/26 - 15:49:59 | 200 | 14.12µs | 192.168.10.1 | GET  "/" [GIN] 2025/04/26 - 15:50:00 | 200 | 13.63µs | 192.168.10.1 | GET  "/" [GIN] 2025/04/26 - 15:50:01 | 200 | 13.562µs | 192.168.10.1 | GET  "/" [GIN] 2025/04/26 - 15:50:02 | 200 | 14.125µs | 192.168.10.1 | GET  "/" [GIN] 2025/04/26 - 15:50:03 | 200 | 14.033µs | 192.168.10.1 | GET  "/" [GIN] 2025/04/26 - 15:50:04 | 200 | 13.703µs | 192.168.10.1 | GET  "/" [GIN] 2025/04/26 - 15:50:05 | 200 | 14.48µs | 192.168.10.1 | GET  "/" [GIN] 2025/04/26 - 15:50:06 | 200 | 16.471µs | 192.168.10.1 | GET  "/" [GIN] 2025/04/26 - 15:50:07 | 200 | 15.057µs | 192.168.10.1 | GET  "/" [GIN] 2025/04/26 - 15:50:08 | 200 | 14.551µs | 192.168.10.1 | GET  "/" [GIN] 2025/04/26 - 15:50:09 | 200 | 13.898µs | 192.168.10.1 | GET  "/" [GIN] 2025/04/26 - 15:50:10 | 200 | 18.579µs | 192.168.10.1 | GET  "/" [GIN] 2025/04/26 - 15:50:12 | 200 | 14.124µs | 192.168.10.1 | GET  "/" [GIN] 2025/04/26 - 15:50:13 | 200 | 13.713µs | 192.168.10.1 | GET  "/" [GIN] 2025/04/26 - 15:50:14 | 200 | 16.543µs | 192.168.10.1 | GET  "/" [GIN] 2025/04/26 - 15:50:15 | 200 | 22.086µs | 192.168.10.1 | GET  "/" [GIN] 2025/04/26 - 15:50:16 | 200 | 13.266µs | 192.168.10.1 | GET  "/" [GIN] 2025/04/26 - 15:50:17 | 200 | 13.639µs | 192.168.10.1 | GET  "/" [GIN] 2025/04/26 - 15:50:18 | 200 | 14.333µs | 192.168.10.1 | GET  "/" [GIN] 2025/04/26 - 15:50:19 | 200 | 14.326µs | 192.168.10.1 | GET  "/" ``` ### OS OS: Linux Mint 22.1 x86_64 ### GPU GPU0: AMD ATI Radeon RX 7900 XT/7900 XTX/7900 GRE/7900M GPU1: AMD ATI Radeon RX 6800/6800 XT / 6900 XT ### CPU CPU: Intel i9-14900K (32) @ 5.700GHz ### Ollama version ollama/ollama:0.6.6-rocm
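
One thing worth ruling out in the container case: the runner subprocess can only enumerate HSA agents if `/dev/kfd` and the `/dev/dri` render nodes are exposed inside the pod and writable by the container user; if they are not, the runner reports `hipErrorNoDevice` even though the server process can still see the GPU through sysfs. A minimal sketch of the check, using the plain Docker equivalent of this deployment (a Kubernetes pod would need the same device nodes via a device plugin or hostPath mounts):

```shell
# Run the ROCm image with the kernel device nodes the HIP runtime needs.
docker run -d --device /dev/kfd --device /dev/dri \
  -e HIP_VISIBLE_DEVICES=1 -e HSA_OVERRIDE_GFX_VERSION=10.3.0 \
  -v ollama:/root/.ollama -p 11434:11434 \
  --name ollama ollama/ollama:rocm

# From inside the container, confirm the devices are actually visible:
docker exec -it ollama ls -l /dev/kfd /dev/dri
```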
GiteaMirror added the bug label 2025-11-12 13:47:09 -06:00
@sunarowicz commented on GitHub (May 2, 2025):

Same problem here, I get
ggml_cuda_init: failed to initialize ROCm: no ROCm-capable device is detected

Ollama version: 0.6.6 (installed in system, not container by official install script)
ROCm installed
GPU: iGPU 780M in Ryzen 7 7800X3D
System variables used: HSA_OVERRIDE_GFX_VERSION=11.0.0 HCC_AMDGPU_TARGETS=gfx1103 CUDA_VISIBLE_DEVICES=-1

rocminfo (truncated):

```
*******
Agent 2
*******
  Name:                    gfx1036
  Uuid:                    GPU-XX
  Marketing Name:          AMD Radeon Graphics
  Vendor Name:             AMD
```
Ollama messages (truncated):

```
llama_model_load: vocab only - skipping tensors
time=2025-05-02T16:00:01.289+02:00 level=INFO source=server.go:405 msg="starting llama server" cmd="/usr/local/bin/ollama runner --model /home/ai/.ollama/models/blobs/sha256-5ee4f07cdb9beadbbb293e85803c569b01bd37ed059d2715faa7bb405f31caa6 --ctx-size 8192 --batch-size 512 --n-gpu-layers 37 --threads 8 --parallel 4 --port 40579"
time=2025-05-02T16:00:01.290+02:00 level=INFO source=sched.go:451 msg="loaded runners" count=1
time=2025-05-02T16:00:01.290+02:00 level=INFO source=server.go:580 msg="waiting for llama runner to start responding"
time=2025-05-02T16:00:01.290+02:00 level=INFO source=server.go:614 msg="waiting for server to become available" status="llm server error"
time=2025-05-02T16:00:01.300+02:00 level=INFO source=runner.go:853 msg="starting go runner"
ggml_cuda_init: failed to initialize ROCm: no ROCm-capable device is detected
load_backend: loaded ROCm backend from /usr/local/lib/ollama/rocm/libggml-hip.so
load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-icelake.so
time=2025-05-02T16:00:01.344+02:00 level=INFO source=ggml.go:109 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.AVX512=1 CPU.0.AVX512_VBMI=1 CPU.0.AVX512_VNNI=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
time=2025-05-02T16:00:01.345+02:00 level=INFO source=runner.go:913 msg="Server listening on 127.0.0.1:40579"
```

Any idea how to solve this?
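
Since this install came from the official script, Ollama runs as a systemd service under its own `ollama` user, and two things commonly break GPU enumeration there: variables exported in a login shell never reach the service, and the service user needs the `render`/`video` group membership that `/dev/kfd` access requires. A sketch of both checks, assuming the stock `ollama.service` unit:

```shell
# Ensure the service user can open the GPU device nodes.
sudo usermod -aG render,video ollama

# Environment variables must be set on the unit, not in your shell:
sudo systemctl edit ollama.service
# ...then add in the override file:
#   [Service]
#   Environment="HSA_OVERRIDE_GFX_VERSION=11.0.0"
sudo systemctl daemon-reload && sudo systemctl restart ollama
```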

@moophlo commented on GitHub (May 2, 2025):

Not sure about iGPU but try with these:

HIP_VISIBLE_DEVICES=1
HSA_OVERRIDE_GFX_VERSION=11.0.0
HCC_AMDGPU_TARGET=gfx1100
ROCR_VISIBLE_DEVICES=1

Actually, if you only have the integrated GPU, you shouldn't need any variables at all; ideally it should be autodetected.
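
One caveat with copying those values verbatim: `HIP_VISIBLE_DEVICES` and `ROCR_VISIBLE_DEVICES` take zero-based device indices, so `=1` selects the *second* agent. On a machine whose only GPU is the iGPU, the only valid index is 0, or the selection variables can simply be dropped. A minimal sketch:

```shell
# Single-GPU (iGPU-only) box: either select index 0 explicitly...
HIP_VISIBLE_DEVICES=0 ROCR_VISIBLE_DEVICES=0 ollama serve

# ...or leave device selection unset and only override the gfx target:
unset HIP_VISIBLE_DEVICES ROCR_VISIBLE_DEVICES
HSA_OVERRIDE_GFX_VERSION=11.0.0 ollama serve
```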

@sunarowicz commented on GitHub (May 2, 2025):

When I start server with AMD_LOG_LEVEL=3 and OLLAMA_DEBUG=1 I get the following additional info:

```
time=2025-05-02T16:24:57.999+02:00 level=INFO source=runner.go:853 msg="starting go runner"
time=2025-05-02T16:24:58.000+02:00 level=DEBUG source=ggml.go:99 msg="ggml backend load all from path" path=/usr/local/lib/ollama/rocm
:3:rocdevice.cpp            :469 : 145600849622d us:  Initializing HSA stack.
:3:rocdevice.cpp            :555 : 145600858892d us:  Enumerated GPU agents = 0
:3:hip_context.cpp          :49  : 145600858897d us:  Direct Dispatch: 1
:3:hip_device_runtime.cpp   :649 : 145600858914d us:  hipGetDeviceCount ( 0x72165e6c4310 )
:3:hip_device_runtime.cpp   :651 : 145600858917d us:  hipGetDeviceCount: Returned hipErrorNoDevice :
ggml_cuda_init: failed to initialize ROCm: no ROCm-capable device is detected
load_backend: loaded ROCm backend from /usr/local/lib/ollama/rocm/libggml-hip.so
```

But I still do not understand why I get "Enumerated GPU agents = 0" when rocminfo reports the GPU agent.
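
"Enumerated GPU agents = 0" from the HSA stack while rocminfo succeeds usually means the two processes run as different users with different permissions on the device nodes. A quick way to compare, assuming the service user is `ollama` (adjust if the server runs as someone else):

```shell
# rocminfo in your shell runs as your user; run it as the service user too:
sudo -u ollama rocminfo | grep -E "Agent|gfx"

# Check which groups may open the compute device nodes:
ls -l /dev/kfd /dev/dri/renderD*
groups ollama
```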

@sunarowicz commented on GitHub (May 2, 2025):

> Not sure about iGPU but try with these:
>
> HIP_VISIBLE_DEVICES=1 HSA_OVERRIDE_GFX_VERSION=11.0.0 HCC_AMDGPU_TARGET=gfx1100 ROCR_VISIBLE_DEVICES=1
>
> Actually if you only have the Integrated GPU you shouldn't need any variable at all, ideally it should autodetect

Thank you for replying to me.

Although your recommendation didn't help, it moved me forward a bit. I found that using HSA_OVERRIDE_GFX_VERSION=11.0.0 alone finally makes Ollama try to load layers on the iGPU. But then it crashes with this error:

```
Memory access fault by GPU node-1 (Agent handle: 0x5d8d5aeee0b0) on address 0x70cee5a30000. Reason: Page not present or supervisor privilege.
time=2025-05-02T18:12:35.513+02:00 level=ERROR source=sched.go:457 msg="error loading llama server" error="llama runner process has terminated: signal: aborted (core dumped)"
```

But this seems to be a different story, already reported (though not yet answered) here: [#8851](https://github.com/ollama/ollama/issues/8851).
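
For that follow-on page fault, the kernel side usually records more detail than the runner log, which tends to help when reporting. A small sketch:

```shell
# The amdgpu kernel driver logs the faulting address/VMID for GPU page faults:
sudo dmesg | grep -iE -A2 "amdgpu|page fault"
```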

Reference: github-starred/ollama-ollama#6855