[GH-ISSUE #3928] rocm crash with 4 gfx900 GPUs #28193

Closed
opened 2026-04-22 06:04:25 -05:00 by GiteaMirror · 10 comments

Originally created by @ZanMax on GitHub (Apr 26, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/3928

Originally assigned to: @dhiltgen on GitHub.

What is the issue?

My system:
Ubuntu: 22.04
CPU: E5 2620
GPU: WX 9100

I have installed drivers and ROCm.
But when I try to run ollama I receive:

time=2024-04-26T02:45:47.779Z level=INFO source=routes.go:1063 msg="Listening on 127.0.0.1:11434 (version 0.0.0)"
time=2024-04-26T02:45:47.780Z level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama3693243001/runners
time=2024-04-26T02:45:47.780Z level=DEBUG source=payload.go:180 msg=extracting variant=cpu file=build/linux/x86_64/cpu/bin/ollama_llama_server.gz
time=2024-04-26T02:45:47.780Z level=DEBUG source=payload.go:180 msg=extracting variant=rocm_v0 file=build/linux/x86_64/rocm_v0/bin/deps.txt.gz
time=2024-04-26T02:45:47.780Z level=DEBUG source=payload.go:180 msg=extracting variant=rocm_v0 file=build/linux/x86_64/rocm_v0/bin/ollama_llama_server.gz
time=2024-04-26T02:45:48.242Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3693243001/runners/cpu
time=2024-04-26T02:45:48.242Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3693243001/runners/rocm_v0
time=2024-04-26T02:45:48.242Z level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu rocm_v0]"
time=2024-04-26T02:45:48.242Z level=DEBUG source=payload.go:45 msg="Override detection logic by setting OLLAMA_LLM_LIBRARY"
time=2024-04-26T02:45:48.242Z level=DEBUG source=sched.go:101 msg="starting llm scheduler"
time=2024-04-26T02:45:48.242Z level=INFO source=gpu.go:96 msg="Detecting GPUs"
time=2024-04-26T02:45:48.242Z level=DEBUG source=gpu.go:203 msg="Searching for GPU library" name=libcudart.so*
time=2024-04-26T02:45:48.242Z level=DEBUG source=gpu.go:221 msg="gpu library search" globs="[/tmp/ollama3693243001/runners/cuda*/libcudart.so* /usr/local/cuda/lib64/libcudart.so* /usr/lib/x86_64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/x86_64-linux-gnu/libcudart.so* /usr/lib/wsl/lib/libcudart.so* /usr/lib/wsl/drivers/*/libcudart.so* /opt/cuda/lib64/libcudart.so* /usr/local/cuda*/targets/aarch64-linux/lib/libcudart.so* /usr/lib/aarch64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/aarch64-linux-gnu/libcudart.so* /usr/local/cuda/lib*/libcudart.so* /usr/lib*/libcudart.so* /usr/local/lib*/libcudart.so* /opt/rocm/lib/libcudart.so** /home/dev/libcudart.so**]"
time=2024-04-26T02:45:48.244Z level=DEBUG source=gpu.go:249 msg="discovered GPU libraries" paths=[]
time=2024-04-26T02:45:48.244Z level=INFO source=cpu_common.go:15 msg="CPU has AVX"
time=2024-04-26T02:45:48.244Z level=INFO source=amd_linux.go:46 msg="AMD Driver: 6.2.4"
time=2024-04-26T02:45:48.244Z level=DEBUG source=amd_linux.go:78 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/0/properties"
time=2024-04-26T02:45:48.244Z level=DEBUG source=amd_linux.go:102 msg="detected CPU /sys/class/kfd/kfd/topology/nodes/0/properties"
time=2024-04-26T02:45:48.244Z level=DEBUG source=amd_linux.go:78 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/1/properties"
time=2024-04-26T02:45:48.244Z level=INFO source=amd_linux.go:217 msg="amdgpu memory" gpu=0 total="16368.0 MiB"
time=2024-04-26T02:45:48.245Z level=INFO source=amd_linux.go:218 msg="amdgpu memory" gpu=0 available="5697.0 MiB"
time=2024-04-26T02:45:48.245Z level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /opt/rocm/lib"
time=2024-04-26T02:45:48.245Z level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /home/dev"
time=2024-04-26T02:45:48.245Z level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /opt/rocm/lib"
time=2024-04-26T02:45:48.245Z level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /usr/bin/rocm"
time=2024-04-26T02:45:48.245Z level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /usr/share/ollama/lib/rocm"
time=2024-04-26T02:45:48.245Z level=WARN source=amd_linux.go:321 msg="amdgpu detected, but no compatible rocm library found. Either install rocm v6, or follow manual install instructions at https://github.com/ollama/ollama/blob/main/docs/linux.md#manual-install"
time=2024-04-26T02:45:48.245Z level=WARN source=amd_linux.go:253 msg="unable to verify rocm library, will use cpu" error="no suitable rocm found, falling back to CPU"
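
A quick way to check whether a ROCm runtime the detector can use is actually present (a sketch; directories taken from the search list in the log above, so adjust for your install):

```
# Look for the ROCm runtime libraries the rocm runner needs
# (paths assumed from the log's search list; adjust for your install)
ls /opt/rocm/lib/libhipblas.so* /opt/rocm/lib/librocblas.so* 2>/dev/null

# On Debian/Ubuntu, confirm which rocm packages are installed
dpkg -l | grep -i rocm
```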

OS

Linux

GPU

AMD

CPU

Intel

Ollama version

0.1.32

GiteaMirror added the amd, bug labels 2026-04-22 06:04:25 -05:00

@ZanMax commented on GitHub (Apr 26, 2024):

After reinstalling Ollama, it started offloading to the GPU.
But I ran into a new problem:

time=2024-04-26T19:26:29.174Z level=DEBUG source=server.go:420 msg="server not yet available" error="health resp: Get \"http://127.0.0.1:38459/health\": dial tcp 127.0.0.1:38459: i/o timeout"
time=2024-04-26T19:26:29.375Z level=DEBUG source=server.go:420 msg="server not yet available" error="server not responding"
time=2024-04-26T19:26:29.576Z level=DEBUG source=server.go:420 msg="server not yet available" error="health resp: Get \"http://127.0.0.1:38459/health\": dial tcp 127.0.0.1:38459: i/o timeout"
time=2024-04-26T19:26:29.776Z level=DEBUG source=server.go:420 msg="server not yet available" error="server not responding"
time=2024-04-26T19:26:30.178Z level=DEBUG source=server.go:420 msg="server not yet available" error="health resp: Get \"http://127.0.0.1:38459/health\": dial tcp 127.0.0.1:38459: i/o timeout"
time=2024-04-26T19:26:30.580Z level=DEBUG source=server.go:420 msg="server not yet available" error="server not responding"

Full log:
ollama_problem.txt (https://github.com/ollama/ollama/files/15134633/ollama_problem.txt)


@dhiltgen commented on GitHub (May 1, 2024):

Do you actually have 4 GPUs, or did the amdgpu driver go sideways and incorrectly enumerate 4 GPUs when there is only 1? Your first log shows only 1, so my suspicion is you're hitting GPU driver bugs. If the driver is reporting "ghost" GPUs that don't exist, things will go bad when we try to allocate memory on them, which seems like a plausible explanation for the crash. If that's correct, rebooting will likely clear it.

time=2024-04-26T19:26:26.048Z level=INFO source=amd_linux.go:121 msg="amdgpu [0] gfx900 is supported"
time=2024-04-26T19:26:26.048Z level=INFO source=amd_linux.go:121 msg="amdgpu [1] gfx900 is supported"
time=2024-04-26T19:26:26.048Z level=INFO source=amd_linux.go:121 msg="amdgpu [2] gfx900 is supported"
time=2024-04-26T19:26:26.048Z level=INFO source=amd_linux.go:121 msg="amdgpu [3] gfx900 is supported"
time=2024-04-26T19:26:26.048Z level=INFO source=amd_linux.go:121 msg="amdgpu [4] gfx900 is supported"
time=2024-04-26T19:26:26.048Z level=DEBUG source=amd_linux.go:169 msg="discovering VRAM for amdgpu devices"
time=2024-04-26T19:26:26.048Z level=DEBUG source=amd_linux.go:188 msg="amdgpu devices [2 3 4 0 1]"
time=2024-04-26T19:26:26.048Z level=INFO source=amd_linux.go:263 msg="[2] amdgpu totalMemory 16368M"
time=2024-04-26T19:26:26.048Z level=INFO source=amd_linux.go:264 msg="[2] amdgpu freeMemory  5697M"
time=2024-04-26T19:26:26.049Z level=INFO source=amd_linux.go:263 msg="[3] amdgpu totalMemory 16368M"
time=2024-04-26T19:26:26.049Z level=INFO source=amd_linux.go:264 msg="[3] amdgpu freeMemory  5697M"
time=2024-04-26T19:26:26.049Z level=INFO source=amd_linux.go:263 msg="[4] amdgpu totalMemory 16368M"
time=2024-04-26T19:26:26.049Z level=INFO source=amd_linux.go:264 msg="[4] amdgpu freeMemory  5697M"
time=2024-04-26T19:26:26.049Z level=INFO source=amd_linux.go:263 msg="[0] amdgpu totalMemory 16368M"
time=2024-04-26T19:26:26.049Z level=INFO source=amd_linux.go:264 msg="[0] amdgpu freeMemory  16368M"
time=2024-04-26T19:26:26.049Z level=INFO source=amd_linux.go:263 msg="[1] amdgpu totalMemory 16368M"
time=2024-04-26T19:26:26.049Z level=INFO source=amd_linux.go:264 msg="[1] amdgpu freeMemory  5697M"

You can sanity-check this theory by running `ls /sys/class/kfd/kfd/topology/nodes/` and seeing how many nodes are reported.
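
A slightly more detailed check, as a sketch: in the standard KFD sysfs layout, GPU nodes report a non-zero `simd_count` in their `properties` file while CPU-only nodes report 0, so printing that field separates real GPUs from the CPU node.

```
# Print simd_count for every KFD topology node:
# simd_count 0 => CPU node, non-zero => GPU node
for n in /sys/class/kfd/kfd/topology/nodes/*; do
  echo "$n: $(grep -m1 simd_count "$n"/properties)"
done
```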


@ZanMax commented on GitHub (May 1, 2024):

Yes, I have 5 GPUs (5 x WX 9100) in my system.

ls /sys/class/kfd/kfd/topology/nodes/
0 1 2 3 4 5

lspci | grep VGA
03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Vega 10 XT [Radeon PRO WX 9100]
06:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Vega 10 XT [Radeon PRO WX 9100]
09:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Vega 10 XT [Radeon PRO WX 9100]
0c:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Vega 10 XT [Radeon PRO WX 9100]
0f:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Vega 10 XT [Radeon PRO WX 9100]
12:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 30)


@dhiltgen commented on GitHub (May 2, 2024):

I think the explanation is likely multi-GPU bugs in ROCm. In 0.1.32 and older, we always spread the model across all available GPUs. The good news is that in 0.1.33 (RC available now) we try to run a model on a single GPU if it fits, and only spread across multiple GPUs when the model is larger than the VRAM on the largest card. As long as you run models smaller than the biggest GPU, things should start working in 0.1.33, but larger models will likely still fail until the underlying bug is resolved.

https://github.com/ollama/ollama/releases
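
In the meantime, one way to keep ROCm's multi-GPU code paths out of the picture is to expose only a single device to the runtime. These are standard ROCm environment variables rather than anything ollama-specific, so treat this as a workaround sketch:

```
# Restrict the HIP runtime to one GPU before starting the server
HIP_VISIBLE_DEVICES=0 ollama serve

# Or filter one level lower, at the ROCr runtime
ROCR_VISIBLE_DEVICES=0 ollama serve
```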


@ZanMax commented on GitHub (May 2, 2024):

I tried to run ollama with just one GPU:

OLLAMA_DEBUG=1 HIP_VISIBLE_DEVICES=1 ollama serve

as a result:

time=2024-05-02T21:54:36.454Z level=INFO source=gpu.go:314 msg="Discovered GPU libraries: []"
time=2024-05-02T21:54:36.454Z level=INFO source=cpu_common.go:15 msg="CPU has AVX"
time=2024-05-02T21:54:36.454Z level=INFO source=amd_linux.go:50 msg="AMD Driver: 6.2.4"
time=2024-05-02T21:54:36.454Z level=DEBUG source=amd_linux.go:169 msg="discovering VRAM for amdgpu devices"
time=2024-05-02T21:54:36.454Z level=DEBUG source=amd_linux.go:188 msg="amdgpu devices [1]"
time=2024-05-02T21:54:36.454Z level=INFO source=amd_linux.go:263 msg="[1] amdgpu totalMemory 16368M"
time=2024-05-02T21:54:36.454Z level=INFO source=amd_linux.go:264 msg="[1] amdgpu freeMemory 16335M"
time=2024-05-02T21:54:36.455Z level=INFO source=server.go:127 msg="offload to gpu" reallayers=33 layers=33 required="3538.9 MiB" used="3538.9 MiB" available="16335.1 MiB" kv="768.0 MiB" fulloffload="156.0 MiB" partialoffload="175.1 MiB"
time=2024-05-02T21:54:36.455Z level=DEBUG source=payload.go:68 msg="availableServers : found" file=/tmp/ollama3088734008/runners/cpu
time=2024-05-02T21:54:36.455Z level=DEBUG source=payload.go:68 msg="availableServers : found" file=/tmp/ollama3088734008/runners/cpu_avx
time=2024-05-02T21:54:36.455Z level=DEBUG source=payload.go:68 msg="availableServers : found" file=/tmp/ollama3088734008/runners/cpu_avx2
time=2024-05-02T21:54:36.455Z level=DEBUG source=payload.go:68 msg="availableServers : found" file=/tmp/ollama3088734008/runners/cuda_v11
time=2024-05-02T21:54:36.455Z level=DEBUG source=payload.go:68 msg="availableServers : found" file=/tmp/ollama3088734008/runners/rocm_v60002
time=2024-05-02T21:54:36.455Z level=DEBUG source=payload.go:68 msg="availableServers : found" file=/tmp/ollama3088734008/runners/cpu
time=2024-05-02T21:54:36.455Z level=DEBUG source=payload.go:68 msg="availableServers : found" file=/tmp/ollama3088734008/runners/cpu_avx
time=2024-05-02T21:54:36.455Z level=DEBUG source=payload.go:68 msg="availableServers : found" file=/tmp/ollama3088734008/runners/cpu_avx2
time=2024-05-02T21:54:36.455Z level=DEBUG source=payload.go:68 msg="availableServers : found" file=/tmp/ollama3088734008/runners/cuda_v11
time=2024-05-02T21:54:36.455Z level=DEBUG source=payload.go:68 msg="availableServers : found" file=/tmp/ollama3088734008/runners/rocm_v60002
time=2024-05-02T21:54:36.455Z level=INFO source=cpu_common.go:15 msg="CPU has AVX"
time=2024-05-02T21:54:36.456Z level=DEBUG source=server.go:259 msg="LD_LIBRARY_PATH=/tmp/ollama3088734008/runners/rocm_v60002"
time=2024-05-02T21:54:36.456Z level=INFO source=server.go:264 msg="starting llama server" cmd="/tmp/ollama3088734008/runners/rocm_v60002/ollama_llama_server --model /home/dev/.ollama/models/blobs/sha256-4fed7364ee3e0c7cb4fe0880148bfdfcd1b630981efa0802a6b62ee52e7da97e --ctx-size 2048 --batch-size 512 --embedding --log-format json --n-gpu-layers 33 --verbose --port 40127"
time=2024-05-02T21:54:36.457Z level=INFO source=server.go:389 msg="waiting for llama runner to start responding"
/tmp/ollama3088734008/runners/rocm_v60002/ollama_llama_server: error while loading shared libraries: libhipblas.so.2: cannot open shared object file: No such file or directory
time=2024-05-02T21:54:36.507Z level=ERROR source=routes.go:120 msg="error loading llama server" error="llama runner process no longer running: 127 "

What I noticed: the selected runner is ROCm 6.0:

file=/tmp/ollama3088734008/runners/rocm_v60002

But I have ROCm 5.7 installed on my system, and I am not sure that 6.0 supports gfx900.
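
A way to confirm that from the shell (a diagnostic sketch; the tmp directory name changes per run, and the soname mapping below is an assumption):

```
# Show which shared objects the bundled ROCm runner fails to resolve
ldd /tmp/ollama*/runners/rocm_v60002/ollama_llama_server | grep 'not found'

# List the hipBLAS sonames installed system-wide; ROCm 5.x and 6.x
# ship different libhipblas.so major versions (assumed mapping)
find /opt/rocm* -name 'libhipblas.so*' 2>/dev/null
```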


@dhiltgen commented on GitHub (May 2, 2024):

Hmm... this implies `/tmp/ollama3088734008/runners/rocm_v60002/libhipblas.so.2` doesn't exist.

If you're running 0.1.33, in debug mode you should see something like this during early startup:

time=2024-05-02T23:49:39.975Z level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /opt/rocm/lib"
time=2024-05-02T23:49:39.975Z level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /tmp/daniel_ollama_test/rocm"
time=2024-05-02T23:49:39.975Z level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /usr/share/ollama/lib/rocm"
time=2024-05-02T23:49:39.977Z level=DEBUG source=amd_linux.go:267 msg="rocm supported GPUs" types="[gfx1030 gfx1100 gfx1101 gfx1102 gfx900 gfx906 gfx908 gfx90a gfx940 gfx941 gfx942]"

gfx900 is included in the v6 ROCm we bundle.
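
To capture that startup trace on your side, run the server in debug mode and filter for the AMD discovery lines; the grep pattern here is just a convenience, not anything ollama-specific:

```
# OLLAMA_DEBUG=1 enables the DEBUG-level discovery logging shown above
OLLAMA_DEBUG=1 ollama serve 2>&1 | grep -E 'amd_common|amd_linux|rocm'
```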


@ZanMax commented on GitHub (May 3, 2024):

After the update:

ollama -v
ollama version is 0.1.33

The same problem:

.time=2024-05-03T00:51:14.618Z level=DEBUG source=server.go:466 msg="server not yet available" error="health resp: Get \"http://127.0.0.1:43461/health\": dial tcp 127.0.0.1:43461: i/o timeout"
time=2024-05-03T00:51:14.819Z level=DEBUG source=server.go:466 msg="server not yet available" error="server not responding"
time=2024-05-03T00:51:15.020Z level=DEBUG source=server.go:466 msg="server not yet available" error="health resp: Get \"http://127.0.0.1:43461/health\": dial tcp 127.0.0.1:43461: i/o timeout"
time=2024-05-03T00:51:15.221Z level=DEBUG source=server.go:466 msg="server not yet available" error="server not responding"

Full log:

time=2024-05-03T00:51:10.287Z level=INFO source=gpu.go:96 msg="Detecting GPUs"
time=2024-05-03T00:51:10.287Z level=DEBUG source=gpu.go:203 msg="Searching for GPU library" name=libcudart.so*
time=2024-05-03T00:51:10.287Z level=DEBUG source=gpu.go:221 msg="gpu library search" globs="[/tmp/ollama3982970041/runners/cuda*/libcudart.so* /usr/local/cuda/lib64/libcudart.so* /usr/lib/x86_64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/x86_64-linux-gnu/libcudart.so* /usr/lib/wsl/lib/libcudart.so* /usr/lib/wsl/drivers/*/libcudart.so* /opt/cuda/lib64/libcudart.so* /usr/local/cuda*/targets/aarch64-linux/lib/libcudart.so* /usr/lib/aarch64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/aarch64-linux-gnu/libcudart.so* /usr/local/cuda/lib*/libcudart.so* /usr/lib*/libcudart.so* /usr/local/lib*/libcudart.so* /home/dev/libcudart.so**]"
time=2024-05-03T00:51:10.289Z level=DEBUG source=gpu.go:249 msg="discovered GPU libraries" paths=[/tmp/ollama3982970041/runners/cuda_v11/libcudart.so.11.0]
cudaSetDevice err: 35
time=2024-05-03T00:51:10.290Z level=DEBUG source=gpu.go:261 msg="Unable to load cudart" library=/tmp/ollama3982970041/runners/cuda_v11/libcudart.so.11.0 error="your nvidia driver is too old or missing. If you have a CUDA GPU please upgrade to run ollama"
time=2024-05-03T00:51:10.290Z level=INFO source=cpu_common.go:15 msg="CPU has AVX"
time=2024-05-03T00:51:10.290Z level=INFO source=amd_linux.go:46 msg="AMD Driver: 6.2.4"
time=2024-05-03T00:51:10.290Z level=DEBUG source=amd_linux.go:78 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/0/properties"
time=2024-05-03T00:51:10.290Z level=DEBUG source=amd_linux.go:102 msg="detected CPU /sys/class/kfd/kfd/topology/nodes/0/properties"
time=2024-05-03T00:51:10.290Z level=DEBUG source=amd_linux.go:78 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/1/properties"
time=2024-05-03T00:51:10.290Z level=INFO source=amd_linux.go:217 msg="amdgpu memory" gpu=0 total="16368.0 MiB"
time=2024-05-03T00:51:10.290Z level=INFO source=amd_linux.go:218 msg="amdgpu memory" gpu=0 available="16360.1 MiB"
time=2024-05-03T00:51:10.290Z level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /opt/rocm/lib"
time=2024-05-03T00:51:10.291Z level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /usr/local/bin/rocm"
time=2024-05-03T00:51:10.291Z level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /usr/share/ollama/lib/rocm"
time=2024-05-03T00:51:10.295Z level=DEBUG source=amd_linux.go:267 msg="rocm supported GPUs" types="[gfx1030 gfx1100 gfx1101 gfx1102 gfx900 gfx906 gfx908 gfx90a gfx940 gfx941 gfx942]"
time=2024-05-03T00:51:10.295Z level=INFO source=amd_linux.go:276 msg="amdgpu is supported" gpu=0 gpu_type=gfx900
time=2024-05-03T00:51:10.295Z level=DEBUG source=amd_linux.go:78 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/2/properties"
time=2024-05-03T00:51:10.295Z level=INFO source=amd_linux.go:217 msg="amdgpu memory" gpu=1 total="16368.0 MiB"
time=2024-05-03T00:51:10.295Z level=INFO source=amd_linux.go:218 msg="amdgpu memory" gpu=1 available="16360.1 MiB"
time=2024-05-03T00:51:10.295Z level=INFO source=amd_linux.go:276 msg="amdgpu is supported" gpu=1 gpu_type=gfx900
time=2024-05-03T00:51:10.295Z level=DEBUG source=amd_linux.go:78 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/3/properties"
time=2024-05-03T00:51:10.295Z level=INFO source=amd_linux.go:217 msg="amdgpu memory" gpu=2 total="16368.0 MiB"
time=2024-05-03T00:51:10.295Z level=INFO source=amd_linux.go:218 msg="amdgpu memory" gpu=2 available="16360.1 MiB"
time=2024-05-03T00:51:10.295Z level=INFO source=amd_linux.go:276 msg="amdgpu is supported" gpu=2 gpu_type=gfx900
time=2024-05-03T00:51:10.295Z level=DEBUG source=amd_linux.go:78 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/4/properties"
time=2024-05-03T00:51:10.296Z level=INFO source=amd_linux.go:217 msg="amdgpu memory" gpu=3 total="16368.0 MiB"
time=2024-05-03T00:51:10.296Z level=INFO source=amd_linux.go:218 msg="amdgpu memory" gpu=3 available="16360.1 MiB"
time=2024-05-03T00:51:10.296Z level=INFO source=amd_linux.go:276 msg="amdgpu is supported" gpu=3 gpu_type=gfx900
time=2024-05-03T00:51:10.296Z level=DEBUG source=amd_linux.go:78 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/5/properties"
time=2024-05-03T00:51:10.296Z level=INFO source=amd_linux.go:217 msg="amdgpu memory" gpu=4 total="16368.0 MiB"
time=2024-05-03T00:51:10.296Z level=INFO source=amd_linux.go:218 msg="amdgpu memory" gpu=4 available="16360.1 MiB"
time=2024-05-03T00:51:10.296Z level=INFO source=amd_linux.go:276 msg="amdgpu is supported" gpu=4 gpu_type=gfx900
time=2024-05-03T00:51:10.296Z level=DEBUG source=gguf.go:57 msg="model = &llm.gguf{containerGGUF:(*llm.containerGGUF)(0xc00047ddc0), kv:llm.KV{}, tensors:[]*llm.Tensor(nil), parameters:0x0}"
time=2024-05-03T00:51:10.699Z level=DEBUG source=sched.go:162 msg="loading first model" model=/home/dev/.ollama/models/blobs/sha256-4fed7364ee3e0c7cb4fe0880148bfdfcd1b630981efa0802a6b62ee52e7da97e
time=2024-05-03T00:51:10.699Z level=DEBUG source=memory.go:64 msg=evaluating library=rocm gpu_count=1 available="16360.1 MiB"
time=2024-05-03T00:51:10.700Z level=INFO source=memory.go:152 msg="offload to gpu" layers.real=-1 layers.estimate=33 memory.available="16360.1 MiB" memory.required.full="3538.9 MiB" memory.required.partial="3538.9 MiB" memory.required.kv="768.0 MiB" memory.weights.total="2157.9 MiB" memory.weights.repeating="2080.9 MiB" memory.weights.nonrepeating="77.1 MiB" memory.graph.full="156.0 MiB" memory.graph.partial="175.1 MiB"
time=2024-05-03T00:51:10.700Z level=DEBUG source=sched.go:508 msg="new model will fit in available VRAM in single GPU, loading" model=/home/dev/.ollama/models/blobs/sha256-4fed7364ee3e0c7cb4fe0880148bfdfcd1b630981efa0802a6b62ee52e7da97e gpu=0 available=17154772992 required="3538.9 MiB"
time=2024-05-03T00:51:10.700Z level=DEBUG source=memory.go:64 msg=evaluating library=rocm gpu_count=1 available="16360.1 MiB"
time=2024-05-03T00:51:10.701Z level=INFO source=memory.go:152 msg="offload to gpu" layers.real=-1 layers.estimate=33 memory.available="16360.1 MiB" memory.required.full="3538.9 MiB" memory.required.partial="3538.9 MiB" memory.required.kv="768.0 MiB" memory.weights.total="2157.9 MiB" memory.weights.repeating="2080.9 MiB" memory.weights.nonrepeating="77.1 MiB" memory.graph.full="156.0 MiB" memory.graph.partial="175.1 MiB"
time=2024-05-03T00:51:10.701Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3982970041/runners/cpu
time=2024-05-03T00:51:10.701Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3982970041/runners/cpu_avx
time=2024-05-03T00:51:10.701Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3982970041/runners/cpu_avx2
time=2024-05-03T00:51:10.701Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3982970041/runners/cuda_v11
time=2024-05-03T00:51:10.701Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3982970041/runners/rocm_v60002
time=2024-05-03T00:51:10.701Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3982970041/runners/cpu
time=2024-05-03T00:51:10.701Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3982970041/runners/cpu_avx
time=2024-05-03T00:51:10.701Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3982970041/runners/cpu_avx2
time=2024-05-03T00:51:10.701Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3982970041/runners/cuda_v11
time=2024-05-03T00:51:10.701Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3982970041/runners/rocm_v60002
time=2024-05-03T00:51:10.701Z level=INFO source=cpu_common.go:15 msg="CPU has AVX"
time=2024-05-03T00:51:10.702Z level=INFO source=server.go:289 msg="starting llama server" cmd="/tmp/ollama3982970041/runners/rocm_v60002/ollama_llama_server --model /home/dev/.ollama/models/blobs/sha256-4fed7364ee3e0c7cb4fe0880148bfdfcd1b630981efa0802a6b62ee52e7da97e --ctx-size 2048 --batch-size 512 --embedding --log-format json --n-gpu-layers 33 --verbose --parallel 1 --port 43461"
time=2024-05-03T00:51:10.703Z level=DEBUG source=server.go:291 msg=subprocess environment="[OLLAMA_DEBUG=1 SHELL=/bin/bash PWD=/home/dev LOGNAME=dev XDG_SESSION_TYPE=tty MOTD_SHOWN=pam HOME=/home/dev LANG=en_US.UTF-8 LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=00:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.zst=01;31:*.tzst=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.wim=01;31:*.swm=01;31:*.dwm=01;31:*.esd=01;31:*.jpg=01;35:*.jpeg=01;35:*.mjpg=01;35:*.mjpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.webp=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.m4a=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.oga=00;36:*.opus=00;36:*.spx=00;36:*.xspf=00;36: LC_TERMINAL=iTerm2 SSH_CONNECTION=100.85.70.120 54311 100.106.7.66 22 LESSCLOSE=/usr/bin/lesspipe %s %s XDG_SESSION_CLASS=user TERM=xterm-256color LESSOPEN=| /usr/bin/lesspipe %s USER=dev LC_TERMINAL_VERSION=3.4.23 SHLVL=1 XDG_SESSION_ID=55 XDG_RUNTIME_DIR=/run/user/1000 SSH_CLIENT=100.85.70.120 54311 22 XDG_DATA_DIRS=/usr/local/share:/usr/share:/var/lib/snapd/desktop PATH=/home/dev/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/usr/local/go/bin DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/1000/bus SSH_TTY=/dev/pts/1 _=/usr/local/bin/ollama LD_LIBRARY_PATH=/usr/share/ollama/lib/rocm:/tmp/ollama3982970041/runners/rocm_v60002 HIP_VISIBLE_DEVICES=0]"
time=2024-05-03T00:51:10.703Z level=INFO source=sched.go:340 msg="loaded runners" count=1
time=2024-05-03T00:51:10.704Z level=INFO source=server.go:432 msg="waiting for llama runner to start responding"
{"function":"server_params_parse","level":"WARN","line":2497,"msg":"server.cpp is not built with verbose logging.","tid":"140112024857664","timestamp":1714697470}
time=2024-05-03T00:51:10.755Z level=DEBUG source=server.go:466 msg="server not yet available" error="health resp: Get \"http://127.0.0.1:43461/health\": dial tcp 127.0.0.1:43461: connect: connection refused"
{"build":1,"commit":"952d03d","function":"main","level":"INFO","line":2823,"msg":"build info","tid":"140112024857664","timestamp":1714697470}
{"function":"main","level":"INFO","line":2830,"msg":"system info","n_threads":6,"n_threads_batch":-1,"system_info":"AVX = 1 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | ","tid":"140112024857664","timestamp":1714697470,"total_threads":12}
llama_model_loader: loaded meta data with 25 key-value pairs and 291 tensors from /home/dev/.ollama/models/blobs/sha256-4fed7364ee3e0c7cb4fe0880148bfdfcd1b630981efa0802a6b62ee52e7da97e (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.name str = LLaMA v2
llama_model_loader: - kv 2: llama.vocab_size u32 = 32064
llama_model_loader: - kv 3: llama.context_length u32 = 4096
llama_model_loader: - kv 4: llama.embedding_length u32 = 3072
llama_model_loader: - kv 5: llama.block_count u32 = 32
llama_model_loader: - kv 6: llama.feed_forward_length u32 = 8192
llama_model_loader: - kv 7: llama.rope.dimension_count u32 = 96
llama_model_loader: - kv 8: llama.attention.head_count u32 = 32
llama_model_loader: - kv 9: llama.attention.head_count_kv u32 = 32
llama_model_loader: - kv 10: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 11: llama.rope.freq_base f32 = 10000.000000
llama_model_loader: - kv 12: general.file_type u32 = 15
llama_model_loader: - kv 13: tokenizer.ggml.model str = llama
llama_model_loader: - kv 14: tokenizer.ggml.tokens arr[str,32064] = ["<unk>", "<s>", "</s>", "<0x00>", "<...
llama_model_loader: - kv 15: tokenizer.ggml.scores arr[f32,32064] = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv 16: tokenizer.ggml.token_type arr[i32,32064] = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
llama_model_loader: - kv 17: tokenizer.ggml.bos_token_id u32 = 1
llama_model_loader: - kv 18: tokenizer.ggml.eos_token_id u32 = 32000
llama_model_loader: - kv 19: tokenizer.ggml.unknown_token_id u32 = 0
llama_model_loader: - kv 20: tokenizer.ggml.padding_token_id u32 = 32000
llama_model_loader: - kv 21: tokenizer.ggml.add_bos_token bool = true
llama_model_loader: - kv 22: tokenizer.ggml.add_eos_token bool = false
llama_model_loader: - kv 23: tokenizer.chat_template str = {{ bos_token }}{% for message in mess...
llama_model_loader: - kv 24: general.quantization_version u32 = 2
llama_model_loader: - type f32: 65 tensors
llama_model_loader: - type q4_K: 193 tensors
llama_model_loader: - type q6_K: 33 tensors
llm_load_vocab: special tokens definition check successful ( 323/32064 ).
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = SPM
llm_load_print_meta: n_vocab = 32064
llm_load_print_meta: n_merges = 0
llm_load_print_meta: n_ctx_train = 4096
llm_load_print_meta: n_embd = 3072
llm_load_print_meta: n_head = 32
llm_load_print_meta: n_head_kv = 32
llm_load_print_meta: n_layer = 32
llm_load_print_meta: n_rot = 96
llm_load_print_meta: n_embd_head_k = 96
llm_load_print_meta: n_embd_head_v = 96
llm_load_print_meta: n_gqa = 1
llm_load_print_meta: n_embd_k_gqa = 3072
llm_load_print_meta: n_embd_v_gqa = 3072
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale = 0.0e+00
llm_load_print_meta: n_ff = 8192
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: causal attn = 1
llm_load_print_meta: pooling type = 0
llm_load_print_meta: rope type = 0
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx = 4096
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: ssm_d_conv = 0
llm_load_print_meta: ssm_d_inner = 0
llm_load_print_meta: ssm_d_state = 0
llm_load_print_meta: ssm_dt_rank = 0
llm_load_print_meta: model type = 7B
llm_load_print_meta: model ftype = Q4_K - Medium
llm_load_print_meta: model params = 3.82 B
llm_load_print_meta: model size = 2.16 GiB (4.85 BPW)
llm_load_print_meta: general.name = LLaMA v2
llm_load_print_meta: BOS token = 1 '<s>'
llm_load_print_meta: EOS token = 32000 '<|endoftext|>'
llm_load_print_meta: UNK token = 0 '<unk>'
llm_load_print_meta: PAD token = 32000 '<|endoftext|>'
llm_load_print_meta: LF token = 13 '<0x0A>'
llm_load_print_meta: EOT token = 32007 '<|end|>'
time=2024-05-03T00:51:11.005Z level=DEBUG source=server.go:466 msg="server not yet available" error="server not responding"
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: CUDA_USE_TENSOR_CORES: yes
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon Pro WX 9100, compute capability 9.0, VMM: no
llm_load_tensors: ggml ctx size = 0.30 MiB
llm_load_tensors: offloading 32 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 33/33 layers to GPU
llm_load_tensors: ROCm0 buffer size = 2157.94 MiB
llm_load_tensors: CPU buffer size = 52.84 MiB
.time=2024-05-03T00:51:14.618Z level=DEBUG source=server.go:466 msg="server not yet available" error="health resp: Get \"http://127.0.0.1:43461/health\": dial tcp 127.0.0.1:43461: i/o timeout"
time=2024-05-03T00:51:14.819Z level=DEBUG source=server.go:466 msg="server not yet available" error="server not responding"
time=2024-05-03T00:51:15.020Z level=DEBUG source=server.go:466 msg="server not yet available" error="health resp: Get \"http://127.0.0.1:43461/health\": dial tcp 127.0.0.1:43461: i/o timeout"
time=2024-05-03T00:51:15.221Z level=DEBUG source=server.go:466 msg="server not yet available" error="server not responding"
time=2024-05-03T00:51:15.823Z level=DEBUG source=server.go:466 msg="server not yet available" error="health resp: Get \"http://127.0.0.1:43461/health\": dial tcp 127.0.0.1:43461: i/o timeout"
time=2024-05-03T00:51:16.024Z level=DEBUG source=server.go:466 msg="server not yet available" error="server not responding"
time=2024-05-03T00:51:18.033Z level=DEBUG source=server.go:466 msg="server not yet available" error="health resp: Get \"http://127.0.0.1:43461/health\": dial tcp 127.0.0.1:43461: i/o timeout"
time=2024-05-03T00:51:18.234Z level=DEBUG source=server.go:466 msg="server not yet available" error="server not responding"
time=2024-05-03T00:51:18.636Z level=DEBUG source=server.go:466 msg="server not yet available" error="health resp: Get \"http://127.0.0.1:43461/health\": dial tcp 127.0.0.1:43461: i/o timeout"
time=2024-05-03T00:51:18.836Z level=DEBUG source=server.go:466 msg="server not yet available" error="server not responding"
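
Since the load stalls right after the tensors are offloaded, it may also be worth checking kernel-side GPU state with standard ROCm tooling; a diagnostic sketch, nothing ollama-specific:

```
# Look for amdgpu/KFD faults or resets around the time of the hang
sudo dmesg | grep -iE 'amdgpu|kfd' | tail -n 20

# Confirm the ROCm runtime still enumerates the cards and their VRAM
rocminfo | grep -E 'Name|gfx'
rocm-smi --showmeminfo vram
```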

not yet available" error="server not responding" > time=2024-05-03T00:51:18.033Z level=DEBUG source=server.go:466 msg="server not yet available" error="health resp: Get \"http://127.0.0.1:43461/health\": dial tcp 127.0.0.1:43461: i/o timeout" > time=2024-05-03T00:51:18.234Z level=DEBUG source=server.go:466 msg="server not yet available" error="server not responding" > time=2024-05-03T00:51:18.636Z level=DEBUG source=server.go:466 msg="server not yet available" error="health resp: Get \"http://127.0.0.1:43461/health\": dial tcp 127.0.0.1:43461: i/o timeout" > time=2024-05-03T00:51:18.836Z level=DEBUG source=server.go:466 msg="server not yet available" error="server not responding" >

@ZanMax commented on GitHub (May 3, 2024):

I can provide access to the server for debugging if necessary.


@dhiltgen commented on GitHub (May 4, 2024):

From the log it looks like we're correctly discovering the 4 GPUs and determining the model fits on a single GPU, but while loading the model it seems to get stuck somewhere in the low-level ROCm logic.

This might be a variation on #3840 with a slightly different failure mode; ROCm appears to have issues with the gfx900 GPUs.
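For anyone hitting the same hang, here is a sketch of how one might narrow it down using only knobs that already appear in these logs (this is a diagnostic suggestion, not a confirmed fix):

# Pin the runner to a single gfx900 card and keep debug logging on; the
# subprocess environment above already shows HIP_VISIBLE_DEVICES=0 and
# OLLAMA_DEBUG=1 in use.
HIP_VISIBLE_DEVICES=0 OLLAMA_DEBUG=1 ollama serve

# If the ROCm runner still wedges, force the CPU runner to confirm the
# model and server side are healthy (OLLAMA_LLM_LIBRARY is the override
# mentioned in ollama's startup log).
OLLAMA_LLM_LIBRARY=cpu ollama serve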


@ZanMax commented on GitHub (May 6, 2024):

In my case llama.cpp works great.
I run 5 llama.cpp processes, each on a different GPU.
Thank you for your help.
I don't think it's critical to support such old hardware.
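
For reference, the per-GPU llama.cpp setup described above could be approximated as below; the model path, layer count, GPU indices, and ports are illustrative placeholders, and `server` is the llama.cpp example server binary of that era:

# One llama.cpp server per GPU: pin each process to a card with
# HIP_VISIBLE_DEVICES and give it its own port.
for gpu in 0 1 2 3; do
  HIP_VISIBLE_DEVICES=$gpu ./server -m ./model.gguf -ngl 33 --port $((8080 + gpu)) &
done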


Reference: github-starred/ollama#28193