[GH-ISSUE #3928] rocm crash with 4 gfx900 GPUs #28193

Closed
opened 2026-04-22 06:04:25 -05:00 by GiteaMirror · 10 comments

Originally created by @ZanMax on GitHub (Apr 26, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/3928

Originally assigned to: @dhiltgen on GitHub.

What is the issue?

My system:
Ubuntu: 22.04
CPU: E5 2620
GPU: WX 9100

I have installed drivers and ROCm.
But when I try to run ollama I receive:

time=2024-04-26T02:45:47.779Z level=INFO source=routes.go:1063 msg="Listening on 127.0.0.1:11434 (version 0.0.0)"
time=2024-04-26T02:45:47.780Z level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama3693243001/runners
time=2024-04-26T02:45:47.780Z level=DEBUG source=payload.go:180 msg=extracting variant=cpu file=build/linux/x86_64/cpu/bin/ollama_llama_server.gz
time=2024-04-26T02:45:47.780Z level=DEBUG source=payload.go:180 msg=extracting variant=rocm_v0 file=build/linux/x86_64/rocm_v0/bin/deps.txt.gz
time=2024-04-26T02:45:47.780Z level=DEBUG source=payload.go:180 msg=extracting variant=rocm_v0 file=build/linux/x86_64/rocm_v0/bin/ollama_llama_server.gz
time=2024-04-26T02:45:48.242Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3693243001/runners/cpu
time=2024-04-26T02:45:48.242Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3693243001/runners/rocm_v0
time=2024-04-26T02:45:48.242Z level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu rocm_v0]"
time=2024-04-26T02:45:48.242Z level=DEBUG source=payload.go:45 msg="Override detection logic by setting OLLAMA_LLM_LIBRARY"
time=2024-04-26T02:45:48.242Z level=DEBUG source=sched.go:101 msg="starting llm scheduler"
time=2024-04-26T02:45:48.242Z level=INFO source=gpu.go:96 msg="Detecting GPUs"
time=2024-04-26T02:45:48.242Z level=DEBUG source=gpu.go:203 msg="Searching for GPU library" name=libcudart.so*
time=2024-04-26T02:45:48.242Z level=DEBUG source=gpu.go:221 msg="gpu library search" globs="[/tmp/ollama3693243001/runners/cuda*/libcudart.so* /usr/local/cuda/lib64/libcudart.so* /usr/lib/x86_64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/x86_64-linux-gnu/libcudart.so* /usr/lib/wsl/lib/libcudart.so* /usr/lib/wsl/drivers/*/libcudart.so* /opt/cuda/lib64/libcudart.so* /usr/local/cuda*/targets/aarch64-linux/lib/libcudart.so* /usr/lib/aarch64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/aarch64-linux-gnu/libcudart.so* /usr/local/cuda/lib*/libcudart.so* /usr/lib*/libcudart.so* /usr/local/lib*/libcudart.so* /opt/rocm/lib/libcudart.so** /home/dev/libcudart.so**]"
time=2024-04-26T02:45:48.244Z level=DEBUG source=gpu.go:249 msg="discovered GPU libraries" paths=[]
time=2024-04-26T02:45:48.244Z level=INFO source=cpu_common.go:15 msg="CPU has AVX"
time=2024-04-26T02:45:48.244Z level=INFO source=amd_linux.go:46 msg="AMD Driver: 6.2.4"
time=2024-04-26T02:45:48.244Z level=DEBUG source=amd_linux.go:78 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/0/properties"
time=2024-04-26T02:45:48.244Z level=DEBUG source=amd_linux.go:102 msg="detected CPU /sys/class/kfd/kfd/topology/nodes/0/properties"
time=2024-04-26T02:45:48.244Z level=DEBUG source=amd_linux.go:78 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/1/properties"
time=2024-04-26T02:45:48.244Z level=INFO source=amd_linux.go:217 msg="amdgpu memory" gpu=0 total="16368.0 MiB"
time=2024-04-26T02:45:48.245Z level=INFO source=amd_linux.go:218 msg="amdgpu memory" gpu=0 available="5697.0 MiB"
time=2024-04-26T02:45:48.245Z level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /opt/rocm/lib"
time=2024-04-26T02:45:48.245Z level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /home/dev"
time=2024-04-26T02:45:48.245Z level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /opt/rocm/lib"
time=2024-04-26T02:45:48.245Z level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /usr/bin/rocm"
time=2024-04-26T02:45:48.245Z level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /usr/share/ollama/lib/rocm"
time=2024-04-26T02:45:48.245Z level=WARN source=amd_linux.go:321 msg="amdgpu detected, but no compatible rocm library found. Either install rocm v6, or follow manual install instructions at https://github.com/ollama/ollama/blob/main/docs/linux.md#manual-install"
time=2024-04-26T02:45:48.245Z level=WARN source=amd_linux.go:253 msg="unable to verify rocm library, will use cpu" error="no suitable rocm found, falling back to CPU"
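
A quick way to check whether a ROCm runtime the detector can use is actually present (a sketch; directories taken from the search list in the log above, so adjust for your install):

```
# Look for the ROCm runtime libraries the rocm runner needs
# (paths assumed from the log's search list; adjust for your install)
ls /opt/rocm/lib/libhipblas.so* /opt/rocm/lib/librocblas.so* 2>/dev/null

# On Debian/Ubuntu, confirm which rocm packages are installed
dpkg -l | grep -i rocm
```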

OS

Linux

GPU

AMD

CPU

Intel

Ollama version

0.1.32

GiteaMirror added the amd, bug labels 2026-04-22 06:04:25 -05:00

@ZanMax commented on GitHub (Apr 26, 2024):

After reinstalling Ollama, it started offloading to the GPU.
But I ran into a new problem:

time=2024-04-26T19:26:29.174Z level=DEBUG source=server.go:420 msg="server not yet available" error="health resp: Get \"http://127.0.0.1:38459/health\": dial tcp 127.0.0.1:38459: i/o timeout"
time=2024-04-26T19:26:29.375Z level=DEBUG source=server.go:420 msg="server not yet available" error="server not responding"
time=2024-04-26T19:26:29.576Z level=DEBUG source=server.go:420 msg="server not yet available" error="health resp: Get \"http://127.0.0.1:38459/health\": dial tcp 127.0.0.1:38459: i/o timeout"
time=2024-04-26T19:26:29.776Z level=DEBUG source=server.go:420 msg="server not yet available" error="server not responding"
time=2024-04-26T19:26:30.178Z level=DEBUG source=server.go:420 msg="server not yet available" error="health resp: Get \"http://127.0.0.1:38459/health\": dial tcp 127.0.0.1:38459: i/o timeout"
time=2024-04-26T19:26:30.580Z level=DEBUG source=server.go:420 msg="server not yet available" error="server not responding"

Full log:
ollama_problem.txt (https://github.com/ollama/ollama/files/15134633/ollama_problem.txt)


@dhiltgen commented on GitHub (May 1, 2024):

Do you actually have 4 GPUs, or did the amdgpu driver go sideways and incorrectly enumerate 4 GPUs when there is only 1? Your first log shows only 1, so my suspicion is you're hitting GPU driver bugs. If the driver is reporting "ghost" GPUs that don't exist, things will go bad when we try to allocate memory on them, which seems like a plausible explanation for the crash. If that's correct, rebooting will likely clear it.

time=2024-04-26T19:26:26.048Z level=INFO source=amd_linux.go:121 msg="amdgpu [0] gfx900 is supported"
time=2024-04-26T19:26:26.048Z level=INFO source=amd_linux.go:121 msg="amdgpu [1] gfx900 is supported"
time=2024-04-26T19:26:26.048Z level=INFO source=amd_linux.go:121 msg="amdgpu [2] gfx900 is supported"
time=2024-04-26T19:26:26.048Z level=INFO source=amd_linux.go:121 msg="amdgpu [3] gfx900 is supported"
time=2024-04-26T19:26:26.048Z level=INFO source=amd_linux.go:121 msg="amdgpu [4] gfx900 is supported"
time=2024-04-26T19:26:26.048Z level=DEBUG source=amd_linux.go:169 msg="discovering VRAM for amdgpu devices"
time=2024-04-26T19:26:26.048Z level=DEBUG source=amd_linux.go:188 msg="amdgpu devices [2 3 4 0 1]"
time=2024-04-26T19:26:26.048Z level=INFO source=amd_linux.go:263 msg="[2] amdgpu totalMemory 16368M"
time=2024-04-26T19:26:26.048Z level=INFO source=amd_linux.go:264 msg="[2] amdgpu freeMemory  5697M"
time=2024-04-26T19:26:26.049Z level=INFO source=amd_linux.go:263 msg="[3] amdgpu totalMemory 16368M"
time=2024-04-26T19:26:26.049Z level=INFO source=amd_linux.go:264 msg="[3] amdgpu freeMemory  5697M"
time=2024-04-26T19:26:26.049Z level=INFO source=amd_linux.go:263 msg="[4] amdgpu totalMemory 16368M"
time=2024-04-26T19:26:26.049Z level=INFO source=amd_linux.go:264 msg="[4] amdgpu freeMemory  5697M"
time=2024-04-26T19:26:26.049Z level=INFO source=amd_linux.go:263 msg="[0] amdgpu totalMemory 16368M"
time=2024-04-26T19:26:26.049Z level=INFO source=amd_linux.go:264 msg="[0] amdgpu freeMemory  16368M"
time=2024-04-26T19:26:26.049Z level=INFO source=amd_linux.go:263 msg="[1] amdgpu totalMemory 16368M"
time=2024-04-26T19:26:26.049Z level=INFO source=amd_linux.go:264 msg="[1] amdgpu freeMemory  5697M"

You can sanity-check this theory by running `ls /sys/class/kfd/kfd/topology/nodes/` and seeing how many nodes are reported.
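
A slightly more detailed check, as a sketch: in the standard KFD sysfs layout, GPU nodes report a non-zero `simd_count` in their `properties` file while CPU-only nodes report 0, so printing that field separates real GPUs from the CPU node.

```
# Print simd_count for every KFD topology node:
# simd_count 0 => CPU node, non-zero => GPU node
for n in /sys/class/kfd/kfd/topology/nodes/*; do
  echo "$n: $(grep -m1 simd_count "$n"/properties)"
done
```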


@ZanMax commented on GitHub (May 1, 2024):

Yes, I have 5 GPUs (5 x WX 9100) in my system.

ls /sys/class/kfd/kfd/topology/nodes/
0 1 2 3 4 5

lspci | grep VGA
03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Vega 10 XT [Radeon PRO WX 9100]
06:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Vega 10 XT [Radeon PRO WX 9100]
09:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Vega 10 XT [Radeon PRO WX 9100]
0c:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Vega 10 XT [Radeon PRO WX 9100]
0f:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Vega 10 XT [Radeon PRO WX 9100]
12:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 30)


@dhiltgen commented on GitHub (May 2, 2024):

I think the explanation is likely multi-GPU bugs in ROCm. In 0.1.32 and older, we always spread the model across all available GPUs. The good news is that in 0.1.33 (RC available now) we try to run a model on a single GPU if it fits, and only spread across multiple GPUs when the model is larger than the VRAM on the largest card. As long as you run models smaller than the biggest GPU, things should start working in 0.1.33, but larger models will likely still fail until the underlying bug is resolved.

https://github.com/ollama/ollama/releases
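
In the meantime, one way to keep ROCm's multi-GPU code paths out of the picture is to expose only a single device to the runtime. These are standard ROCm environment variables rather than anything ollama-specific, so treat this as a workaround sketch:

```
# Restrict the HIP runtime to one GPU before starting the server
HIP_VISIBLE_DEVICES=0 ollama serve

# Or filter one level lower, at the ROCr runtime
ROCR_VISIBLE_DEVICES=0 ollama serve
```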


@ZanMax commented on GitHub (May 2, 2024):

I tried to run ollama with just one GPU:

OLLAMA_DEBUG=1 HIP_VISIBLE_DEVICES=1 ollama serve

as a result:

time=2024-05-02T21:54:36.454Z level=INFO source=gpu.go:314 msg="Discovered GPU libraries: []"
time=2024-05-02T21:54:36.454Z level=INFO source=cpu_common.go:15 msg="CPU has AVX"
time=2024-05-02T21:54:36.454Z level=INFO source=amd_linux.go:50 msg="AMD Driver: 6.2.4"
time=2024-05-02T21:54:36.454Z level=DEBUG source=amd_linux.go:169 msg="discovering VRAM for amdgpu devices"
time=2024-05-02T21:54:36.454Z level=DEBUG source=amd_linux.go:188 msg="amdgpu devices [1]"
time=2024-05-02T21:54:36.454Z level=INFO source=amd_linux.go:263 msg="[1] amdgpu totalMemory 16368M"
time=2024-05-02T21:54:36.454Z level=INFO source=amd_linux.go:264 msg="[1] amdgpu freeMemory 16335M"
time=2024-05-02T21:54:36.455Z level=INFO source=server.go:127 msg="offload to gpu" reallayers=33 layers=33 required="3538.9 MiB" used="3538.9 MiB" available="16335.1 MiB" kv="768.0 MiB" fulloffload="156.0 MiB" partialoffload="175.1 MiB"
time=2024-05-02T21:54:36.455Z level=DEBUG source=payload.go:68 msg="availableServers : found" file=/tmp/ollama3088734008/runners/cpu
time=2024-05-02T21:54:36.455Z level=DEBUG source=payload.go:68 msg="availableServers : found" file=/tmp/ollama3088734008/runners/cpu_avx
time=2024-05-02T21:54:36.455Z level=DEBUG source=payload.go:68 msg="availableServers : found" file=/tmp/ollama3088734008/runners/cpu_avx2
time=2024-05-02T21:54:36.455Z level=DEBUG source=payload.go:68 msg="availableServers : found" file=/tmp/ollama3088734008/runners/cuda_v11
time=2024-05-02T21:54:36.455Z level=DEBUG source=payload.go:68 msg="availableServers : found" file=/tmp/ollama3088734008/runners/rocm_v60002
time=2024-05-02T21:54:36.455Z level=DEBUG source=payload.go:68 msg="availableServers : found" file=/tmp/ollama3088734008/runners/cpu
time=2024-05-02T21:54:36.455Z level=DEBUG source=payload.go:68 msg="availableServers : found" file=/tmp/ollama3088734008/runners/cpu_avx
time=2024-05-02T21:54:36.455Z level=DEBUG source=payload.go:68 msg="availableServers : found" file=/tmp/ollama3088734008/runners/cpu_avx2
time=2024-05-02T21:54:36.455Z level=DEBUG source=payload.go:68 msg="availableServers : found" file=/tmp/ollama3088734008/runners/cuda_v11
time=2024-05-02T21:54:36.455Z level=DEBUG source=payload.go:68 msg="availableServers : found" file=/tmp/ollama3088734008/runners/rocm_v60002
time=2024-05-02T21:54:36.455Z level=INFO source=cpu_common.go:15 msg="CPU has AVX"
time=2024-05-02T21:54:36.456Z level=DEBUG source=server.go:259 msg="LD_LIBRARY_PATH=/tmp/ollama3088734008/runners/rocm_v60002"
time=2024-05-02T21:54:36.456Z level=INFO source=server.go:264 msg="starting llama server" cmd="/tmp/ollama3088734008/runners/rocm_v60002/ollama_llama_server --model /home/dev/.ollama/models/blobs/sha256-4fed7364ee3e0c7cb4fe0880148bfdfcd1b630981efa0802a6b62ee52e7da97e --ctx-size 2048 --batch-size 512 --embedding --log-format json --n-gpu-layers 33 --verbose --port 40127"
time=2024-05-02T21:54:36.457Z level=INFO source=server.go:389 msg="waiting for llama runner to start responding"
/tmp/ollama3088734008/runners/rocm_v60002/ollama_llama_server: error while loading shared libraries: libhipblas.so.2: cannot open shared object file: No such file or directory
time=2024-05-02T21:54:36.507Z level=ERROR source=routes.go:120 msg="error loading llama server" error="llama runner process no longer running: 127 "

What I noticed: the selected runner is ROCm 6.0:

file=/tmp/ollama3088734008/runners/rocm_v60002

But I have ROCm 5.7 installed on my system, and I am not sure that 6.0 supports gfx900.
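
A way to confirm that from the shell (a diagnostic sketch; the tmp directory name changes per run, and the soname mapping below is an assumption):

```
# Show which shared objects the bundled ROCm runner fails to resolve
ldd /tmp/ollama*/runners/rocm_v60002/ollama_llama_server | grep 'not found'

# List the hipBLAS sonames installed system-wide; ROCm 5.x and 6.x
# ship different libhipblas.so major versions (assumed mapping)
find /opt/rocm* -name 'libhipblas.so*' 2>/dev/null
```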


@dhiltgen commented on GitHub (May 2, 2024):

Hmm... this implies `/tmp/ollama3088734008/runners/rocm_v60002/libhipblas.so.2` doesn't exist.

If you're running 0.1.33, in debug mode you should see something like this during early startup:

time=2024-05-02T23:49:39.975Z level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /opt/rocm/lib"
time=2024-05-02T23:49:39.975Z level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /tmp/daniel_ollama_test/rocm"
time=2024-05-02T23:49:39.975Z level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /usr/share/ollama/lib/rocm"
time=2024-05-02T23:49:39.977Z level=DEBUG source=amd_linux.go:267 msg="rocm supported GPUs" types="[gfx1030 gfx1100 gfx1101 gfx1102 gfx900 gfx906 gfx908 gfx90a gfx940 gfx941 gfx942]"

gfx900 is included in the v6 ROCm we bundle.
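
To capture that startup trace on your side, run the server in debug mode and filter for the AMD discovery lines; the grep pattern here is just a convenience, not anything ollama-specific:

```
# OLLAMA_DEBUG=1 enables the DEBUG-level discovery logging shown above
OLLAMA_DEBUG=1 ollama serve 2>&1 | grep -E 'amd_common|amd_linux|rocm'
```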


@ZanMax commented on GitHub (May 3, 2024):

After the update:

ollama -v
ollama version is 0.1.33

The same problem:

.time=2024-05-03T00:51:14.618Z level=DEBUG source=server.go:466 msg="server not yet available" error="health resp: Get \"http://127.0.0.1:43461/health\": dial tcp 127.0.0.1:43461: i/o timeout"
time=2024-05-03T00:51:14.819Z level=DEBUG source=server.go:466 msg="server not yet available" error="server not responding"
time=2024-05-03T00:51:15.020Z level=DEBUG source=server.go:466 msg="server not yet available" error="health resp: Get \"http://127.0.0.1:43461/health\": dial tcp 127.0.0.1:43461: i/o timeout"
time=2024-05-03T00:51:15.221Z level=DEBUG source=server.go:466 msg="server not yet available" error="server not responding"

Full log:

time=2024-05-03T00:51:10.287Z level=INFO source=gpu.go:96 msg="Detecting GPUs"
time=2024-05-03T00:51:10.287Z level=DEBUG source=gpu.go:203 msg="Searching for GPU library" name=libcudart.so*
time=2024-05-03T00:51:10.287Z level=DEBUG source=gpu.go:221 msg="gpu library search" globs="[/tmp/ollama3982970041/runners/cuda*/libcudart.so* /usr/local/cuda/lib64/libcudart.so* /usr/lib/x86_64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/x86_64-linux-gnu/libcudart.so* /usr/lib/wsl/lib/libcudart.so* /usr/lib/wsl/drivers/*/libcudart.so* /opt/cuda/lib64/libcudart.so* /usr/local/cuda*/targets/aarch64-linux/lib/libcudart.so* /usr/lib/aarch64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/aarch64-linux-gnu/libcudart.so* /usr/local/cuda/lib*/libcudart.so* /usr/lib*/libcudart.so* /usr/local/lib*/libcudart.so* /home/dev/libcudart.so**]"
time=2024-05-03T00:51:10.289Z level=DEBUG source=gpu.go:249 msg="discovered GPU libraries" paths=[/tmp/ollama3982970041/runners/cuda_v11/libcudart.so.11.0]
cudaSetDevice err: 35
time=2024-05-03T00:51:10.290Z level=DEBUG source=gpu.go:261 msg="Unable to load cudart" library=/tmp/ollama3982970041/runners/cuda_v11/libcudart.so.11.0 error="your nvidia driver is too old or missing. If you have a CUDA GPU please upgrade to run ollama"
time=2024-05-03T00:51:10.290Z level=INFO source=cpu_common.go:15 msg="CPU has AVX"
time=2024-05-03T00:51:10.290Z level=INFO source=amd_linux.go:46 msg="AMD Driver: 6.2.4"
time=2024-05-03T00:51:10.290Z level=DEBUG source=amd_linux.go:78 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/0/properties"
time=2024-05-03T00:51:10.290Z level=DEBUG source=amd_linux.go:102 msg="detected CPU /sys/class/kfd/kfd/topology/nodes/0/properties"
time=2024-05-03T00:51:10.290Z level=DEBUG source=amd_linux.go:78 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/1/properties"
time=2024-05-03T00:51:10.290Z level=INFO source=amd_linux.go:217 msg="amdgpu memory" gpu=0 total="16368.0 MiB"
time=2024-05-03T00:51:10.290Z level=INFO source=amd_linux.go:218 msg="amdgpu memory" gpu=0 available="16360.1 MiB"
time=2024-05-03T00:51:10.290Z level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /opt/rocm/lib"
time=2024-05-03T00:51:10.291Z level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /usr/local/bin/rocm"
time=2024-05-03T00:51:10.291Z level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /usr/share/ollama/lib/rocm"
time=2024-05-03T00:51:10.295Z level=DEBUG source=amd_linux.go:267 msg="rocm supported GPUs" types="[gfx1030 gfx1100 gfx1101 gfx1102 gfx900 gfx906 gfx908 gfx90a gfx940 gfx941 gfx942]"
time=2024-05-03T00:51:10.295Z level=INFO source=amd_linux.go:276 msg="amdgpu is supported" gpu=0 gpu_type=gfx900
time=2024-05-03T00:51:10.295Z level=DEBUG source=amd_linux.go:78 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/2/properties"
time=2024-05-03T00:51:10.295Z level=INFO source=amd_linux.go:217 msg="amdgpu memory" gpu=1 total="16368.0 MiB"
time=2024-05-03T00:51:10.295Z level=INFO source=amd_linux.go:218 msg="amdgpu memory" gpu=1 available="16360.1 MiB"
time=2024-05-03T00:51:10.295Z level=INFO source=amd_linux.go:276 msg="amdgpu is supported" gpu=1 gpu_type=gfx900
time=2024-05-03T00:51:10.295Z level=DEBUG source=amd_linux.go:78 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/3/properties"
time=2024-05-03T00:51:10.295Z level=INFO source=amd_linux.go:217 msg="amdgpu memory" gpu=2 total="16368.0 MiB"
time=2024-05-03T00:51:10.295Z level=INFO source=amd_linux.go:218 msg="amdgpu memory" gpu=2 available="16360.1 MiB"
time=2024-05-03T00:51:10.295Z level=INFO source=amd_linux.go:276 msg="amdgpu is supported" gpu=2 gpu_type=gfx900
time=2024-05-03T00:51:10.295Z level=DEBUG source=amd_linux.go:78 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/4/properties"
time=2024-05-03T00:51:10.296Z level=INFO source=amd_linux.go:217 msg="amdgpu memory" gpu=3 total="16368.0 MiB"
time=2024-05-03T00:51:10.296Z level=INFO source=amd_linux.go:218 msg="amdgpu memory" gpu=3 available="16360.1 MiB"
time=2024-05-03T00:51:10.296Z level=INFO source=amd_linux.go:276 msg="amdgpu is supported" gpu=3 gpu_type=gfx900
time=2024-05-03T00:51:10.296Z level=DEBUG source=amd_linux.go:78 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/5/properties"
time=2024-05-03T00:51:10.296Z level=INFO source=amd_linux.go:217 msg="amdgpu memory" gpu=4 total="16368.0 MiB"
time=2024-05-03T00:51:10.296Z level=INFO source=amd_linux.go:218 msg="amdgpu memory" gpu=4 available="16360.1 MiB"
time=2024-05-03T00:51:10.296Z level=INFO source=amd_linux.go:276 msg="amdgpu is supported" gpu=4 gpu_type=gfx900
time=2024-05-03T00:51:10.296Z level=DEBUG source=gguf.go:57 msg="model = &llm.gguf{containerGGUF:(*llm.containerGGUF)(0xc00047ddc0), kv:llm.KV{}, tensors:[]*llm.Tensor(nil), parameters:0x0}"
time=2024-05-03T00:51:10.699Z level=DEBUG source=sched.go:162 msg="loading first model" model=/home/dev/.ollama/models/blobs/sha256-4fed7364ee3e0c7cb4fe0880148bfdfcd1b630981efa0802a6b62ee52e7da97e
time=2024-05-03T00:51:10.699Z level=DEBUG source=memory.go:64 msg=evaluating library=rocm gpu_count=1 available="16360.1 MiB"
time=2024-05-03T00:51:10.700Z level=INFO source=memory.go:152 msg="offload to gpu" layers.real=-1 layers.estimate=33 memory.available="16360.1 MiB" memory.required.full="3538.9 MiB" memory.required.partial="3538.9 MiB" memory.required.kv="768.0 MiB" memory.weights.total="2157.9 MiB" memory.weights.repeating="2080.9 MiB" memory.weights.nonrepeating="77.1 MiB" memory.graph.full="156.0 MiB" memory.graph.partial="175.1 MiB"
time=2024-05-03T00:51:10.700Z level=DEBUG source=sched.go:508 msg="new model will fit in available VRAM in single GPU, loading" model=/home/dev/.ollama/models/blobs/sha256-4fed7364ee3e0c7cb4fe0880148bfdfcd1b630981efa0802a6b62ee52e7da97e gpu=0 available=17154772992 required="3538.9 MiB"
time=2024-05-03T00:51:10.700Z level=DEBUG source=memory.go:64 msg=evaluating library=rocm gpu_count=1 available="16360.1 MiB"
time=2024-05-03T00:51:10.701Z level=INFO source=memory.go:152 msg="offload to gpu" layers.real=-1 layers.estimate=33 memory.available="16360.1 MiB" memory.required.full="3538.9 MiB" memory.required.partial="3538.9 MiB" memory.required.kv="768.0 MiB" memory.weights.total="2157.9 MiB" memory.weights.repeating="2080.9 MiB" memory.weights.nonrepeating="77.1 MiB" memory.graph.full="156.0 MiB" memory.graph.partial="175.1 MiB"
time=2024-05-03T00:51:10.701Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3982970041/runners/cpu
time=2024-05-03T00:51:10.701Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3982970041/runners/cpu_avx
time=2024-05-03T00:51:10.701Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3982970041/runners/cpu_avx2
time=2024-05-03T00:51:10.701Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3982970041/runners/cuda_v11
time=2024-05-03T00:51:10.701Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3982970041/runners/rocm_v60002
time=2024-05-03T00:51:10.701Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3982970041/runners/cpu
time=2024-05-03T00:51:10.701Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3982970041/runners/cpu_avx
time=2024-05-03T00:51:10.701Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3982970041/runners/cpu_avx2
time=2024-05-03T00:51:10.701Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3982970041/runners/cuda_v11
time=2024-05-03T00:51:10.701Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3982970041/runners/rocm_v60002
time=2024-05-03T00:51:10.701Z level=INFO source=cpu_common.go:15 msg="CPU has AVX"
time=2024-05-03T00:51:10.702Z level=INFO source=server.go:289 msg="starting llama server" cmd="/tmp/ollama3982970041/runners/rocm_v60002/ollama_llama_server --model /home/dev/.ollama/models/blobs/sha256-4fed7364ee3e0c7cb4fe0880148bfdfcd1b630981efa0802a6b62ee52e7da97e --ctx-size 2048 --batch-size 512 --embedding --log-format json --n-gpu-layers 33 --verbose --parallel 1 --port 43461"
time=2024-05-03T00:51:10.703Z level=DEBUG source=server.go:291 msg=subprocess environment="[OLLAMA_DEBUG=1 SHELL=/bin/bash PWD=/home/dev LOGNAME=dev XDG_SESSION_TYPE=tty MOTD_SHOWN=pam HOME=/home/dev LANG=en_US.UTF-8 LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=00:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.zst=01;31:*.tzst=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.wim=01;31:*.swm=01;31:*.dwm=01;31:*.esd=01;31:*.jpg=01;35:*.jpeg=01;35:*.mjpg=01;35:*.mjpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.webp=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.m4a=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.oga=00;36:*.opus=00;36:*.spx=00;36:*.xspf=00;36: LC_TERMINAL=iTerm2 SSH_CONNECTION=100.85.70.120 54311 100.106.7.66 22 LESSCLOSE=/usr/bin/lesspipe %s %s XDG_SESSION_CLASS=user TERM=xterm-256color LESSOPEN=| /usr/bin/lesspipe %s USER=dev LC_TERMINAL_VERSION=3.4.23 SHLVL=1 XDG_SESSION_ID=55 XDG_RUNTIME_DIR=/run/user/1000 SSH_CLIENT=100.85.70.120 54311 22 XDG_DATA_DIRS=/usr/local/share:/usr/share:/var/lib/snapd/desktop PATH=/home/dev/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/usr/local/go/bin DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/1000/bus SSH_TTY=/dev/pts/1 _=/usr/local/bin/ollama LD_LIBRARY_PATH=/usr/share/ollama/lib/rocm:/tmp/ollama3982970041/runners/rocm_v60002 HIP_VISIBLE_DEVICES=0]"
time=2024-05-03T00:51:10.703Z level=INFO source=sched.go:340 msg="loaded runners" count=1
time=2024-05-03T00:51:10.704Z level=INFO source=server.go:432 msg="waiting for llama runner to start responding"
{"function":"server_params_parse","level":"WARN","line":2497,"msg":"server.cpp is not built with verbose logging.","tid":"140112024857664","timestamp":1714697470}
time=2024-05-03T00:51:10.755Z level=DEBUG source=server.go:466 msg="server not yet available" error="health resp: Get \"http://127.0.0.1:43461/health\": dial tcp 127.0.0.1:43461: connect: connection refused"
{"build":1,"commit":"952d03d","function":"main","level":"INFO","line":2823,"msg":"build info","tid":"140112024857664","timestamp":1714697470}
{"function":"main","level":"INFO","line":2830,"msg":"system info","n_threads":6,"n_threads_batch":-1,"system_info":"AVX = 1 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | ","tid":"140112024857664","timestamp":1714697470,"total_threads":12}
llama_model_loader: loaded meta data with 25 key-value pairs and 291 tensors from /home/dev/.ollama/models/blobs/sha256-4fed7364ee3e0c7cb4fe0880148bfdfcd1b630981efa0802a6b62ee52e7da97e (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.name str = LLaMA v2
llama_model_loader: - kv 2: llama.vocab_size u32 = 32064
llama_model_loader: - kv 3: llama.context_length u32 = 4096
llama_model_loader: - kv 4: llama.embedding_length u32 = 3072
llama_model_loader: - kv 5: llama.block_count u32 = 32
llama_model_loader: - kv 6: llama.feed_forward_length u32 = 8192
llama_model_loader: - kv 7: llama.rope.dimension_count u32 = 96
llama_model_loader: - kv 8: llama.attention.head_count u32 = 32
llama_model_loader: - kv 9: llama.attention.head_count_kv u32 = 32
llama_model_loader: - kv 10: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 11: llama.rope.freq_base f32 = 10000.000000
llama_model_loader: - kv 12: general.file_type u32 = 15
llama_model_loader: - kv 13: tokenizer.ggml.model str = llama
llama_model_loader: - kv 14: tokenizer.ggml.tokens arr[str,32064] = ["<unk>", "<s>", "</s>", "<0x00>", "<...
llama_model_loader: - kv 15: tokenizer.ggml.scores arr[f32,32064] = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv 16: tokenizer.ggml.token_type arr[i32,32064] = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
llama_model_loader: - kv 17: tokenizer.ggml.bos_token_id u32 = 1
llama_model_loader: - kv 18: tokenizer.ggml.eos_token_id u32 = 32000
llama_model_loader: - kv 19: tokenizer.ggml.unknown_token_id u32 = 0
llama_model_loader: - kv 20: tokenizer.ggml.padding_token_id u32 = 32000
llama_model_loader: - kv 21: tokenizer.ggml.add_bos_token bool = true
llama_model_loader: - kv 22: tokenizer.ggml.add_eos_token bool = false
llama_model_loader: - kv 23: tokenizer.chat_template str = {{ bos_token }}{% for message in mess...
llama_model_loader: - kv 24: general.quantization_version u32 = 2
llama_model_loader: - type f32: 65 tensors
llama_model_loader: - type q4_K: 193 tensors
llama_model_loader: - type q6_K: 33 tensors
llm_load_vocab: special tokens definition check successful ( 323/32064 ).
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = SPM
llm_load_print_meta: n_vocab = 32064
llm_load_print_meta: n_merges = 0
llm_load_print_meta: n_ctx_train = 4096
llm_load_print_meta: n_embd = 3072
llm_load_print_meta: n_head = 32
llm_load_print_meta: n_head_kv = 32
llm_load_print_meta: n_layer = 32
llm_load_print_meta: n_rot = 96
llm_load_print_meta: n_embd_head_k = 96
llm_load_print_meta: n_embd_head_v = 96
llm_load_print_meta: n_gqa = 1
llm_load_print_meta: n_embd_k_gqa = 3072
llm_load_print_meta: n_embd_v_gqa = 3072
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale = 0.0e+00
llm_load_print_meta: n_ff = 8192
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: causal attn = 1
llm_load_print_meta: pooling type = 0
llm_load_print_meta: rope type = 0
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx = 4096
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: ssm_d_conv = 0
llm_load_print_meta: ssm_d_inner = 0
llm_load_print_meta: ssm_d_state = 0
llm_load_print_meta: ssm_dt_rank = 0
llm_load_print_meta: model type = 7B
llm_load_print_meta: model ftype = Q4_K - Medium
llm_load_print_meta: model params = 3.82 B
llm_load_print_meta: model size = 2.16 GiB (4.85 BPW)
llm_load_print_meta: general.name = LLaMA v2
llm_load_print_meta: BOS token = 1 '<s>'
llm_load_print_meta: EOS token = 32000 '<|endoftext|>'
llm_load_print_meta: UNK token = 0 '<unk>'
llm_load_print_meta: PAD token = 32000 '<|endoftext|>'
llm_load_print_meta: LF token = 13 '<0x0A>'
llm_load_print_meta: EOT token = 32007 '<|end|>'
time=2024-05-03T00:51:11.005Z level=DEBUG source=server.go:466 msg="server not yet available" error="server not responding"
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: CUDA_USE_TENSOR_CORES: yes
ggml_cuda_init: found 1 ROCm devices:
Device 0: Radeon Pro WX 9100, compute capability 9.0, VMM: no
llm_load_tensors: ggml ctx size = 0.30 MiB
llm_load_tensors: offloading 32 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 33/33 layers to GPU
llm_load_tensors: ROCm0 buffer size = 2157.94 MiB
llm_load_tensors: CPU buffer size = 52.84 MiB
.time=2024-05-03T00:51:14.618Z level=DEBUG source=server.go:466 msg="server not yet available" error="health resp: Get \"http://127.0.0.1:43461/health\": dial tcp 127.0.0.1:43461: i/o timeout"
time=2024-05-03T00:51:14.819Z level=DEBUG source=server.go:466 msg="server not yet available" error="server not responding"
time=2024-05-03T00:51:15.020Z level=DEBUG source=server.go:466 msg="server not yet available" error="health resp: Get \"http://127.0.0.1:43461/health\": dial tcp 127.0.0.1:43461: i/o timeout"
time=2024-05-03T00:51:15.221Z level=DEBUG source=server.go:466 msg="server not yet available" error="server not responding"
time=2024-05-03T00:51:15.823Z level=DEBUG source=server.go:466 msg="server not yet available" error="health resp: Get \"http://127.0.0.1:43461/health\": dial tcp 127.0.0.1:43461: i/o timeout"
time=2024-05-03T00:51:16.024Z level=DEBUG source=server.go:466 msg="server not yet available" error="server not responding"
time=2024-05-03T00:51:18.033Z level=DEBUG source=server.go:466 msg="server not yet available" error="health resp: Get \"http://127.0.0.1:43461/health\": dial tcp 127.0.0.1:43461: i/o timeout"
time=2024-05-03T00:51:18.234Z level=DEBUG source=server.go:466 msg="server not yet available" error="server not responding"
time=2024-05-03T00:51:18.636Z level=DEBUG source=server.go:466 msg="server not yet available" error="health resp: Get \"http://127.0.0.1:43461/health\": dial tcp 127.0.0.1:43461: i/o timeout"
time=2024-05-03T00:51:18.836Z level=DEBUG source=server.go:466 msg="server not yet available" error="server not responding"
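
Since the load stalls right after the tensors are offloaded, it may also be worth checking kernel-side GPU state with standard ROCm tooling; a diagnostic sketch, nothing ollama-specific:

```
# Look for amdgpu/KFD faults or resets around the time of the hang
sudo dmesg | grep -iE 'amdgpu|kfd' | tail -n 20

# Confirm the ROCm runtime still enumerates the cards and their VRAM
rocminfo | grep -E 'Name|gfx'
rocm-smi --showmeminfo vram
```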

not yet available" error="server not responding" > time=2024-05-03T00:51:18.033Z level=DEBUG source=server.go:466 msg="server not yet available" error="health resp: Get \"http://127.0.0.1:43461/health\": dial tcp 127.0.0.1:43461: i/o timeout" > time=2024-05-03T00:51:18.234Z level=DEBUG source=server.go:466 msg="server not yet available" error="server not responding" > time=2024-05-03T00:51:18.636Z level=DEBUG source=server.go:466 msg="server not yet available" error="health resp: Get \"http://127.0.0.1:43461/health\": dial tcp 127.0.0.1:43461: i/o timeout" > time=2024-05-03T00:51:18.836Z level=DEBUG source=server.go:466 msg="server not yet available" error="server not responding" >

@ZanMax commented on GitHub (May 3, 2024):

I can provide access to the server for debugging if necessary.


@dhiltgen commented on GitHub (May 4, 2024):

From the log it looks like we're correctly discovering the 4 GPUs and determining the model fits on a single GPU, but while loading the model it seems to get stuck somewhere in the low-level ROCm logic.

This might be a variation on #3840 with a slightly different failure mode; ROCm appears to have issues with the gfx900 GPUs.
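For anyone hitting the same hang, here is a sketch of how one might narrow it down using only knobs that already appear in these logs (this is a diagnostic suggestion, not a confirmed fix):

# Pin the runner to a single gfx900 card and keep debug logging on; the
# subprocess environment above already shows HIP_VISIBLE_DEVICES=0 and
# OLLAMA_DEBUG=1 in use.
HIP_VISIBLE_DEVICES=0 OLLAMA_DEBUG=1 ollama serve

# If the ROCm runner still wedges, force the CPU runner to confirm the
# model and server side are healthy (OLLAMA_LLM_LIBRARY is the override
# mentioned in ollama's startup log).
OLLAMA_LLM_LIBRARY=cpu ollama serve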


@ZanMax commented on GitHub (May 6, 2024):

In my case llama.cpp works great.
I run 5 llama.cpp processes, each on a different GPU.
Thank you for your help.
I don't think it's critical to support such old hardware.
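
For reference, the per-GPU llama.cpp setup described above could be approximated as below; the model path, layer count, GPU indices, and ports are illustrative placeholders, and `server` is the llama.cpp example server binary of that era:

# One llama.cpp server per GPU: pin each process to a card with
# HIP_VISIBLE_DEVICES and give it its own port.
for gpu in 0 1 2 3; do
  HIP_VISIBLE_DEVICES=$gpu ./server -m ./model.gguf -ngl 33 --port $((8080 + gpu)) &
done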


Reference: github-starred/ollama#28193