[GH-ISSUE #3243] Support Steam Deck Docker amdgpu - gfx1033 #64037

Open
opened 2026-05-03 15:55:59 -05:00 by GiteaMirror · 43 comments
Owner

Originally created by @FairyTail2000 on GitHub (Mar 19, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/3243

Originally assigned to: @dhiltgen on GitHub.

What is the issue?

Steam Deck GPU not supported (apparently)

Logs:

time=2024-03-19T11:24:28.162Z level=INFO source=images.go:806 msg="total blobs: 54"
time=2024-03-19T11:24:28.420Z level=INFO source=images.go:813 msg="total unused blobs removed: 54"
time=2024-03-19T11:24:28.420Z level=INFO source=routes.go:1110 msg="Listening on [::]:11434 (version 0.1.29)"
time=2024-03-19T11:24:28.422Z level=INFO source=payload_common.go:112 msg="Extracting dynamic libraries to /tmp/ollama2388187498/runners ..."
time=2024-03-19T11:24:32.228Z level=INFO source=payload_common.go:139 msg="Dynamic LLM libraries [cpu rocm_v60000 cpu_avx2 cuda_v11 cpu_avx]"
time=2024-03-19T11:24:32.228Z level=INFO source=gpu.go:77 msg="Detecting GPU type"
time=2024-03-19T11:24:32.229Z level=INFO source=gpu.go:191 msg="Searching for GPU management library libnvidia-ml.so"
time=2024-03-19T11:24:32.240Z level=INFO source=gpu.go:237 msg="Discovered GPU libraries: []"
time=2024-03-19T11:24:32.240Z level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-03-19T11:24:32.240Z level=WARN source=amd_linux.go:53 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers: amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
time=2024-03-19T11:24:32.240Z level=INFO source=amd_linux.go:88 msg="detected amdgpu versions [gfx1033]"
time=2024-03-19T11:24:32.262Z level=WARN source=amd_linux.go:114 msg="amdgpu [0] gfx1033 is not supported by /tmp/ollama2388187498/rocm [gfx1030 gfx1100 gfx1101 gfx1102 gfx900 gfx906 gfx908 gfx90a gfx940 gfx941 gfx942]"
time=2024-03-19T11:24:32.262Z level=WARN source=amd_linux.go:116 msg="See https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md for HSA_OVERRIDE_GFX_VERSION usage"
time=2024-03-19T11:24:32.262Z level=INFO source=amd_linux.go:127 msg="all detected amdgpus are skipped, falling back to CPU"
time=2024-03-19T11:24:32.262Z level=INFO source=routes.go:1133 msg="no GPU detected"

What did you expect to see?

Steam Deck GPU being supported

Steps to reproduce

Start the official ROCm Docker container on the Steam Deck:

docker run --device /dev/kfd --device /dev/dri -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama:rocm

Are there any recent changes that introduced the issue?

No response

OS

Linux

Architecture

amd64

Platform

Docker

Ollama version

0.1.29

GPU

AMD

GPU info

ROCk module is loaded

HSA System Attributes

Runtime Version: 1.1
System Timestamp Freq.: 1000.000000MHz
Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model: LARGE
System Endianness: LITTLE

==========
HSA Agents


Agent 1


Name: AMD Custom APU 0932
Uuid: CPU-XX
Marketing Name: AMD Custom APU 0932
Vendor Name: CPU
Feature: None specified
Profile: FULL_PROFILE
Float Round Mode: NEAR
Max Queue Number: 0(0x0)
Queue Min Size: 0(0x0)
Queue Max Size: 0(0x0)
Queue Type: MULTI
Node: 0
Device Type: CPU
Cache Info:
L1: 32768(0x8000) KB
Chip ID: 0(0x0)
ASIC Revision: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 2800
BDFID: 0
Internal Node ID: 0
Compute Unit: 8
SIMDs per CU: 0
Shader Engines: 0
Shader Arrs. per Eng.: 0
WatchPts on Addr. Ranges:1
Features: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: FINE GRAINED
Size: 12073356(0xb8398c) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 2
Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED
Size: 12073356(0xb8398c) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 3
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 12073356(0xb8398c) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
ISA Info:


Agent 2


Name: gfx1033
Uuid: GPU-XX
Marketing Name: AMD Radeon Graphics
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 64(0x40)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 1
Device Type: GPU
Cache Info:
L1: 16(0x10) KB
L2: 1024(0x400) KB
Chip ID: 5173(0x1435)
ASIC Revision: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 0
BDFID: 1024
Internal Node ID: 1
Compute Unit: 8
SIMDs per CU: 2
Shader Engines: 1
Shader Arrs. per Eng.: 1
WatchPts on Addr. Ranges:4
Features: KERNEL_DISPATCH
Fast F16 Operation: TRUE
Wavefront Size: 32(0x20)
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Max Waves Per CU: 32(0x20)
Max Work-item Per CU: 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
Max fbarriers/Workgrp: 32
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 4194304(0x400000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 2
Segment: GROUP
Size: 64(0x40) KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Alignment: 0KB
Accessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx1033
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32
*** Done ***

CPU

AMD

Other software

No response

GiteaMirror added the amd and feature request labels 2026-05-03 15:56:00 -05:00

@FairyTail2000 commented on GitHub (Mar 19, 2024):

Further testing shows that forcing gfx1030 works and is compatible. The docker line for this is:

docker run --device /dev/kfd --device /dev/dri -v ollama:/root/.ollama -p 11434:11434 -e "HSA_OVERRIDE_GFX_VERSION=gfx1030" --name ollama ollama/ollama:rocm

It might be worth adding a note to the Docker image page explaining this, so future users don't have to find this issue to get it working.
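Note for anyone landing here: the troubleshooting doc linked in the logs describes `HSA_OVERRIDE_GFX_VERSION` as a dotted version string rather than a gfx name, so a variant worth trying (a hedged sketch, not verified on this exact image) is:

```shell
# Hypothetical variant of the command above: Ollama's troubleshooting docs
# give HSA_OVERRIDE_GFX_VERSION as a dotted version ("10.3.0" targets the
# gfx1030 ISA), so this form may be needed on builds that reject "gfx1030".
docker run --device /dev/kfd --device /dev/dri \
  -v ollama:/root/.ollama -p 11434:11434 \
  -e "HSA_OVERRIDE_GFX_VERSION=10.3.0" \
  --name ollama ollama/ollama:rocm
```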


@dhiltgen commented on GitHub (Mar 20, 2024):

Glad to hear the override worked for this GPU.


@FairyTail2000 commented on GitHub (Mar 26, 2024):

@dhiltgen the override no longer works on the newest beta version of SteamOS.

time=2024-03-26T09:00:26.199Z level=INFO source=amd_linux.go:88 msg="detected amdgpu versions [gfx1033]"
time=2024-03-26T09:00:26.200Z level=INFO source=amd_linux.go:246 msg="[0] amdgpu totalMemory 4096M"
time=2024-03-26T09:00:26.200Z level=INFO source=amd_linux.go:247 msg="[0] amdgpu freeMemory 4096M"
time=2024-03-26T09:00:26.200Z level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-03-26T09:00:26.200Z level=WARN source=amd_linux.go:53 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers: amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
time=2024-03-26T09:00:26.200Z level=INFO source=amd_linux.go:88 msg="detected amdgpu versions [gfx1033]"
time=2024-03-26T09:00:26.201Z level=INFO source=amd_linux.go:246 msg="[0] amdgpu totalMemory 4096M"
time=2024-03-26T09:00:26.201Z level=INFO source=amd_linux.go:247 msg="[0] amdgpu freeMemory 4096M"
time=2024-03-26T09:00:26.201Z level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-03-26T09:00:26.416Z level=INFO source=dyn_ext_server.go:90 msg="Loading Dynamic llm server: /tmp/ollama632250383/runners/rocm_v60000/libext_server.so"
time=2024-03-26T09:00:26.417Z level=INFO source=dyn_ext_server.go:150 msg="Initializing llama server"
time=2024-03-26T09:00:26.432Z level=WARN source=llm.go:170 msg="Failed to load dynamic library /tmp/ollama632250383/runners/rocm_v60000/libext_server.so Unable to init GPU: invalid device ordinal"
time=2024-03-26T09:00:26.434Z level=INFO source=dyn_ext_server.go:90 msg="Loading Dynamic llm server: /tmp/ollama632250383/runners/cpu_avx2/libext_server.so"
time=2024-03-26T09:00:26.434Z level=INFO source=dyn_ext_server.go:150 msg="Initializing llama server"


@5310 commented on GitHub (Mar 26, 2024):

I can't get the image to run on my Steam Deck either (still on stable SteamOS 3.5.7), with the override, of course.

Were you also getting the /sys/module/amdgpu/version missing warning back when it did work, @FairyTail2000?

Edit: Or could you tell me what the exact version of the image worked for you, so I can try that instead of latest?


@FairyTail2000 commented on GitHub (Mar 26, 2024):

@5310 Yes I did get the missing amdgpu version error.

The exact image is sha256:9b14e2877bf00cd2a24f1ee8e92512f4d1164f7c4132fffe7b55ebc7aa79d7f0. It should be the latest; I haven't changed the image since then.


@5310 commented on GitHub (Mar 26, 2024):

Thanks a lot! Let's see what I'm doing wrong, then; I'm also on that image... 🤔


@FairyTail2000 commented on GitHub (Apr 4, 2024):

@dhiltgen the newest ROCm Docker image just hangs, doing nothing. Not even the CPU runner seems to start.

https://hub.docker.com/layers/ollama/ollama/rocm/images/sha256-7986e8c813e478064978e0a17cebb93af58f27f05ca6ee613c0d3b0850048c93?context=explore


@dhiltgen commented on GitHub (Apr 23, 2024):

@FairyTail2000 can you share a log of the container with debug enabled so we can see more details on the hang?

docker run ... -e OLLAMA_DEBUG=1 ...


@FairyTail2000 commented on GitHub (Apr 24, 2024):

Thanks for the response. It seems I just misinterpreted the output; it was simply waiting for requests. However, it still uses the CPU. Is there any way to add additional GPU drivers at runtime, or is adding a new driver more involved?


@dhiltgen commented on GitHub (Apr 28, 2024):

@FairyTail2000 are you setting the override variable as mentioned earlier in this issue? If not, that will explain why it is running on CPU. If you are setting it and it's still not working on the GPU, can you share your server log so we can investigate?


@FairyTail2000 commented on GitHub (Apr 29, 2024):

Yes, I do set the environment variable. Here are the command line and debug logs (after loading llama3:instruct remotely):

(deck@steamdeck ~)$ docker run --device /dev/kfd --device /dev/dri -v ollama:/root/.ollama -p 11434:11434 -e "HSA_OVERRIDE_GFX_VERSION=gfx1030" -e "OLLAMA_DEBUG=1" --name ollama ollama/ollama:0.1.30-rc4-rocm

time=2024-04-29T06:56:06.926Z level=INFO source=images.go:804 msg="total blobs: 22"
time=2024-04-29T06:56:06.945Z level=INFO source=images.go:811 msg="total unused blobs removed: 0"
time=2024-04-29T06:56:06.945Z level=INFO source=routes.go:1118 msg="Listening on [::]:11434 (version 0.1.30-rc4)"
time=2024-04-29T06:56:06.950Z level=INFO source=payload_common.go:113 msg="Extracting dynamic libraries to /tmp/ollama3355126040/runners ..."
time=2024-04-29T06:56:10.805Z level=INFO source=payload_common.go:140 msg="Dynamic LLM libraries [rocm_v60000 cpu_avx2 cpu_avx cpu cuda_v11]"
time=2024-04-29T06:56:10.805Z level=DEBUG source=payload_common.go:141 msg="Override detection logic by setting OLLAMA_LLM_LIBRARY"
time=2024-04-29T06:56:10.805Z level=INFO source=gpu.go:115 msg="Detecting GPU type"
time=2024-04-29T06:56:10.806Z level=INFO source=gpu.go:265 msg="Searching for GPU management library libcudart.so*"
time=2024-04-29T06:56:10.806Z level=DEBUG source=gpu.go:283 msg="gpu management search paths: [/tmp/ollama3355126040/runners/cuda*/libcudart.so* /usr/local/cuda/lib64/libcudart.so* /usr/lib/x86_64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/x86_64-linux-gnu/libcudart.so* /usr/lib/wsl/lib/libcudart.so* /usr/lib/wsl/drivers//libcudart.so /opt/cuda/lib64/libcudart.so* /usr/local/cuda*/targets/aarch64-linux/lib/libcudart.so* /usr/lib/aarch64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/aarch64-linux-gnu/libcudart.so* /usr/local/cuda/lib*/libcudart.so* /usr/lib*/libcudart.so* /usr/local/lib*/libcudart.so* /opt/rocm/lib/libcudart.so** /usr/local/lib/libcudart.so** /opt/rh/devtoolset-7/root/libcudart.so**]"
time=2024-04-29T06:56:10.842Z level=INFO source=gpu.go:311 msg="Discovered GPU libraries: [/tmp/ollama3355126040/runners/cuda_v11/libcudart.so.11.0]"
wiring cudart library functions in /tmp/ollama3355126040/runners/cuda_v11/libcudart.so.11.0
dlsym: cudaSetDevice
dlsym: cudaDeviceSynchronize
dlsym: cudaDeviceReset
dlsym: cudaMemGetInfo
dlsym: cudaGetDeviceCount
dlsym: cudaDeviceGetAttribute
dlsym: cudaDriverGetVersion
cudaSetDevice err: 35
time=2024-04-29T06:56:10.843Z level=INFO source=gpu.go:340 msg="Unable to load cudart CUDA management library /tmp/ollama3355126040/runners/cuda_v11/libcudart.so.11.0: cudart init failure: 35"
time=2024-04-29T06:56:10.843Z level=INFO source=gpu.go:265 msg="Searching for GPU management library libnvidia-ml.so"
time=2024-04-29T06:56:10.843Z level=DEBUG source=gpu.go:283 msg="gpu management search paths: [/usr/local/cuda/lib64/libnvidia-ml.so* /usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ml.so* /usr/lib/x86_64-linux-gnu/libnvidia-ml.so* /usr/lib/wsl/lib/libnvidia-ml.so* /usr/lib/wsl/drivers//libnvidia-ml.so /opt/cuda/lib64/libnvidia-ml.so* /usr/lib*/libnvidia-ml.so* /usr/lib/aarch64-linux-gnu/nvidia/current/libnvidia-ml.so* /usr/lib/aarch64-linux-gnu/libnvidia-ml.so* /usr/local/lib*/libnvidia-ml.so* /opt/cuda/targets/x86_64-linux/lib/stubs/libnvidia-ml.so* /opt/rocm/lib/libnvidia-ml.so* /usr/local/lib/libnvidia-ml.so* /opt/rh/devtoolset-7/root/libnvidia-ml.so*]"
time=2024-04-29T06:56:10.844Z level=INFO source=gpu.go:311 msg="Discovered GPU libraries: []"
time=2024-04-29T06:56:10.844Z level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-04-29T06:56:10.844Z level=WARN source=amd_linux.go:53 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers: amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
time=2024-04-29T06:56:10.844Z level=INFO source=amd_linux.go:88 msg="detected amdgpu versions [gfx1033]"
time=2024-04-29T06:56:10.844Z level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /tmp/ollama3355126040/rocm"
time=2024-04-29T06:56:10.844Z level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /usr/bin"
time=2024-04-29T06:56:10.845Z level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /usr/bin/rocm"
time=2024-04-29T06:56:10.845Z level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /usr/share/ollama/lib/rocm"
time=2024-04-29T06:56:10.845Z level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /opt/rocm/lib"
time=2024-04-29T06:56:10.846Z level=DEBUG source=amd_linux.go:279 msg="host rocm linked /opt/rocm/lib => /tmp/ollama3355126040/rocm"
time=2024-04-29T06:56:10.846Z level=DEBUG source=amd_linux.go:123 msg="skipping rocm gfx compatibility check with HSA_OVERRIDE_GFX_VERSION=gfx1030"
time=2024-04-29T06:56:10.846Z level=DEBUG source=amd_linux.go:152 msg="discovering VRAM for amdgpu devices"
time=2024-04-29T06:56:10.846Z level=DEBUG source=amd_linux.go:171 msg="amdgpu devices [0]"
time=2024-04-29T06:56:10.846Z level=INFO source=amd_linux.go:246 msg="[0] amdgpu totalMemory 4096M"
time=2024-04-29T06:56:10.846Z level=INFO source=amd_linux.go:247 msg="[0] amdgpu freeMemory 4096M"
time=2024-04-29T06:56:10.846Z level=DEBUG source=gpu.go:254 msg="rocm detected 1 devices with 3072M available memory"
[GIN] 2024/04/29 - 06:56:24 | 200 | 3.028904ms | 10.210.134.15 | HEAD "/"
[GIN] 2024/04/29 - 06:56:25 | 200 | 5.90615ms | 10.210.134.15 | POST "/api/show"
[GIN] 2024/04/29 - 06:56:25 | 200 | 1.055035ms | 10.210.134.15 | POST "/api/show"
time=2024-04-29T06:56:26.351Z level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-04-29T06:56:26.351Z level=WARN source=amd_linux.go:53 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers: amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
time=2024-04-29T06:56:26.351Z level=INFO source=amd_linux.go:88 msg="detected amdgpu versions [gfx1033]"
time=2024-04-29T06:56:26.351Z level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /tmp/ollama3355126040/rocm"
time=2024-04-29T06:56:26.352Z level=DEBUG source=amd_linux.go:123 msg="skipping rocm gfx compatibility check with HSA_OVERRIDE_GFX_VERSION=gfx1030"
time=2024-04-29T06:56:26.352Z level=DEBUG source=amd_linux.go:152 msg="discovering VRAM for amdgpu devices"
time=2024-04-29T06:56:26.352Z level=DEBUG source=amd_linux.go:171 msg="amdgpu devices [0]"
time=2024-04-29T06:56:26.352Z level=INFO source=amd_linux.go:246 msg="[0] amdgpu totalMemory 4096M"
time=2024-04-29T06:56:26.352Z level=INFO source=amd_linux.go:247 msg="[0] amdgpu freeMemory 4096M"
time=2024-04-29T06:56:26.352Z level=DEBUG source=gpu.go:254 msg="rocm detected 1 devices with 3072M available memory"
time=2024-04-29T06:56:26.352Z level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-04-29T06:56:26.352Z level=WARN source=amd_linux.go:53 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers: amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
time=2024-04-29T06:56:26.352Z level=INFO source=amd_linux.go:88 msg="detected amdgpu versions [gfx1033]"
time=2024-04-29T06:56:26.352Z level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /tmp/ollama3355126040/rocm"
time=2024-04-29T06:56:26.353Z level=DEBUG source=amd_linux.go:123 msg="skipping rocm gfx compatibility check with HSA_OVERRIDE_GFX_VERSION=gfx1030"
time=2024-04-29T06:56:26.353Z level=DEBUG source=amd_linux.go:152 msg="discovering VRAM for amdgpu devices"
time=2024-04-29T06:56:26.353Z level=DEBUG source=amd_linux.go:171 msg="amdgpu devices [0]"
time=2024-04-29T06:56:26.353Z level=INFO source=amd_linux.go:246 msg="[0] amdgpu totalMemory 4096M"
time=2024-04-29T06:56:26.353Z level=INFO source=amd_linux.go:247 msg="[0] amdgpu freeMemory 4096M"
time=2024-04-29T06:56:26.353Z level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-04-29T06:56:26.353Z level=DEBUG source=payload_common.go:94 msg="ordered list of LLM libraries to try [/tmp/ollama3355126040/runners/rocm_v60000/libext_server.so /tmp/ollama3355126040/runners/cpu_avx2/libext_server.so]"
time=2024-04-29T06:56:26.675Z level=INFO source=dyn_ext_server.go:87 msg="Loading Dynamic llm server: /tmp/ollama3355126040/runners/rocm_v60000/libext_server.so"
time=2024-04-29T06:56:26.676Z level=INFO source=dyn_ext_server.go:147 msg="Initializing llama server"
time=2024-04-29T06:56:26.676Z level=DEBUG source=dyn_ext_server.go:148 msg="server params: {model:0x7f7f58000b40 n_ctx:2048 n_batch:512 n_threads:0 n_parallel:1 rope_freq_base:0 rope_freq_scale:0 memory_f16:true n_gpu_layers:20 main_gpu:0 use_mlock:false use_mmap:true numa:0 embedding:true lora_adapters: mmproj: verbose_logging:true _:[0 0 0 0 0 0 0]}"
[1714373786] system info: AVX = 1 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 |
[1714373786] Performing pre-initialization of GPU
time=2024-04-29T06:56:26.686Z level=DEBUG source=dyn_ext_server.go:155 msg="failure during initialization: Unable to init GPU: invalid device ordinal"
time=2024-04-29T06:56:26.686Z level=WARN source=llm.go:170 msg="Failed to load dynamic library /tmp/ollama3355126040/runners/rocm_v60000/libext_server.so Unable to init GPU: invalid device ordinal"
time=2024-04-29T06:56:26.688Z level=INFO source=dyn_ext_server.go:87 msg="Loading Dynamic llm server: /tmp/ollama3355126040/runners/cpu_avx2/libext_server.so"
time=2024-04-29T06:56:26.688Z level=INFO source=dyn_ext_server.go:147 msg="Initializing llama server"
time=2024-04-29T06:56:26.688Z level=DEBUG source=dyn_ext_server.go:148 msg="server params: {model:0x7f7f58041ab0 n_ctx:2048 n_batch:512 n_threads:0 n_parallel:1 rope_freq_base:0 rope_freq_scale:0 memory_f16:true n_gpu_layers:20 main_gpu:0 use_mlock:false use_mmap:true numa:0 embedding:true lora_adapters: mmproj: verbose_logging:true _:[0 0 0 0 0 0 0]}"
[1714373786] system info: AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 |
llama_model_loader: loaded meta data with 21 key-value pairs and 291 tensors from /root/.ollama/models/blobs/sha256-00e1317cbf74d901080d7100f57580ba8dd8de57203072dc6f668324ba545f29 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.name str = Meta-Llama-3-8B-Instruct
llama_model_loader: - kv 2: llama.block_count u32 = 32
llama_model_loader: - kv 3: llama.context_length u32 = 8192
llama_model_loader: - kv 4: llama.embedding_length u32 = 4096
llama_model_loader: - kv 5: llama.feed_forward_length u32 = 14336
llama_model_loader: - kv 6: llama.attention.head_count u32 = 32
llama_model_loader: - kv 7: llama.attention.head_count_kv u32 = 8
llama_model_loader: - kv 8: llama.rope.freq_base f32 = 500000.000000
llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 10: general.file_type u32 = 2
llama_model_loader: - kv 11: llama.vocab_size u32 = 128256
llama_model_loader: - kv 12: llama.rope.dimension_count u32 = 128
llama_model_loader: - kv 13: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 14: tokenizer.ggml.tokens arr[str,128256] = ["!", """, "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 15: tokenizer.ggml.token_type arr[i32,128256] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 16: tokenizer.ggml.merges arr[str,280147] = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv 17: tokenizer.ggml.bos_token_id u32 = 128000
llama_model_loader: - kv 18: tokenizer.ggml.eos_token_id u32 = 128001
llama_model_loader: - kv 19: tokenizer.chat_template str = {% set loop_messages = messages %}{% ...
llama_model_loader: - kv 20: general.quantization_version u32 = 2
llama_model_loader: - type f32: 65 tensors
llama_model_loader: - type q4_0: 225 tensors
llama_model_loader: - type q6_K: 1 tensors
llm_load_vocab: special tokens definition check successful ( 256/128256 ).
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = BPE
llm_load_print_meta: n_vocab = 128256
llm_load_print_meta: n_merges = 280147
llm_load_print_meta: n_ctx_train = 8192
llm_load_print_meta: n_embd = 4096
llm_load_print_meta: n_head = 32
llm_load_print_meta: n_head_kv = 8
llm_load_print_meta: n_layer = 32
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_embd_head_k = 128
llm_load_print_meta: n_embd_head_v = 128
llm_load_print_meta: n_gqa = 4
llm_load_print_meta: n_embd_k_gqa = 1024
llm_load_print_meta: n_embd_v_gqa = 1024
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale = 0.0e+00
llm_load_print_meta: n_ff = 14336
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: causal attn = 1
llm_load_print_meta: pooling type = 0
llm_load_print_meta: rope type = 0
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 500000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx = 8192
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: ssm_d_conv = 0
llm_load_print_meta: ssm_d_inner = 0
llm_load_print_meta: ssm_d_state = 0
llm_load_print_meta: ssm_dt_rank = 0
llm_load_print_meta: model type = 7B
llm_load_print_meta: model ftype = Q4_0
llm_load_print_meta: model params = 8.03 B
llm_load_print_meta: model size = 4.33 GiB (4.64 BPW)
llm_load_print_meta: general.name = Meta-Llama-3-8B-Instruct
llm_load_print_meta: BOS token = 128000 '<|begin_of_text|>'
llm_load_print_meta: EOS token = 128001 '<|end_of_text|>'
llm_load_print_meta: LF token = 128 'Ä'
llm_load_tensors: ggml ctx size = 0.11 MiB
llm_load_tensors: CPU buffer size = 4437.80 MiB
.......................................................................................
llama_new_context_with_model: n_ctx = 2048
llama_new_context_with_model: n_batch = 512
llama_new_context_with_model: n_ubatch = 512
llama_new_context_with_model: freq_base = 500000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: CPU KV buffer size = 256.00 MiB
llama_new_context_with_model: KV self size = 256.00 MiB, K (f16): 128.00 MiB, V (f16): 128.00 MiB
llama_new_context_with_model: CPU output buffer size = 258.50 MiB
llama_new_context_with_model: CPU compute buffer size = 258.50 MiB
llama_new_context_with_model: graph nodes = 1060
llama_new_context_with_model: graph splits = 1
[1714373792] warming up the model with an empty run
loading library /tmp/ollama3355126040/runners/rocm_v60000/libext_server.so
loading library /tmp/ollama3355126040/runners/cpu_avx2/libext_server.so
{"function":"initialize","level":"INFO","line":422,"msg":"initializing slots","n_slots":1,"tid":"140185090115328","timestamp":1714373792}
{"function":"initialize","level":"INFO","line":431,"msg":"new slot","n_ctx_slot":2048,"slot_id":0,"tid":"140185090115328","timestamp":1714373792}
time=2024-04-29T06:56:32.423Z level=INFO source=dyn_ext_server.go:159 msg="Starting llama main loop"
time=2024-04-29T06:56:32.423Z level=DEBUG source=prompt.go:172 msg="prompt now fits in context window" required=1 window=2048
[1714373792] llama server main loop starting
{"function":"update_slots","level":"INFO","line":1550,"msg":"all slots are idle and system prompt is empty, clear the KV cache","tid":"140175322068736","timestamp":1714373792}
[GIN] 2024/04/29 - 06:56:32 | 200 | 7.379156952s | 10.210.134.15 | POST "/api/chat"
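A note on the override value used in this thread: ROCm's `HSA_OVERRIDE_GFX_VERSION` expects a numeric `major.minor.step` triple (e.g. `10.3.0` to make a gfx1033 part masquerade as the supported gfx1030), not a `gfx1030`-style string. Since Ollama only passes the variable through without validating it, an unparseable value silently falls back to the unsupported native target, which would match the `invalid device ordinal` failure above. A hedged sketch (the derivation is only illustrative, and the docker command is echoed rather than executed):

```shell
# HSA_OVERRIDE_GFX_VERSION takes "major.minor.step"; gfx1033 natively
# maps to 10.3.3, and overriding to 10.3.0 makes ROCm treat the GPU as
# gfx1030. Illustrative derivation from the gfx name (decimal digits
# only; hex-lettered steps like gfx90a would need extra handling):
gfx="gfx1033"
v=${gfx#gfx}                          # -> "1033"
native="${v:0:2}.${v:2:1}.${v:3:1}"   # -> "10.3.3"
echo "native target: ${native}"

# Corrected invocation (echoed here, not run). Note 10.3.0, not
# "gfx1030", in the override:
echo docker run --device /dev/kfd --device /dev/dri \
  -v ollama:/root/.ollama -p 11434:11434 \
  -e HSA_OVERRIDE_GFX_VERSION=10.3.0 -e OLLAMA_DEBUG=1 \
  --name ollama ollama/ollama:0.1.30-rc4-rocm
```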

<!-- gh-comment-id:2082009572 -->

@Talleyrand-34 commented on GitHub (Jun 5, 2024):

Sorry, but after reading this I don't understand: does Ollama have Steam Deck support?

<!-- gh-comment-id:2149995851 -->

@FairyTail2000 commented on GitHub (Jun 6, 2024):

Thanks for your response. But at least for my Steam Deck, it reports the driver as incompatible, and with the override, GPU initialization fails.

If yours behaves differently, please share your configuration so I can replicate it

<!-- gh-comment-id:2151455954 -->

@dhiltgen commented on GitHub (Jun 6, 2024):

My suspicion is the bundled ROCm library we're including is somehow incompatible with the system. Building from source might be a viable workaround until we can get this resolved.

Unable to init GPU: invalid device ordinal
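For anyone attempting the from-source workaround, here is a hedged sketch. It assumes the `AMDGPU_TARGETS` knob described in the development docs of that era is honored by the ROCm generate step; verify against the current build documentation before relying on it. The commands are echoed via a heredoc rather than executed:

```shell
# Hedged from-source build including the Deck's gfx1033 in the ROCm
# targets. AMDGPU_TARGETS support is an assumption taken from the
# development docs; the commands are printed, not run.
cat <<'EOF'
git clone https://github.com/ollama/ollama.git && cd ollama
AMDGPU_TARGETS=gfx1033 go generate ./...
go build .
./ollama serve
EOF
```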

<!-- gh-comment-id:2153497946 -->

@sammcj commented on GitHub (Jun 7, 2024):

Might be related: I'm using NVIDIA GPUs (with a Ryzen 7600 CPU/APU), but I noticed recently that Ollama is spamming the logs with AMD GPU errors similar to the above, even when I'm not passing any AMD GPU through to the container.

amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
time=2024-06-07T00:47:47.003Z level=WARN source=amd_linux.go:163 msg="amdgpu too old gfx000" gpu=0
time=2024-06-07T00:47:47.003Z level=INFO source=amd_linux.go:311 msg="no compatible amdgpu devices detected"

Note that this occurs even when building Ollama from source.

<!-- gh-comment-id:2153660886 -->

@xyproto commented on GitHub (Jun 9, 2024):

Does it work now on the Steam Deck if you:

  • Enter Desktop Mode.
  • Perform the following steps, if you haven't already:
    • Use passwd to create a password for the deck user.
    • Disable read-only mode with sudo btrfs property set -ts / ro false.
    • Initialize the pacman keyring with sudo pacman-key --init.
    • Populate the pacman keyring with the default Arch Linux keys with sudo pacman-key --populate archlinux.
  • And then:
    • Sync the packages with sudo pacman -Sy.
    • Install Ollama for ROCm with sudo pacman -S ollama-rocm.
    • Start/restart the Ollama service with sudo systemctl restart ollama.
    • Try it out with e.g. ollama run tinyllama "Write a haiku about 7 llamas"

?
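The steps above, collected into one copy-pasteable sketch. The commands are held in an array and printed rather than executed, since several of them are destructive on a stock SteamOS install; run them manually on the Deck in order:

```shell
# Consolidated sketch of the Desktop Mode steps above. Each command is
# echoed, not run; later steps assume the read-write and keyring setup
# before them succeeded.
steps=(
  "passwd"
  "sudo btrfs property set -ts / ro false"
  "sudo pacman-key --init"
  "sudo pacman-key --populate archlinux"
  "sudo pacman -Sy"
  "sudo pacman -S ollama-rocm"
  "sudo systemctl restart ollama"
  "ollama run tinyllama 'Write a haiku about 7 llamas'"
)
printf '%s\n' "${steps[@]}"
```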

<!-- gh-comment-id:2156617862 -->

@FairyTail2000 commented on GitHub (Jun 10, 2024):

I do not have enough space on my system partition to check this, but since I use Docker for my work with Ollama, I would really like a solution for that setup.

<!-- gh-comment-id:2157278563 -->

@sebastianlutter commented on GitHub (Jun 11, 2024):

@xyproto I don't have the ollama-rocm package available via pacman after following your steps:

passwd
sudo steamos-readonly disable
sudo pacman-key --init
sudo pacman-key --populate archlinux

Then:

(deck@steamdeck common)$ sudo pacman -Sy
:: Synchronizing package databases...
 jupiter-3.5 is up to date
 holo-3.5 is up to date
 core-3.5 is up to date
 extra-3.5 is up to date
 community-3.5 is up to date
 multilib-3.5 is up to date
(deck@steamdeck common)$ pacman -Ss ollama
(deck@steamdeck common)$ pacman -Ss ollama-rocm
(deck@steamdeck common)$ sudo pacman -S ollama-rocm
[sudo] password for deck: 
error: target not found: ollama-rocm

Don't know why I don't see the package:

  • https://archlinux.org/packages/extra/x86_64/ollama-rocm/

<!-- gh-comment-id:2160194194 -->

@xyproto commented on GitHub (Jun 11, 2024):

@sebastianlutter Ah, it will probably be available in a later release of SteamOS, then. In the meantime, enabling pacman mirrors and repositories in /etc and then installing the package is probably possible, or installing the binary package directly with pacman -U. I'll have to test this myself before having a definite answer.

<!-- gh-comment-id:2160380223 -->

@Talleyrand-34 commented on GitHub (Jun 11, 2024):

I tried to install ollama-rocm from the AUR, but it needs pacman 6.1 and SteamOS only ships pacman 6.0.

Also, one important thing:

ls /sys/module/amdgpu/version
ls: cannot access '/sys/module/amdgpu/version': No such file or directory

ls /sys/module/amdgpu
coresize  holders   initstate  parameters  sections    taint
drivers   initsize  notes      refcnt      srcversion  uevent
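The missing /sys/module/amdgpu/version file is only a driver-version probe; on kernels with amdkfd, the gfx target itself is exposed as an integer `gfx_target_version` in the KFD topology, which is presumably where Ollama's "detected amdgpu versions [gfx1033]" comes from. A sketch of decoding that integer (the sample value 100303 is assumed for gfx1033; on real hardware read it from `/sys/class/kfd/kfd/topology/nodes/*/properties`):

```shell
# Decode amdkfd's integer gfx_target_version, encoded as
# major*10000 + minor*100 + step, back into a gfx name.
# 100303 is a sample value assumed for gfx1033; read the real one from
#   /sys/class/kfd/kfd/topology/nodes/*/properties
# (decimal assembly only; hex-lettered steps like gfx90a need more care)
v=100303
major=$((v / 10000)); minor=$((v / 100 % 100)); step=$((v % 100))
echo "gfx${major}${minor}${step}"
```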

<!-- gh-comment-id:2160568774 -->

@Talleyrand-34 commented on GitHub (Jun 11, 2024):

Here is the newly installed Ollama:

/usr/local/bin/ollama -v              
Warning: could not connect to a running Ollama instance
Warning: client version is 0.1.42

Is it the version?

  • Debug log
OLLAMA_DEBUG=true /usr/local/bin/ollama serve
2024/06/11 14:38:30 routes.go:1011: INFO server config env="map[OLLAMA_DEBUG:true OLLAMA_FLASH_ATTENTION:false OLLAMA_HOST: OLLAMA_KEEP_ALIVE: OLLAMA_LLM_LIBRARY
time=2024-06-11T14:38:30.950+02:00 level=INFO source=images.go:740 msg="total blobs: 0"
time=2024-06-11T14:38:30.950+02:00 level=INFO source=images.go:747 msg="total unused blobs removed: 0"
time=2024-06-11T14:38:30.950+02:00 level=INFO source=routes.go:1057 msg="Listening on 127.0.0.1:11434 (version 0.1.42)"
time=2024-06-11T14:38:30.951+02:00 level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama1949874256/runners
time=2024-06-11T14:38:30.951+02:00 level=DEBUG source=payload.go:180 msg=extracting variant=cpu file=build/linux/x86_64/cpu/bin/ollama_llama_server.gz
time=2024-06-11T14:38:30.951+02:00 level=DEBUG source=payload.go:180 msg=extracting variant=cpu_avx file=build/linux/x86_64/cpu_avx/bin/ollama_llama_server.gz
time=2024-06-11T14:38:30.951+02:00 level=DEBUG source=payload.go:180 msg=extracting variant=cpu_avx2 file=build/linux/x86_64/cpu_avx2/bin/ollama_llama_server.gz
time=2024-06-11T14:38:30.951+02:00 level=DEBUG source=payload.go:180 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/libcublas.so.11.gz
time=2024-06-11T14:38:30.951+02:00 level=DEBUG source=payload.go:180 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/libcublasLt.so.11.gz
time=2024-06-11T14:38:30.951+02:00 level=DEBUG source=payload.go:180 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/libcudart.so.11.0.gz
time=2024-06-11T14:38:30.951+02:00 level=DEBUG source=payload.go:180 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/ollama_llama_server.gz
time=2024-06-11T14:38:30.951+02:00 level=DEBUG source=payload.go:180 msg=extracting variant=rocm_v60002 file=build/linux/x86_64/rocm_v60002/bin/deps.txt.gz
time=2024-06-11T14:38:30.951+02:00 level=DEBUG source=payload.go:180 msg=extracting variant=rocm_v60002 file=build/linux/x86_64/rocm_v60002/bin/ollama_llama_serv
time=2024-06-11T14:38:34.258+02:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama1949874256/runners/cpu
time=2024-06-11T14:38:34.258+02:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama1949874256/runners/cpu_avx
time=2024-06-11T14:38:34.258+02:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama1949874256/runners/cpu_avx2
time=2024-06-11T14:38:34.258+02:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama1949874256/runners/cuda_v11
time=2024-06-11T14:38:34.258+02:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama1949874256/runners/rocm_v60002
time=2024-06-11T14:38:34.258+02:00 level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu_avx2 cuda_v11 rocm_v60002 cpu cpu_avx]"
time=2024-06-11T14:38:34.258+02:00 level=DEBUG source=payload.go:45 msg="Override detection logic by setting OLLAMA_LLM_LIBRARY"
time=2024-06-11T14:38:34.258+02:00 level=DEBUG source=sched.go:90 msg="starting llm scheduler"
time=2024-06-11T14:38:34.258+02:00 level=DEBUG source=gpu.go:132 msg="Detecting GPUs"
time=2024-06-11T14:38:34.258+02:00 level=DEBUG source=gpu.go:274 msg="Searching for GPU library" name=libcuda.so*
time=2024-06-11T14:38:34.258+02:00 level=DEBUG source=gpu.go:293 msg="gpu library search" globs="[/root/libcuda.so** /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]"
time=2024-06-11T14:38:34.269+02:00 level=DEBUG source=gpu.go:326 msg="discovered GPU libraries" paths=[]
time=2024-06-11T14:38:34.269+02:00 level=DEBUG source=gpu.go:274 msg="Searching for GPU library" name=libcudart.so*
time=2024-06-11T14:38:34.269+02:00 level=DEBUG source=gpu.go:293 msg="gpu library search" globs="[/root/libcudart.so** /tmp/ollama1949874256/runners/cuda*/libcudart.so* /usr/local/cuda/lib64/libcudart.so* /usr/lib/x86_64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/x86_64-linux-gnu/libcudart.so* /usr/lib/wsl/lib/libcudart.so* /usr/lib/wsl/drivers/*/libcudart.so* /opt/cuda/lib64/libcudart.so* /usr/local/cuda*/targets/aarch64-linux/lib/libcudart.so* /usr/lib/aarch64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/aarch64-linux-gnu/libcudart.so* /usr/local/cuda/lib*/libcudart.so* /usr/lib*/libcudart.so* /usr/local/lib*/libcudart.so*]"
time=2024-06-11T14:38:34.272+02:00 level=DEBUG source=gpu.go:326 msg="discovered GPU libraries" paths=[/tmp/ollama1949874256/runners/cuda_v11/libcudart.so.11.0]
cudaSetDevice err: 35
time=2024-06-11T14:38:34.272+02:00 level=DEBUG source=gpu.go:338 msg="Unable to load cudart" library=/tmp/ollama1949874256/runners/cuda_v11/libcudart.so.11.0 error="your nvidia driver is too old or missing.  If you have a CUDA GPU please upgrade to run ollama"
time=2024-06-11T14:38:34.272+02:00 level=DEBUG source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-06-11T14:38:34.272+02:00 level=WARN source=amd_linux.go:48 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
time=2024-06-11T14:38:34.272+02:00 level=DEBUG source=amd_linux.go:77 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/0/properties"
time=2024-06-11T14:38:34.273+02:00 level=DEBUG source=amd_linux.go:102 msg="detected CPU /sys/class/kfd/kfd/topology/nodes/0/properties"
time=2024-06-11T14:38:34.273+02:00 level=DEBUG source=amd_linux.go:77 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/1/properties"
time=2024-06-11T14:38:34.273+02:00 level=DEBUG source=amd_linux.go:242 msg="amdgpu memory" gpu=0 total="1.0 GiB"
time=2024-06-11T14:38:34.273+02:00 level=DEBUG source=amd_linux.go:243 msg="amdgpu memory" gpu=0 available="1.0 GiB"
time=2024-06-11T14:38:34.273+02:00 level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /opt/rocm/lib"
time=2024-06-11T14:38:34.273+02:00 level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /usr/lib64"
time=2024-06-11T14:38:34.275+02:00 level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /usr/local/bin/rocm"
time=2024-06-11T14:38:34.275+02:00 level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /usr/share/ollama/lib/rocm"
time=2024-06-11T14:38:34.276+02:00 level=DEBUG source=amd_linux.go:292 msg="rocm supported GPUs" types="[gfx1030 gfx1100 gfx1101 gfx1102 gfx900 gfx906 gfx908 gfx90a gfx940 gfx941 gfx942]"
time=2024-06-11T14:38:34.276+02:00 level=WARN source=amd_linux.go:296 msg="amdgpu is not supported" gpu=0 gpu_type=gfx1033 library=/usr/share/ollama/lib/rocm supported_types="[gfx1030 gfx1100 gfx1101 gfx1102 gfx900 gfx906 gfx908 gfx90a gfx940 gfx941 gfx942]"
time=2024-06-11T14:38:34.276+02:00 level=WARN source=amd_linux.go:298 msg="See https://github.com/ollama/ollama/blob/main/docs/gpu.md#overrides for HSA_OVERRIDE_GFX_VERSION usage"
time=2024-06-11T14:38:34.276+02:00 level=INFO source=amd_linux.go:311 msg="no compatible amdgpu devices detected"
time=2024-06-11T14:38:34.276+02:00 level=INFO source=types.go:71 msg="inference compute" id=0 library=cpu compute="" driver=0.0 name="" total="14.5 GiB" available="155.3 MiB"

Debug log with the GPU override HSA_OVERRIDE_GFX_VERSION=gfx1030 set

HSA_OVERRIDE_GFX_VERSION=gfx1030 OLLAMA_DEBUG=true /usr/local/bin/ollama serve
2024/06/11 14:42:37 routes.go:1011: INFO server config env="map[OLLAMA_DEBUG:true OLLAMA_FLASH_ATTENTION:false OLLAMA_HOST: OLLAMA_KEEP_ALIVE: OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:1 OLLAMA_MAX_QUEUE:512 OLLAMA_MAX_VRAM:0 OLLAMA_MODELS: OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://*] OLLAMA_RUNNERS_DIR: OLLAMA_TMPDIR:]"
time=2024-06-11T14:42:37.763+02:00 level=INFO source=images.go:740 msg="total blobs: 0"
time=2024-06-11T14:42:37.763+02:00 level=INFO source=images.go:747 msg="total unused blobs removed: 0"
time=2024-06-11T14:42:37.763+02:00 level=INFO source=routes.go:1057 msg="Listening on 127.0.0.1:11434 (version 0.1.42)"
time=2024-06-11T14:42:37.763+02:00 level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama3770653781/runners
time=2024-06-11T14:42:37.764+02:00 level=DEBUG source=payload.go:180 msg=extracting variant=cpu file=build/linux/x86_64/cpu/bin/ollama_llama_server.gz
time=2024-06-11T14:42:37.764+02:00 level=DEBUG source=payload.go:180 msg=extracting variant=cpu_avx file=build/linux/x86_64/cpu_avx/bin/ollama_llama_server.gz
time=2024-06-11T14:42:37.764+02:00 level=DEBUG source=payload.go:180 msg=extracting variant=cpu_avx2 file=build/linux/x86_64/cpu_avx2/bin/ollama_llama_server.gz
time=2024-06-11T14:42:37.764+02:00 level=DEBUG source=payload.go:180 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/libcublas.so.11.gz
time=2024-06-11T14:42:37.764+02:00 level=DEBUG source=payload.go:180 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/libcublasLt.so.11.gz
time=2024-06-11T14:42:37.764+02:00 level=DEBUG source=payload.go:180 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/libcudart.so.11.0.gz
time=2024-06-11T14:42:37.764+02:00 level=DEBUG source=payload.go:180 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/ollama_llama_server.gz
time=2024-06-11T14:42:37.764+02:00 level=DEBUG source=payload.go:180 msg=extracting variant=rocm_v60002 file=build/linux/x86_64/rocm_v60002/bin/deps.txt.gz
time=2024-06-11T14:42:37.764+02:00 level=DEBUG source=payload.go:180 msg=extracting variant=rocm_v60002 file=build/linux/x86_64/rocm_v60002/bin/ollama_llama_server.gz
time=2024-06-11T14:42:41.017+02:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3770653781/runners/cpu
time=2024-06-11T14:42:41.017+02:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3770653781/runners/cpu_avx
time=2024-06-11T14:42:41.017+02:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3770653781/runners/cpu_avx2
time=2024-06-11T14:42:41.017+02:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3770653781/runners/cuda_v11
time=2024-06-11T14:42:41.017+02:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3770653781/runners/rocm_v60002
time=2024-06-11T14:42:41.017+02:00 level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu cpu_avx cpu_avx2 cuda_v11 rocm_v60002]"
time=2024-06-11T14:42:41.017+02:00 level=DEBUG source=payload.go:45 msg="Override detection logic by setting OLLAMA_LLM_LIBRARY"
time=2024-06-11T14:42:41.017+02:00 level=DEBUG source=sched.go:90 msg="starting llm scheduler"
time=2024-06-11T14:42:41.017+02:00 level=DEBUG source=gpu.go:132 msg="Detecting GPUs"
time=2024-06-11T14:42:41.017+02:00 level=DEBUG source=gpu.go:274 msg="Searching for GPU library" name=libcuda.so*
time=2024-06-11T14:42:41.017+02:00 level=DEBUG source=gpu.go:293 msg="gpu library search" globs="[/root/libcuda.so** /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]"
time=2024-06-11T14:42:41.027+02:00 level=DEBUG source=gpu.go:326 msg="discovered GPU libraries" paths=[]
time=2024-06-11T14:42:41.028+02:00 level=DEBUG source=gpu.go:274 msg="Searching for GPU library" name=libcudart.so*
time=2024-06-11T14:42:41.028+02:00 level=DEBUG source=gpu.go:293 msg="gpu library search" globs="[/root/libcudart.so** /tmp/ollama3770653781/runners/cuda*/libcudart.so* /usr/local/cuda/lib64/libcudart.so* /usr/lib/x86_64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/x86_64-linux-gnu/libcudart.so* /usr/lib/wsl/lib/libcudart.so* /usr/lib/wsl/drivers/*/libcudart.so* /opt/cuda/lib64/libcudart.so* /usr/local/cuda*/targets/aarch64-linux/lib/libcudart.so* /usr/lib/aarch64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/aarch64-linux-gnu/libcudart.so* /usr/local/cuda/lib*/libcudart.so* /usr/lib*/libcudart.so* /usr/local/lib*/libcudart.so*]"
time=2024-06-11T14:42:41.032+02:00 level=DEBUG source=gpu.go:326 msg="discovered GPU libraries" paths=[/tmp/ollama3770653781/runners/cuda_v11/libcudart.so.11.0]
cudaSetDevice err: 35
time=2024-06-11T14:42:41.032+02:00 level=DEBUG source=gpu.go:338 msg="Unable to load cudart" library=/tmp/ollama3770653781/runners/cuda_v11/libcudart.so.11.0 error="your nvidia driver is too old or missing.  If you have a CUDA GPU please upgrade to run ollama"
time=2024-06-11T14:42:41.032+02:00 level=DEBUG source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-06-11T14:42:41.032+02:00 level=WARN source=amd_linux.go:48 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
time=2024-06-11T14:42:41.032+02:00 level=DEBUG source=amd_linux.go:77 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/0/properties"
time=2024-06-11T14:42:41.032+02:00 level=DEBUG source=amd_linux.go:102 msg="detected CPU /sys/class/kfd/kfd/topology/nodes/0/properties"
time=2024-06-11T14:42:41.032+02:00 level=DEBUG source=amd_linux.go:77 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/1/properties"
time=2024-06-11T14:42:41.032+02:00 level=DEBUG source=amd_linux.go:242 msg="amdgpu memory" gpu=0 total="1.0 GiB"
time=2024-06-11T14:42:41.032+02:00 level=DEBUG source=amd_linux.go:243 msg="amdgpu memory" gpu=0 available="1.0 GiB"
time=2024-06-11T14:42:41.032+02:00 level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /opt/rocm/lib"
time=2024-06-11T14:42:41.032+02:00 level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /usr/lib64"
time=2024-06-11T14:42:41.034+02:00 level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /usr/local/bin/rocm"
time=2024-06-11T14:42:41.034+02:00 level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /usr/share/ollama/lib/rocm"
time=2024-06-11T14:42:41.034+02:00 level=INFO source=amd_linux.go:304 msg="skipping rocm gfx compatibility check" HSA_OVERRIDE_GFX_VERSION=gfx1030
time=2024-06-11T14:42:41.034+02:00 level=INFO source=types.go:71 msg="inference compute" id=0 library=rocm compute=gfx1033 driver=0.0 name=1002:163f total="1.0 GiB" available="1.0 GiB"
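As an aside (not from the thread): the override above passes the gfx name itself, while ollama's GPU docs (linked in the amd_linux.go:298 warning) describe HSA_OVERRIDE_GFX_VERSION as a dotted x.y.z value. A hedged sketch of that mapping for the Deck's gfx1033, assuming the documented 10.3.0 target (i.e. the gfx1030 kernels that are on the supported list); the variable names are illustrative:

```shell
# Derive the dotted override from the gfx target name.
# gfx1033 -> major "10", minor "3"; the stepping is forced to 0 so the
# supported gfx1030 (10.3.0) ROCm kernels are used instead.
gfx=gfx1033
major="${gfx:3:2}"              # "10"
minor="${gfx:5:1}"              # "3"
override="${major}.${minor}.0"  # "10.3.0"
echo "HSA_OVERRIDE_GFX_VERSION=${override}"
# then, e.g.: HSA_OVERRIDE_GFX_VERSION="10.3.0" OLLAMA_DEBUG=1 ollama serve
```

Whether 10.3.0 actually works for this APU is exactly what this issue is tracking; the snippet only shows the value format the docs expect.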

rocminfo output

/opt/rocm/bin/rocminfo
ROCk module is loaded
=====================
HSA System Attributes
=====================
Runtime Version:         1.1
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE
System Endianness:       LITTLE

==========
HSA Agents
==========
*******
Agent 1
*******
  Name:                    AMD Custom APU 0405
  Uuid:                    CPU-XX
  Marketing Name:          AMD Custom APU 0405
  Vendor Name:             CPU
  Feature:                 None specified
  Profile:                 FULL_PROFILE
  Float Round Mode:        NEAR
  Max Queue Number:        0(0x0)
  Queue Min Size:          0(0x0)
  Queue Max Size:          0(0x0)
  Queue Type:              MULTI
  Node:                    0
  Device Type:             CPU
  Cache Info:
    L1:                      32768(0x8000) KB
  Chip ID:                 0(0x0)
  ASIC Revision:           0(0x0)
  Cacheline Size:          64(0x40)
  Max Clock Freq. (MHz):   2800
  BDFID:                   0
  Internal Node ID:        0
  Compute Unit:            8
  SIMDs per CU:            0
  Shader Engines:          0
  Shader Arrs. per Eng.:   0
  WatchPts on Addr. Ranges:1
  Features:                None
  Pool Info:
    Pool 1
      Segment:                 GLOBAL; FLAGS: FINE GRAINED
      Size:                    15169984(0xe779c0) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Alignment:         4KB
      Accessible by all:       TRUE
    Pool 2
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    15169984(0xe779c0) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Alignment:         4KB
      Accessible by all:       TRUE
    Pool 3
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED
      Size:                    15169984(0xe779c0) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Alignment:         4KB
      Accessible by all:       TRUE
  ISA Info:
*******
Agent 2
*******
  Name:                    gfx1033
  Uuid:                    GPU-XX
  Marketing Name:          AMD Custom GPU 0405
  Vendor Name:             AMD
  Feature:                 KERNEL_DISPATCH
  Profile:                 BASE_PROFILE
  Float Round Mode:        NEAR
  Max Queue Number:        128(0x80)
  Queue Min Size:          64(0x40)
  Queue Max Size:          131072(0x20000)
  Queue Type:              MULTI
  Node:                    1
  Device Type:             GPU
  Cache Info:
    L1:                      16(0x10) KB
    L2:                      1024(0x400) KB
  Chip ID:                 5695(0x163f)
  ASIC Revision:           0(0x0)
  Cacheline Size:          64(0x40)
  Max Clock Freq. (MHz):   0
  BDFID:                   1024
  Internal Node ID:        1
  Compute Unit:            8
  SIMDs per CU:            2
  Shader Engines:          1
  Shader Arrs. per Eng.:   1
  WatchPts on Addr. Ranges:4
  Features:                KERNEL_DISPATCH
  Fast F16 Operation:      TRUE
  Wavefront Size:          32(0x20)
  Workgroup Max Size:      1024(0x400)
  Workgroup Max Size per Dimension:
    x                        1024(0x400)
    y                        1024(0x400)
    z                        1024(0x400)
  Max Waves Per CU:        32(0x20)
  Max Work-item Per CU:    1024(0x400)
  Grid Max Size:           4294967295(0xffffffff)
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)
    y                        4294967295(0xffffffff)
    z                        4294967295(0xffffffff)
  Max fbarriers/Workgrp:   32
  Pool Info:
    Pool 1
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED
      Size:                    1048576(0x100000) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Alignment:         4KB
      Accessible by all:       FALSE
    Pool 2
      Segment:                 GROUP
      Size:                    64(0x40) KB
      Allocatable:             FALSE
      Alloc Granule:           0KB
      Alloc Alignment:         0KB
      Accessible by all:       FALSE
  ISA Info:
    ISA 1
      Name:                    amdgcn-amd-amdhsa--gfx1033
      Machine Models:          HSA_MACHINE_MODEL_LARGE
      Profiles:                HSA_PROFILE_BASE
      Default Rounding Mode:   NEAR
      Default Rounding Mode:   NEAR
      Fast f16:                TRUE
      Workgroup Max Size:      1024(0x400)
      Workgroup Max Size per Dimension:
        x                        1024(0x400)
        y                        1024(0x400)
        z                        1024(0x400)
      Grid Max Size:           4294967295(0xffffffff)
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)
        y                        4294967295(0xffffffff)
        z                        4294967295(0xffffffff)
      FBarrier Max Size:       32
*** Done ***
<!-- gh-comment-id:2160698563 --> @Talleyrand-34 commented on GitHub (Jun 11, 2024): Here is a fresh ollama install:

```log
/usr/local/bin/ollama -v
Warning: could not connect to a running Ollama instance
Warning: client version is 0.1.42
```

Is it the version?
Author
Owner

@Talleyrand-34 commented on GitHub (Jun 11, 2024):

Additionally, I would recommend a special method for installing ollama on the Steam Deck, or a way to modify the installation path, because a SteamOS update wipes everything outside of /home.

<!-- gh-comment-id:2160705665 -->

@sebastianlutter commented on GitHub (Jun 11, 2024):

I got it working with Docker like this on my Steam Deck:

```
VERSION="0.1.43-rocm"

docker run --rm \
	--device /dev/kfd \
	--device /dev/dri \
	-v $(pwd)/ollama:/root/.ollama \
	-p 11434:11434 \
	-e "HSA_OVERRIDE_GFX_VERSION=gfx1030" \
	--name ollama ollama/ollama:${VERSION}
```
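One caveat worth noting: ollama's troubleshooting docs (linked in the issue header) describe `HSA_OVERRIDE_GFX_VERSION` as a dotted version string, e.g. `10.3.0` for a gfx1030 target, so the `gfx1030` spelling above may only satisfy ollama's own compatibility check rather than ROCm itself. A minimal sketch of the mapping for 4-digit gfx ids (assumption: the digits encode major/minor/step):

```shell
# Map a 4-digit gfx id to the dotted form HSA_OVERRIDE_GFX_VERSION expects.
# Assumption: first two digits = major, third = minor, fourth = step.
gfx_to_dotted() {
  v=${1#gfx}                                    # gfx1033 -> 1033
  major=$(printf '%s' "$v" | cut -c1-2)
  minor=$(printf '%s' "$v" | cut -c3)
  step=$(printf '%s' "$v" | cut -c4)
  printf '%s.%s.%s\n' "$major" "$minor" "$step"
}

gfx_to_dotted gfx1033   # -> 10.3.3
gfx_to_dotted gfx1030   # -> 10.3.0
```

For the Deck's gfx1033, the override usually suggested is still `10.3.0`, i.e. masquerading as the nearest officially supported RDNA2 target.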

GPU support seems to work:

```
2024/06/11 18:11:50 routes.go:1011: INFO server config env="map[OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_HOST: OLLAMA_KEEP_ALIVE: OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:1 OLLAMA_MAX_QUEUE:512 OLLAMA_MAX_VRAM:0 OLLAMA_MODELS: OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://*] OLLAMA_RUNNERS_DIR: OLLAMA_TMPDIR:]"
time=2024-06-11T18:11:50.313Z level=INFO source=images.go:740 msg="total blobs: 0"
time=2024-06-11T18:11:50.313Z level=INFO source=images.go:747 msg="total unused blobs removed: 0"
time=2024-06-11T18:11:50.313Z level=INFO source=routes.go:1057 msg="Listening on [::]:11434 (version 0.1.42)"
time=2024-06-11T18:11:50.318Z level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama2755781949/runners
time=2024-06-11T18:11:54.384Z level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu_avx2 cuda_v11 rocm_v60002 cpu cpu_avx]"
time=2024-06-11T18:11:54.412Z level=WARN source=amd_linux.go:48 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
time=2024-06-11T18:11:54.413Z level=INFO source=amd_linux.go:304 msg="skipping rocm gfx compatibility check" HSA_OVERRIDE_GFX_VERSION=gfx1030
time=2024-06-11T18:11:54.413Z level=INFO source=types.go:71 msg="inference compute" id=0 library=rocm compute=gfx1033 driver=0.0 name=1002:163f total="1.0 GiB" available="1.0 GiB"
```

Steam version:

```
(deck@steamdeck ~)$ cat /etc/*release*|grep ID
DISTRIB_ID="SteamOS"
ID=steamos
ID_LIKE=arch
VARIANT_ID=steamdeck
VERSION_ID=3.5.19
BUILD_ID=20240422.1
```

GPU:

```
(deck@steamdeck ~)$ sudo lspci| grep -i gpu
04:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] VanGogh [AMD Custom GPU 0405] (rev ae)
```
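Whether it actually offloaded is decided by a later log line: `starting llama server` names the runner binary, and `runners/rocm_*` vs `runners/cpu_*` tells you which path was taken. A small sketch classifying such a line (the sample is shortened from logs later in this thread, so it is illustrative only):

```shell
# Classify a captured "starting llama server" log line by the runner it launched.
# Sample line shortened from this thread's logs (illustrative, not live output).
line='msg="starting llama server" cmd="/tmp/ollama/runners/cpu_avx2/ollama_llama_server --model ..."'
case "$line" in
  *runners/rocm*) echo "GPU (rocm) runner" ;;
  *runners/cpu*)  echo "CPU fallback" ;;
  *)              echo "unknown runner" ;;
esac
```

Against a live container, `docker logs ollama 2>&1 | grep 'starting llama server'` yields the line to classify.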
<!-- gh-comment-id:2161631991 -->

@5310 commented on GitHub (Jun 12, 2024):

> time=2024-06-11T18:11:54.412Z level=WARN source=amd_linux.go:48 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"

Still looks like it's running off of CPU though. At least it did on my Deck after I tried to run that version just now—was checking the system resource monitor and only the CPU ramps up. Same error message.

Edit: It seems while I still don't have a `/sys/module/amdgpu/version` file on the Deck, I do have a `/sys/module/amdgpu/srcversion` file, containing `D97512588BEE2F480E82473`
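Both paths from the warning can be checked in one pass; a minimal sketch (the paths are taken verbatim from the log, nothing else assumed):

```shell
# Report which amdgpu module version files exist (paths from ollama's warning).
amdgpu_version_files() {
  for f in /sys/module/amdgpu/version /sys/module/amdgpu/srcversion; do
    if [ -r "$f" ]; then
      printf '%s: %s\n' "$f" "$(cat "$f")"
    else
      printf '%s: missing\n' "$f"
    fi
  done
}
amdgpu_version_files
```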

<!-- gh-comment-id:2162283081 -->

@Talleyrand-34 commented on GitHub (Jun 12, 2024):

> GPU support seems to work:
>
> ```
> 2024/06/11 18:11:50 routes.go:1011: INFO server config env="map[OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_HOST: OLLAMA_KEEP_ALIVE: OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:1 OLLAMA_MAX_QUEUE:512 OLLAMA_MAX_VRAM:0 OLLAMA_MODELS: OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://*] OLLAMA_RUNNERS_DIR: OLLAMA_TMPDIR:]"
> ...
> time=2024-06-11T18:11:54.412Z level=WARN source=amd_linux.go:48 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
> time=2024-06-11T18:11:54.413Z level=INFO source=amd_linux.go:304 msg="skipping rocm gfx compatibility check" HSA_OVERRIDE_GFX_VERSION=gfx1030
> time=2024-06-11T18:11:54.413Z level=INFO source=types.go:71 msg="inference compute" id=0 library=rocm compute=gfx1033 driver=0.0 name=1002:163f total="1.0 GiB" available="1.0 GiB"
> ```

I also get this, but if you read the reported characteristics of the GPU, you'll notice they don't match the actual specs. Moreover, `ollama serve` fails to start because this GPU does not exist.
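The kernel's own view of the device can be read from the same kfd topology files ollama's detection walks (`/sys/class/kfd/kfd/topology/nodes/*/properties` in the debug log above); a hedged sketch that pulls out a few identification fields (`gfx_target_version`, `simd_count`, and `simd_per_cu` are standard kfd properties):

```shell
# Dump identification fields from each kfd topology node, or say why we can't.
kfd_summary() {
  nodes=/sys/class/kfd/kfd/topology/nodes
  if [ -d "$nodes" ]; then
    grep -H -E '^(gfx_target_version|simd_count|simd_per_cu) ' "$nodes"/*/properties \
      || echo "no matching fields"
  else
    echo "no kfd topology visible (amdgpu/ROCm kernel bits not loaded?)"
  fi
}
kfd_summary
```

Comparing those values with the rocminfo output quoted earlier should show whether the mismatch is in the kernel's report or in ollama's interpretation of it.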

<!-- gh-comment-id:2162837240 -->

@FairyTail2000 commented on GitHub (Jun 12, 2024):

I tried to replicate the logs in my initial report. This is no longer possible; it now just crashes with a "core dumped" error.

> docker run --device /dev/kfd --device /dev/dri -v /sys:/sys:ro -v ollama:/root/.ollama -p 11434:11434 -e "HSA_OVERRIDE_GFX_VERSION=gfx1030" -e "OLLAMA_DEBUG=1" --name ollama ollama/ollama:rocm

https://gist.github.com/FairyTail2000/c14a26441afc7a944b032b5da403fb7c

> docker run --device /dev/kfd --device /dev/dri -v ollama:/root/.ollama -p 11434:11434 -e "HSA_OVERRIDE_GFX_VERSION=gfx1030" -e "OLLAMA_DEBUG=1" --name ollama ollama/ollama:rocm

https://gist.github.com/FairyTail2000/e4e643aefbf94fdd0d14057965797008

Those are the full debug logs, including the command lines used. GPU initialization now fails repeatedly. When trying to load a model onto the GPU, it no longer falls back to CPU; it crashes and reports the crash to the ollama client.

<!-- gh-comment-id:2162887133 -->

@sebastianlutter commented on GitHub (Jun 12, 2024):

> > time=2024-06-11T18:11:54.412Z level=WARN source=amd_linux.go:48 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
>
> Still looks like it's running off of CPU though. At least it did on my Deck after I tried to run that version just now—was checking the system resource monitor and only the CPU ramps up. Same error message.
>
> Edit: It seems while I still don't have a `/sys/module/amdgpu/version` file on the Deck, I do have a `/sys/module/amdgpu/srcversion` file, containing `D97512588BEE2F480E82473`

I haven't had time to test that yet. It does not crash, and it answers properly when I use the model via curl. But I noticed this line in the logs:

```
time=2024-06-12T11:46:09.397Z level=INFO source=server.go:341 msg="starting llama server" cmd="/tmp/ollama3354728015/runners/cpu_avx2/ollama_llama_server --model /root/.ollama/models/blobs/sha256-6a0746a1ec1aef3e7ec53868f220ff6e389f6f8ef87a01d77c96807de94ca2aa --ctx-size 2048 --batch-size 512 --embedding --log-disable --parallel 1 --port 38009"
```

The ollama server is started with `runners/cpu_avx2`, obviously CPU only. Here is the full log:

2024/06/12 11:45:49 routes.go:1011: INFO server config env="map[OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_HOST: OLLAMA_KEEP_ALIVE: OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:1 OLLAMA_MAX_QUEUE:512 OLLAMA_MAX_VRAM:0 OLLAMA_MODELS: OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://*] OLLAMA_RUNNERS_DIR: OLLAMA_TMPDIR:]"
time=2024-06-12T11:45:49.643Z level=INFO source=images.go:740 msg="total blobs: 5"
time=2024-06-12T11:45:49.643Z level=INFO source=images.go:747 msg="total unused blobs removed: 0"
time=2024-06-12T11:45:49.644Z level=INFO source=routes.go:1057 msg="Listening on [::]:11434 (version 0.1.43)"
time=2024-06-12T11:45:49.646Z level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama3354728015/runners
time=2024-06-12T11:45:54.168Z level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu cpu_avx cpu_avx2 cuda_v11 rocm_v60002]"
time=2024-06-12T11:45:54.183Z level=WARN source=amd_linux.go:48 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
time=2024-06-12T11:45:54.184Z level=INFO source=amd_linux.go:304 msg="skipping rocm gfx compatibility check" HSA_OVERRIDE_GFX_VERSION=gfx1030
time=2024-06-12T11:45:54.184Z level=INFO source=types.go:71 msg="inference compute" id=0 library=rocm compute=gfx1033 driver=0.0 name=1002:163f total="1.0 GiB" available="1.0 GiB"
time=2024-06-12T11:46:07.899Z level=WARN source=amd_linux.go:48 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
time=2024-06-12T11:46:07.900Z level=INFO source=amd_linux.go:304 msg="skipping rocm gfx compatibility check" HSA_OVERRIDE_GFX_VERSION=gfx1030
time=2024-06-12T11:46:09.391Z level=INFO source=memory.go:133 msg="offload to gpu" layers.requested=-1 layers.real=0 memory.available="1.0 GiB" memory.required.full="5.0 GiB" memory.required.partial="1.2 GiB" memory.required.kv="256.0 MiB" memory.weights.total="4.1 GiB" memory.weights.repeating="3.7 GiB" memory.weights.nonrepeating="411.0 MiB" memory.graph.full="164.0 MiB" memory.graph.partial="677.5 MiB"
time=2024-06-12T11:46:09.392Z level=INFO source=memory.go:133 msg="offload to gpu" layers.requested=-1 layers.real=0 memory.available="1.0 GiB" memory.required.full="5.0 GiB" memory.required.partial="1.2 GiB" memory.required.kv="256.0 MiB" memory.weights.total="4.1 GiB" memory.weights.repeating="3.7 GiB" memory.weights.nonrepeating="411.0 MiB" memory.graph.full="164.0 MiB" memory.graph.partial="677.5 MiB"
time=2024-06-12T11:46:09.393Z level=INFO source=memory.go:133 msg="offload to gpu" layers.requested=-1 layers.real=0 memory.available="1.0 GiB" memory.required.full="5.0 GiB" memory.required.partial="1.2 GiB" memory.required.kv="256.0 MiB" memory.weights.total="4.1 GiB" memory.weights.repeating="3.7 GiB" memory.weights.nonrepeating="411.0 MiB" memory.graph.full="164.0 MiB" memory.graph.partial="677.5 MiB"
time=2024-06-12T11:46:09.397Z level=INFO source=server.go:341 msg="starting llama server" cmd="/tmp/ollama3354728015/runners/cpu_avx2/ollama_llama_server --model /root/.ollama/models/blobs/sha256-6a0746a1ec1aef3e7ec53868f220ff6e389f6f8ef87a01d77c96807de94ca2aa --ctx-size 2048 --batch-size 512 --embedding --log-disable --parallel 1 --port 38009"
time=2024-06-12T11:46:09.398Z level=INFO source=sched.go:338 msg="loaded runners" count=1
time=2024-06-12T11:46:09.398Z level=INFO source=server.go:529 msg="waiting for llama runner to start responding"
time=2024-06-12T11:46:09.398Z level=INFO source=server.go:567 msg="waiting for server to become available" status="llm server error"
INFO [main] build info | build=1 commit="5921b8f" tid="140354377021312" timestamp=1718192769
INFO [main] system info | n_threads=4 n_threads_batch=-1 system_info="AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | " tid="140354377021312" timestamp=1718192769 total_threads=8
INFO [main] HTTP server listening | hostname="127.0.0.1" n_threads_http="7" port="38009" tid="140354377021312" timestamp=1718192769
llama_model_loader: loaded meta data with 22 key-value pairs and 291 tensors from /root/.ollama/models/blobs/sha256-6a0746a1ec1aef3e7ec53868f220ff6e389f6f8ef87a01d77c96807de94ca2aa (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = Meta-Llama-3-8B-Instruct
llama_model_loader: - kv   2:                          llama.block_count u32              = 32
llama_model_loader: - kv   3:                       llama.context_length u32              = 8192
llama_model_loader: - kv   4:                     llama.embedding_length u32              = 4096
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 14336
llama_model_loader: - kv   6:                 llama.attention.head_count u32              = 32
llama_model_loader: - kv   7:              llama.attention.head_count_kv u32              = 8
llama_model_loader: - kv   8:                       llama.rope.freq_base f32              = 500000.000000
llama_model_loader: - kv   9:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  10:                          general.file_type u32              = 2
llama_model_loader: - kv  11:                           llama.vocab_size u32              = 128256
llama_model_loader: - kv  12:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv  13:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  14:                         tokenizer.ggml.pre str              = llama-bpe
llama_model_loader: - kv  15:                      tokenizer.ggml.tokens arr[str,128256]  = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  16:                  tokenizer.ggml.token_type arr[i32,128256]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  17:                      tokenizer.ggml.merges arr[str,280147]  = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv  18:                tokenizer.ggml.bos_token_id u32              = 128000
llama_model_loader: - kv  19:                tokenizer.ggml.eos_token_id u32              = 128009
llama_model_loader: - kv  20:                    tokenizer.chat_template str              = {% set loop_messages = messages %}{% ...
llama_model_loader: - kv  21:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:   65 tensors
llama_model_loader: - type q4_0:  225 tensors
llama_model_loader: - type q6_K:    1 tensors
time=2024-06-12T11:46:09.650Z level=INFO source=server.go:567 msg="waiting for server to become available" status="llm server loading model"
llm_load_vocab: special tokens cache size = 256
llm_load_vocab: token to piece cache size = 1.5928 MB
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = llama
llm_load_print_meta: vocab type       = BPE
llm_load_print_meta: n_vocab          = 128256
llm_load_print_meta: n_merges         = 280147
llm_load_print_meta: n_ctx_train      = 8192
llm_load_print_meta: n_embd           = 4096
llm_load_print_meta: n_head           = 32
llm_load_print_meta: n_head_kv        = 8
llm_load_print_meta: n_layer          = 32
llm_load_print_meta: n_rot            = 128
llm_load_print_meta: n_embd_head_k    = 128
llm_load_print_meta: n_embd_head_v    = 128
llm_load_print_meta: n_gqa            = 4
llm_load_print_meta: n_embd_k_gqa     = 1024
llm_load_print_meta: n_embd_v_gqa     = 1024
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale    = 0.0e+00
llm_load_print_meta: n_ff             = 14336
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: causal attn      = 1
llm_load_print_meta: pooling type     = 0
llm_load_print_meta: rope type        = 0
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 500000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx  = 8192
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: ssm_d_conv       = 0
llm_load_print_meta: ssm_d_inner      = 0
llm_load_print_meta: ssm_d_state      = 0
llm_load_print_meta: ssm_dt_rank      = 0
llm_load_print_meta: model type       = 8B
llm_load_print_meta: model ftype      = Q4_0
llm_load_print_meta: model params     = 8.03 B
llm_load_print_meta: model size       = 4.33 GiB (4.64 BPW) 
llm_load_print_meta: general.name     = Meta-Llama-3-8B-Instruct
llm_load_print_meta: BOS token        = 128000 '<|begin_of_text|>'
llm_load_print_meta: EOS token        = 128009 '<|eot_id|>'
llm_load_print_meta: LF token         = 128 'Ä'
llm_load_print_meta: EOT token        = 128009 '<|eot_id|>'
llm_load_tensors: ggml ctx size =    0.15 MiB
llm_load_tensors:        CPU buffer size =  4437.80 MiB
llama_new_context_with_model: n_ctx      = 2048
llama_new_context_with_model: n_batch    = 512
llama_new_context_with_model: n_ubatch   = 512
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: freq_base  = 500000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init:        CPU KV buffer size =   256.00 MiB
llama_new_context_with_model: KV self size  =  256.00 MiB, K (f16):  128.00 MiB, V (f16):  128.00 MiB
llama_new_context_with_model:        CPU  output buffer size =     0.50 MiB
llama_new_context_with_model:        CPU compute buffer size =   258.50 MiB
llama_new_context_with_model: graph nodes  = 1030
llama_new_context_with_model: graph splits = 1
INFO [main] model loaded | tid="140354377021312" timestamp=1718192770
time=2024-06-12T11:46:11.156Z level=INFO source=server.go:572 msg="llama runner started in 1.76 seconds"
[GIN] 2024/06/12 - 11:46:20 | 200 | 12.460964943s |    192.168.2.18 | POST     "/api/generate"

It tries to load onto the GPU, fails, and falls back to CPU.
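The `offload to gpu` lines in the log explain the fallback arithmetic: only 1.0 GiB of VRAM is visible to ollama (the Deck's UMA carve-out), while offloading anything at all needs about 1.2 GiB (`memory.required.partial`), so zero layers fit and the CPU runner is selected. In round MiB figures read off the log:

```shell
# Offload decision from the log lines above, in round MiB figures.
available=1024          # memory.available        = 1.0 GiB (VRAM ollama can see)
partial_min=1228        # memory.required.partial = 1.2 GiB (cost of offloading anything at all)
full=5120               # memory.required.full    = 5.0 GiB (whole model resident on GPU)

if [ "$available" -ge "$full" ]; then
  decision="full offload"
elif [ "$available" -ge "$partial_min" ]; then
  decision="partial offload"
else
  decision="layers.real=0 -> CPU runner"
fi
echo "$decision"
```

So the threshold to clear is `memory.required.partial`, not the full 5 GiB; if the platform allows enlarging the UMA carve-out, the same model could start offloading layers.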

<!-- gh-comment-id:2162913709 -->
llama_model_loader: - kv 21: general.quantization_version u32 = 2 llama_model_loader: - type f32: 65 tensors llama_model_loader: - type q4_0: 225 tensors llama_model_loader: - type q6_K: 1 tensors time=2024-06-12T11:46:09.650Z level=INFO source=server.go:567 msg="waiting for server to become available" status="llm server loading model" llm_load_vocab: special tokens cache size = 256 llm_load_vocab: token to piece cache size = 1.5928 MB llm_load_print_meta: format = GGUF V3 (latest) llm_load_print_meta: arch = llama llm_load_print_meta: vocab type = BPE llm_load_print_meta: n_vocab = 128256 llm_load_print_meta: n_merges = 280147 llm_load_print_meta: n_ctx_train = 8192 llm_load_print_meta: n_embd = 4096 llm_load_print_meta: n_head = 32 llm_load_print_meta: n_head_kv = 8 llm_load_print_meta: n_layer = 32 llm_load_print_meta: n_rot = 128 llm_load_print_meta: n_embd_head_k = 128 llm_load_print_meta: n_embd_head_v = 128 llm_load_print_meta: n_gqa = 4 llm_load_print_meta: n_embd_k_gqa = 1024 llm_load_print_meta: n_embd_v_gqa = 1024 llm_load_print_meta: f_norm_eps = 0.0e+00 llm_load_print_meta: f_norm_rms_eps = 1.0e-05 llm_load_print_meta: f_clamp_kqv = 0.0e+00 llm_load_print_meta: f_max_alibi_bias = 0.0e+00 llm_load_print_meta: f_logit_scale = 0.0e+00 llm_load_print_meta: n_ff = 14336 llm_load_print_meta: n_expert = 0 llm_load_print_meta: n_expert_used = 0 llm_load_print_meta: causal attn = 1 llm_load_print_meta: pooling type = 0 llm_load_print_meta: rope type = 0 llm_load_print_meta: rope scaling = linear llm_load_print_meta: freq_base_train = 500000.0 llm_load_print_meta: freq_scale_train = 1 llm_load_print_meta: n_yarn_orig_ctx = 8192 llm_load_print_meta: rope_finetuned = unknown llm_load_print_meta: ssm_d_conv = 0 llm_load_print_meta: ssm_d_inner = 0 llm_load_print_meta: ssm_d_state = 0 llm_load_print_meta: ssm_dt_rank = 0 llm_load_print_meta: model type = 8B llm_load_print_meta: model ftype = Q4_0 llm_load_print_meta: model params = 8.03 B llm_load_print_meta: model 
size = 4.33 GiB (4.64 BPW) llm_load_print_meta: general.name = Meta-Llama-3-8B-Instruct llm_load_print_meta: BOS token = 128000 '<|begin_of_text|>' llm_load_print_meta: EOS token = 128009 '<|eot_id|>' llm_load_print_meta: LF token = 128 'Ä' llm_load_print_meta: EOT token = 128009 '<|eot_id|>' llm_load_tensors: ggml ctx size = 0.15 MiB llm_load_tensors: CPU buffer size = 4437.80 MiB llama_new_context_with_model: n_ctx = 2048 llama_new_context_with_model: n_batch = 512 llama_new_context_with_model: n_ubatch = 512 llama_new_context_with_model: flash_attn = 0 llama_new_context_with_model: freq_base = 500000.0 llama_new_context_with_model: freq_scale = 1 llama_kv_cache_init: CPU KV buffer size = 256.00 MiB llama_new_context_with_model: KV self size = 256.00 MiB, K (f16): 128.00 MiB, V (f16): 128.00 MiB llama_new_context_with_model: CPU output buffer size = 0.50 MiB llama_new_context_with_model: CPU compute buffer size = 258.50 MiB llama_new_context_with_model: graph nodes = 1030 llama_new_context_with_model: graph splits = 1 INFO [main] model loaded | tid="140354377021312" timestamp=1718192770 time=2024-06-12T11:46:11.156Z level=INFO source=server.go:572 msg="llama runner started in 1.76 seconds" [GIN] 2024/06/12 - 11:46:20 | 200 | 12.460964943s | 192.168.2.18 | POST "/api/generate" ``` It tries to load to GPU, fails, and falls back to CPU
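For context on that fallback, here is a minimal Python sketch of the decision the "offload to gpu" log lines describe. This is not ollama's actual memory.go code; the function name and the proportional split for the in-between case are assumptions for illustration only.

```python
GIB = 1024 ** 3  # one gibibyte in bytes

def layers_to_offload(available: int, required_full: int,
                      required_partial: int, n_layers: int) -> int:
    """Simplified sketch: how many of n_layers could go to the GPU."""
    if available >= required_full:
        return n_layers  # whole model fits in VRAM
    if available < required_partial:
        return 0         # not even a partial load fits -> pure CPU fallback
    # Rough proportional split for the partial case (illustrative assumption).
    return int(n_layers * available / required_full)

# Figures from the log: 1.0 GiB available, 5.0 GiB full,
# 1.2 GiB partial, 32 repeating layers.
print(layers_to_offload(1 * GIB, 5 * GIB, int(1.2 * GIB), 32))  # -> 0
```

With the Deck's default 1.0 GiB of VRAM below even the 1.2 GiB partial requirement, the scheduler reports `layers.real=0` and launches the `cpu_avx2` runner, exactly as the log shows.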

@5310 commented on GitHub (Jun 12, 2024):

So, I ran this again: llama3:8b on the Deck with Docker (well, Podman since SteamOS now has it preinstalled) with debug on.

I wasn't paying close enough attention before, but I also get the ROCm detection line, and 1 GB is the Deck's default VRAM allocation.

time=2024-06-12T13:25:08.950Z level=INFO source=types.go:71 msg="inference compute" id=0 library=rocm compute=gfx1033 driver=0.0 name=1002:163f total="1.0 GiB" available="1.0 GiB"
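One thing worth noting about that line: the logs show `HSA_OVERRIDE_GFX_VERSION` set to the raw target name `gfx1030`, which is enough to make ollama skip its compatibility check, but the ROCm runtime itself conventionally expects a dotted version such as `10.3.0`. A small illustrative helper for the conversion (hypothetical, not part of ollama; the last two characters of a gfx target are hex minor/stepping digits):

```python
def gfx_to_hsa_override(gfx_target: str) -> str:
    """Convert an LLVM gfx target like 'gfx1033' to the dotted
    major.minor.stepping form ('10.3.3') that HSA_OVERRIDE_GFX_VERSION
    conventionally takes."""
    digits = gfx_target.removeprefix("gfx")
    major, minor, step = digits[:-2], digits[-2], digits[-1]
    return f"{int(major)}.{int(minor, 16)}.{int(step, 16)}"

# To force the gfx1030 code path on a Deck (gfx1033), one would typically
# set HSA_OVERRIDE_GFX_VERSION=10.3.0 rather than the raw target name.
print(gfx_to_hsa_override("gfx1030"))  # -> 10.3.0
```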

Only at this point did I realize my VRAM allocation had reset from the 4 GB I had set it to back to the default 1 GB. Neither llama3:8b nor gemma:2b would likely run in that anyway. While I reboot to raise the VRAM allocation again, here's my run info:

System information:

$ cat /etc/*release*|grep ID
DISTRIB_ID="SteamOS"
ID=steamos
ID_LIKE=arch
VARIANT_ID=steamdeck
VERSION_ID=3.5.19
BUILD_ID=20240422.1
$ sudo lspci| grep -i gpu
[sudo] password for deck: 
04:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] VanGogh [AMD Custom GPU 0405] (rev ae)
$ rocminfo
ROCk module is loaded
=====================    
HSA System Attributes    
=====================    
Runtime Version:         1.1
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE                              
System Endianness:       LITTLE                             
Mwaitx:                  DISABLED
DMAbuf Support:          YES

==========               
HSA Agents               
==========               
*******                  
Agent 1                  
*******                  
  Name:                    AMD Custom APU 0405                
  Uuid:                    CPU-XX                             
  Marketing Name:          AMD Custom APU 0405                
  Vendor Name:             CPU                                
  Feature:                 None specified                     
  Profile:                 FULL_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        0(0x0)                             
  Queue Min Size:          0(0x0)                             
  Queue Max Size:          0(0x0)                             
  Queue Type:              MULTI                              
  Node:                    0                                  
  Device Type:             CPU                                
  Cache Info:              
    L1:                      32768(0x8000) KB                   
  Chip ID:                 0(0x0)                             
  ASIC Revision:           0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   2800                               
  BDFID:                   0                                  
  Internal Node ID:        0                                  
  Compute Unit:            8                                  
  SIMDs per CU:            0                                  
  Shader Engines:          0                                  
  Shader Arrs. per Eng.:   0                                  
  WatchPts on Addr. Ranges:1                                  
  Features:                None
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: FINE GRAINED        
      Size:                    15169984(0xe779c0) KB              
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    15169984(0xe779c0) KB              
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 3                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    15169984(0xe779c0) KB              
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
  ISA Info:                
*******                  
Agent 2                  
*******                  
  Name:                    gfx1033                            
  Uuid:                    GPU-XX                             
  Marketing Name:          AMD Custom GPU 0405                
  Vendor Name:             AMD                                
  Feature:                 KERNEL_DISPATCH                    
  Profile:                 BASE_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        128(0x80)                          
  Queue Min Size:          64(0x40)                           
  Queue Max Size:          131072(0x20000)                    
  Queue Type:              MULTI                              
  Node:                    1                                  
  Device Type:             GPU                                
  Cache Info:              
    L1:                      16(0x10) KB                        
    L2:                      1024(0x400) KB                     
  Chip ID:                 5695(0x163f)                       
  ASIC Revision:           0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   0                                  
  BDFID:                   1024                               
  Internal Node ID:        1                                  
  Compute Unit:            8                                  
  SIMDs per CU:            2                                  
  Shader Engines:          1                                  
  Shader Arrs. per Eng.:   1                                  
  WatchPts on Addr. Ranges:4                                  
  Coherent Host Access:    FALSE                              
  Features:                KERNEL_DISPATCH 
  Fast F16 Operation:      TRUE                               
  Wavefront Size:          32(0x20)                           
  Workgroup Max Size:      1024(0x400)                        
  Workgroup Max Size per Dimension:
    x                        1024(0x400)                        
    y                        1024(0x400)                        
    z                        1024(0x400)                        
  Max Waves Per CU:        32(0x20)                           
  Max Work-item Per CU:    1024(0x400)                        
  Grid Max Size:           4294967295(0xffffffff)             
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)             
    y                        4294967295(0xffffffff)             
    z                        4294967295(0xffffffff)             
  Max fbarriers/Workgrp:   32                                 
  Packet Processor uCode:: 111                                
  SDMA engine uCode::      70                                 
  IOMMU Support::          None                               
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    1048576(0x100000) KB               
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: EXTENDED FINE GRAINED
      Size:                    1048576(0x100000) KB               
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 3                   
      Segment:                 GROUP                              
      Size:                    64(0x40) KB                        
      Allocatable:             FALSE                              
      Alloc Granule:           0KB                                
      Alloc Alignment:         0KB                                
      Accessible by all:       FALSE                              
  ISA Info:                
    ISA 1                    
      Name:                    amdgcn-amd-amdhsa--gfx1033         
      Machine Models:          HSA_MACHINE_MODEL_LARGE            
      Profiles:                HSA_PROFILE_BASE                   
      Default Rounding Mode:   NEAR                               
      Default Rounding Mode:   NEAR                               
      Fast f16:                TRUE                               
      Workgroup Max Size:      1024(0x400)                        
      Workgroup Max Size per Dimension:
        x                        1024(0x400)                        
        y                        1024(0x400)                        
        z                        1024(0x400)                        
      Grid Max Size:           4294967295(0xffffffff)             
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)             
        y                        4294967295(0xffffffff)             
        z                        4294967295(0xffffffff)             
      FBarrier Max Size:       32                                 
*** Done ***

Ollama log:

$ bash ollama-oci.sh 
2024/06/12 13:25:04 routes.go:1011: INFO server config env="map[OLLAMA_DEBUG:true OLLAMA_FLASH_ATTENTION:false OLLAMA_HOST: OLLAMA_KEEP_ALIVE: OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:1 OLLAMA_MAX_QUEUE:512 OLLAMA_MAX_VRAM:0 OLLAMA_MODELS: OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://*] OLLAMA_RUNNERS_DIR: OLLAMA_TMPDIR:]"
time=2024-06-12T13:25:04.384Z level=INFO source=images.go:740 msg="total blobs: 5"
time=2024-06-12T13:25:04.388Z level=INFO source=images.go:747 msg="total unused blobs removed: 0"
time=2024-06-12T13:25:04.388Z level=INFO source=routes.go:1057 msg="Listening on [::]:11434 (version 0.1.43)"
time=2024-06-12T13:25:04.391Z level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama3572804540/runners
time=2024-06-12T13:25:04.391Z level=DEBUG source=payload.go:180 msg=extracting variant=cpu file=build/linux/x86_64/cpu/bin/ollama_llama_server.gz
time=2024-06-12T13:25:04.391Z level=DEBUG source=payload.go:180 msg=extracting variant=cpu_avx file=build/linux/x86_64/cpu_avx/bin/ollama_llama_server.gz
time=2024-06-12T13:25:04.391Z level=DEBUG source=payload.go:180 msg=extracting variant=cpu_avx2 file=build/linux/x86_64/cpu_avx2/bin/ollama_llama_server.gz
time=2024-06-12T13:25:04.391Z level=DEBUG source=payload.go:180 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/libcublas.so.11.gz
time=2024-06-12T13:25:04.391Z level=DEBUG source=payload.go:180 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/libcublasLt.so.11.gz
time=2024-06-12T13:25:04.391Z level=DEBUG source=payload.go:180 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/libcudart.so.11.0.gz
time=2024-06-12T13:25:04.391Z level=DEBUG source=payload.go:180 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/ollama_llama_server.gz
time=2024-06-12T13:25:04.391Z level=DEBUG source=payload.go:180 msg=extracting variant=rocm_v60002 file=build/linux/x86_64/rocm_v60002/bin/deps.txt.gz
time=2024-06-12T13:25:04.391Z level=DEBUG source=payload.go:180 msg=extracting variant=rocm_v60002 file=build/linux/x86_64/rocm_v60002/bin/ollama_llama_server.gz
time=2024-06-12T13:25:08.921Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3572804540/runners/cpu
time=2024-06-12T13:25:08.921Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3572804540/runners/cpu_avx
time=2024-06-12T13:25:08.921Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3572804540/runners/cpu_avx2
time=2024-06-12T13:25:08.921Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3572804540/runners/cuda_v11
time=2024-06-12T13:25:08.921Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3572804540/runners/rocm_v60002
time=2024-06-12T13:25:08.921Z level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu_avx cpu_avx2 cuda_v11 rocm_v60002 cpu]"
time=2024-06-12T13:25:08.921Z level=DEBUG source=payload.go:45 msg="Override detection logic by setting OLLAMA_LLM_LIBRARY"
time=2024-06-12T13:25:08.921Z level=DEBUG source=sched.go:90 msg="starting llm scheduler"
time=2024-06-12T13:25:08.921Z level=DEBUG source=gpu.go:132 msg="Detecting GPUs"
time=2024-06-12T13:25:08.923Z level=DEBUG source=gpu.go:274 msg="Searching for GPU library" name=libcuda.so*
time=2024-06-12T13:25:08.923Z level=DEBUG source=gpu.go:293 msg="gpu library search" globs="[/opt/rocm/lib/libcuda.so** /usr/local/lib/libcuda.so** /opt/rh/devtoolset-7/root/libcuda.so** /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]"
time=2024-06-12T13:25:08.945Z level=DEBUG source=gpu.go:326 msg="discovered GPU libraries" paths=[]
time=2024-06-12T13:25:08.945Z level=DEBUG source=gpu.go:274 msg="Searching for GPU library" name=libcudart.so*
time=2024-06-12T13:25:08.945Z level=DEBUG source=gpu.go:293 msg="gpu library search" globs="[/opt/rocm/lib/libcudart.so** /usr/local/lib/libcudart.so** /opt/rh/devtoolset-7/root/libcudart.so** /tmp/ollama3572804540/runners/cuda*/libcudart.so* /usr/local/cuda/lib64/libcudart.so* /usr/lib/x86_64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/x86_64-linux-gnu/libcudart.so* /usr/lib/wsl/lib/libcudart.so* /usr/lib/wsl/drivers/*/libcudart.so* /opt/cuda/lib64/libcudart.so* /usr/local/cuda*/targets/aarch64-linux/lib/libcudart.so* /usr/lib/aarch64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/aarch64-linux-gnu/libcudart.so* /usr/local/cuda/lib*/libcudart.so* /usr/lib*/libcudart.so* /usr/local/lib*/libcudart.so*]"
time=2024-06-12T13:25:08.947Z level=DEBUG source=gpu.go:326 msg="discovered GPU libraries" paths=[/tmp/ollama3572804540/runners/cuda_v11/libcudart.so.11.0]
cudaSetDevice err: 35
time=2024-06-12T13:25:08.949Z level=DEBUG source=gpu.go:338 msg="Unable to load cudart" library=/tmp/ollama3572804540/runners/cuda_v11/libcudart.so.11.0 error="your nvidia driver is too old or missing.  If you have a CUDA GPU please upgrade to run ollama"
time=2024-06-12T13:25:08.949Z level=DEBUG source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-06-12T13:25:08.949Z level=WARN source=amd_linux.go:48 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
time=2024-06-12T13:25:08.949Z level=DEBUG source=amd_linux.go:77 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/0/properties"
time=2024-06-12T13:25:08.949Z level=DEBUG source=amd_linux.go:102 msg="detected CPU /sys/class/kfd/kfd/topology/nodes/0/properties"
time=2024-06-12T13:25:08.949Z level=DEBUG source=amd_linux.go:77 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/1/properties"
time=2024-06-12T13:25:08.950Z level=DEBUG source=amd_linux.go:242 msg="amdgpu memory" gpu=0 total="1.0 GiB"
time=2024-06-12T13:25:08.950Z level=DEBUG source=amd_linux.go:243 msg="amdgpu memory" gpu=0 available="1.0 GiB"
time=2024-06-12T13:25:08.950Z level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /opt/rocm/lib"
time=2024-06-12T13:25:08.950Z level=INFO source=amd_linux.go:304 msg="skipping rocm gfx compatibility check" HSA_OVERRIDE_GFX_VERSION=gfx1030
time=2024-06-12T13:25:08.950Z level=INFO source=types.go:71 msg="inference compute" id=0 library=rocm compute=gfx1033 driver=0.0 name=1002:163f total="1.0 GiB" available="1.0 GiB"
[GIN] 2024/06/12 - 13:25:08 | 200 |     546.235µs |       127.0.0.1 | HEAD     "/"
[GIN] 2024/06/12 - 13:25:08 | 200 |    2.811165ms |       127.0.0.1 | POST     "/api/show"
[GIN] 2024/06/12 - 13:25:08 | 200 |     816.463µs |       127.0.0.1 | POST     "/api/show"
time=2024-06-12T13:25:08.959Z level=DEBUG source=gpu.go:132 msg="Detecting GPUs"
time=2024-06-12T13:25:08.959Z level=DEBUG source=gpu.go:274 msg="Searching for GPU library" name=libcuda.so*
time=2024-06-12T13:25:08.959Z level=DEBUG source=gpu.go:293 msg="gpu library search" globs="[/opt/rocm/lib/libcuda.so** /usr/local/lib/libcuda.so** /opt/rh/devtoolset-7/root/libcuda.so** /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]"
time=2024-06-12T13:25:08.960Z level=DEBUG source=gpu.go:326 msg="discovered GPU libraries" paths=[]
time=2024-06-12T13:25:08.960Z level=DEBUG source=gpu.go:274 msg="Searching for GPU library" name=libcudart.so*
time=2024-06-12T13:25:08.960Z level=DEBUG source=gpu.go:293 msg="gpu library search" globs="[/opt/rocm/lib/libcudart.so** /usr/local/lib/libcudart.so** /opt/rh/devtoolset-7/root/libcudart.so** /tmp/ollama3572804540/runners/cuda*/libcudart.so* /usr/local/cuda/lib64/libcudart.so* /usr/lib/x86_64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/x86_64-linux-gnu/libcudart.so* /usr/lib/wsl/lib/libcudart.so* /usr/lib/wsl/drivers/*/libcudart.so* /opt/cuda/lib64/libcudart.so* /usr/local/cuda*/targets/aarch64-linux/lib/libcudart.so* /usr/lib/aarch64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/aarch64-linux-gnu/libcudart.so* /usr/local/cuda/lib*/libcudart.so* /usr/lib*/libcudart.so* /usr/local/lib*/libcudart.so*]"
time=2024-06-12T13:25:08.962Z level=DEBUG source=gpu.go:326 msg="discovered GPU libraries" paths=[/tmp/ollama3572804540/runners/cuda_v11/libcudart.so.11.0]
cudaSetDevice err: 35
time=2024-06-12T13:25:08.963Z level=DEBUG source=gpu.go:338 msg="Unable to load cudart" library=/tmp/ollama3572804540/runners/cuda_v11/libcudart.so.11.0 error="your nvidia driver is too old or missing.  If you have a CUDA GPU please upgrade to run ollama"
time=2024-06-12T13:25:08.963Z level=DEBUG source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-06-12T13:25:08.963Z level=WARN source=amd_linux.go:48 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
time=2024-06-12T13:25:08.963Z level=DEBUG source=amd_linux.go:77 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/0/properties"
time=2024-06-12T13:25:08.963Z level=DEBUG source=amd_linux.go:102 msg="detected CPU /sys/class/kfd/kfd/topology/nodes/0/properties"
time=2024-06-12T13:25:08.963Z level=DEBUG source=amd_linux.go:77 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/1/properties"
time=2024-06-12T13:25:08.963Z level=DEBUG source=amd_linux.go:242 msg="amdgpu memory" gpu=0 total="1.0 GiB"
time=2024-06-12T13:25:08.963Z level=DEBUG source=amd_linux.go:243 msg="amdgpu memory" gpu=0 available="1.0 GiB"
time=2024-06-12T13:25:08.963Z level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /opt/rocm/lib"
time=2024-06-12T13:25:08.964Z level=INFO source=amd_linux.go:304 msg="skipping rocm gfx compatibility check" HSA_OVERRIDE_GFX_VERSION=gfx1030
time=2024-06-12T13:25:08.964Z level=DEBUG source=gguf.go:57 msg="model = &llm.gguf{containerGGUF:(*llm.containerGGUF)(0xc0003bd780), kv:llm.KV{}, tensors:[]*llm.Tensor(nil), parameters:0x0}"
time=2024-06-12T13:25:10.432Z level=DEBUG source=sched.go:153 msg="loading first model" model=/root/.ollama/models/blobs/sha256-6a0746a1ec1aef3e7ec53868f220ff6e389f6f8ef87a01d77c96807de94ca2aa
time=2024-06-12T13:25:10.432Z level=DEBUG source=memory.go:44 msg=evaluating library=rocm gpu_count=1 available="1.0 GiB"
time=2024-06-12T13:25:10.432Z level=INFO source=memory.go:133 msg="offload to gpu" layers.requested=-1 layers.real=0 memory.available="1.0 GiB" memory.required.full="5.0 GiB" memory.required.partial="1.2 GiB" memory.required.kv="256.0 MiB" memory.weights.total="4.1 GiB" memory.weights.repeating="3.7 GiB" memory.weights.nonrepeating="411.0 MiB" memory.graph.full="164.0 MiB" memory.graph.partial="677.5 MiB"
time=2024-06-12T13:25:10.432Z level=DEBUG source=memory.go:177 msg="insufficient VRAM to load any model layers"
time=2024-06-12T13:25:10.432Z level=DEBUG source=memory.go:44 msg=evaluating library=rocm gpu_count=1 available="1.0 GiB"
time=2024-06-12T13:25:10.433Z level=INFO source=memory.go:133 msg="offload to gpu" layers.requested=-1 layers.real=0 memory.available="1.0 GiB" memory.required.full="5.0 GiB" memory.required.partial="1.2 GiB" memory.required.kv="256.0 MiB" memory.weights.total="4.1 GiB" memory.weights.repeating="3.7 GiB" memory.weights.nonrepeating="411.0 MiB" memory.graph.full="164.0 MiB" memory.graph.partial="677.5 MiB"
time=2024-06-12T13:25:10.433Z level=DEBUG source=memory.go:177 msg="insufficient VRAM to load any model layers"
time=2024-06-12T13:25:10.433Z level=DEBUG source=memory.go:44 msg=evaluating library=rocm gpu_count=1 available="1.0 GiB"
time=2024-06-12T13:25:10.433Z level=INFO source=memory.go:133 msg="offload to gpu" layers.requested=-1 layers.real=0 memory.available="1.0 GiB" memory.required.full="5.0 GiB" memory.required.partial="1.2 GiB" memory.required.kv="256.0 MiB" memory.weights.total="4.1 GiB" memory.weights.repeating="3.7 GiB" memory.weights.nonrepeating="411.0 MiB" memory.graph.full="164.0 MiB" memory.graph.partial="677.5 MiB"
time=2024-06-12T13:25:10.433Z level=DEBUG source=memory.go:177 msg="insufficient VRAM to load any model layers"
time=2024-06-12T13:25:10.433Z level=DEBUG source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-06-12T13:25:10.433Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3572804540/runners/cpu
time=2024-06-12T13:25:10.434Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3572804540/runners/cpu_avx
time=2024-06-12T13:25:10.434Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3572804540/runners/cpu_avx2
time=2024-06-12T13:25:10.434Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3572804540/runners/cuda_v11
time=2024-06-12T13:25:10.434Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3572804540/runners/rocm_v60002
time=2024-06-12T13:25:10.434Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3572804540/runners/cpu
time=2024-06-12T13:25:10.434Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3572804540/runners/cpu_avx
time=2024-06-12T13:25:10.434Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3572804540/runners/cpu_avx2
time=2024-06-12T13:25:10.434Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3572804540/runners/cuda_v11
time=2024-06-12T13:25:10.434Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3572804540/runners/rocm_v60002
time=2024-06-12T13:25:10.439Z level=INFO source=server.go:341 msg="starting llama server" cmd="/tmp/ollama3572804540/runners/cpu_avx2/ollama_llama_server --model /root/.ollama/models/blobs/sha256-6a0746a1ec1aef3e7ec53868f220ff6e389f6f8ef87a01d77c96807de94ca2aa --ctx-size 2048 --batch-size 512 --embedding --log-disable --verbose --parallel 1 --port 44047"
time=2024-06-12T13:25:10.439Z level=DEBUG source=server.go:356 msg=subprocess environment="[PATH=/opt/rh/devtoolset-7/root/usr/bin:/opt/rocm/hcc/bin:/opt/rocm/hip/bin:/opt/rocm/bin:/opt/rocm/hcc/bin::/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin HSA_OVERRIDE_GFX_VERSION=gfx1030 LD_LIBRARY_PATH=/opt/rocm/lib:/tmp/ollama3572804540/runners/cpu_avx2:/opt/rocm/lib:/usr/local/lib:/opt/rh/devtoolset-7/root HIP_VISIBLE_DEVICES=0]"
time=2024-06-12T13:25:10.440Z level=INFO source=sched.go:338 msg="loaded runners" count=1
time=2024-06-12T13:25:10.440Z level=INFO source=server.go:529 msg="waiting for llama runner to start responding"
time=2024-06-12T13:25:10.440Z level=INFO source=server.go:567 msg="waiting for server to become available" status="llm server error"
INFO [main] build info | build=1 commit="5921b8f" tid="140359403939712" timestamp=1718198710
INFO [main] system info | n_threads=4 n_threads_batch=-1 system_info="AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | " tid="140359403939712" timestamp=1718198710 total_threads=8
INFO [main] HTTP server listening | hostname="127.0.0.1" n_threads_http="7" port="44047" tid="140359403939712" timestamp=1718198710
llama_model_loader: loaded meta data with 22 key-value pairs and 291 tensors from /root/.ollama/models/blobs/sha256-6a0746a1ec1aef3e7ec53868f220ff6e389f6f8ef87a01d77c96807de94ca2aa (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = Meta-Llama-3-8B-Instruct
llama_model_loader: - kv   2:                          llama.block_count u32              = 32
llama_model_loader: - kv   3:                       llama.context_length u32              = 8192
llama_model_loader: - kv   4:                     llama.embedding_length u32              = 4096
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 14336
llama_model_loader: - kv   6:                 llama.attention.head_count u32              = 32
llama_model_loader: - kv   7:              llama.attention.head_count_kv u32              = 8
llama_model_loader: - kv   8:                       llama.rope.freq_base f32              = 500000.000000
llama_model_loader: - kv   9:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  10:                          general.file_type u32              = 2
llama_model_loader: - kv  11:                           llama.vocab_size u32              = 128256
llama_model_loader: - kv  12:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv  13:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  14:                         tokenizer.ggml.pre str              = llama-bpe
llama_model_loader: - kv  15:                      tokenizer.ggml.tokens arr[str,128256]  = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  16:                  tokenizer.ggml.token_type arr[i32,128256]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
time=2024-06-12T13:25:10.692Z level=INFO source=server.go:567 msg="waiting for server to become available" status="llm server loading model"
llama_model_loader: - kv  17:                      tokenizer.ggml.merges arr[str,280147]  = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv  18:                tokenizer.ggml.bos_token_id u32              = 128000
llama_model_loader: - kv  19:                tokenizer.ggml.eos_token_id u32              = 128009
llama_model_loader: - kv  20:                    tokenizer.chat_template str              = {% set loop_messages = messages %}{% ...
llama_model_loader: - kv  21:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:   65 tensors
llama_model_loader: - type q4_0:  225 tensors
llama_model_loader: - type q6_K:    1 tensors
llm_load_vocab: special tokens cache size = 256
llm_load_vocab: token to piece cache size = 1.5928 MB
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = llama
llm_load_print_meta: vocab type       = BPE
llm_load_print_meta: n_vocab          = 128256
llm_load_print_meta: n_merges         = 280147
llm_load_print_meta: n_ctx_train      = 8192
llm_load_print_meta: n_embd           = 4096
llm_load_print_meta: n_head           = 32
llm_load_print_meta: n_head_kv        = 8
llm_load_print_meta: n_layer          = 32
llm_load_print_meta: n_rot            = 128
llm_load_print_meta: n_embd_head_k    = 128
llm_load_print_meta: n_embd_head_v    = 128
llm_load_print_meta: n_gqa            = 4
llm_load_print_meta: n_embd_k_gqa     = 1024
llm_load_print_meta: n_embd_v_gqa     = 1024
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale    = 0.0e+00
llm_load_print_meta: n_ff             = 14336
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: causal attn      = 1
llm_load_print_meta: pooling type     = 0
llm_load_print_meta: rope type        = 0
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 500000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx  = 8192
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: ssm_d_conv       = 0
llm_load_print_meta: ssm_d_inner      = 0
llm_load_print_meta: ssm_d_state      = 0
llm_load_print_meta: ssm_dt_rank      = 0
llm_load_print_meta: model type       = 8B
llm_load_print_meta: model ftype      = Q4_0
llm_load_print_meta: model params     = 8.03 B
llm_load_print_meta: model size       = 4.33 GiB (4.64 BPW) 
llm_load_print_meta: general.name     = Meta-Llama-3-8B-Instruct
llm_load_print_meta: BOS token        = 128000 '<|begin_of_text|>'
llm_load_print_meta: EOS token        = 128009 '<|eot_id|>'
llm_load_print_meta: LF token         = 128 'Ä'
llm_load_print_meta: EOT token        = 128009 '<|eot_id|>'
llm_load_tensors: ggml ctx size =    0.15 MiB
llm_load_tensors:        CPU buffer size =  4437.80 MiB
llama_new_context_with_model: n_ctx      = 2048
llama_new_context_with_model: n_batch    = 512
llama_new_context_with_model: n_ubatch   = 512
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: freq_base  = 500000.0
llama_new_context_with_model: freq_scale = 1
time=2024-06-12T13:25:11.948Z level=DEBUG source=server.go:578 msg="model load progress 1.00"
llama_kv_cache_init:        CPU KV buffer size =   256.00 MiB
llama_new_context_with_model: KV self size  =  256.00 MiB, K (f16):  128.00 MiB, V (f16):  128.00 MiB
llama_new_context_with_model:        CPU  output buffer size =     0.50 MiB
llama_new_context_with_model:        CPU compute buffer size =   258.50 MiB
llama_new_context_with_model: graph nodes  = 1030
llama_new_context_with_model: graph splits = 1
time=2024-06-12T13:25:12.199Z level=DEBUG source=server.go:581 msg="model load completed, waiting for server to become available" status="llm server loading model"
DEBUG [initialize] initializing slots | n_slots=1 tid="140359403939712" timestamp=1718198712
DEBUG [initialize] new slot | n_ctx_slot=2048 slot_id=0 tid="140359403939712" timestamp=1718198712
INFO [main] model loaded | tid="140359403939712" timestamp=1718198712
DEBUG [update_slots] all slots are idle and system prompt is empty, clear the KV cache | tid="140359403939712" timestamp=1718198712
DEBUG [process_single_task] slot data | n_idle_slots=1 n_processing_slots=0 task_id=0 tid="140359403939712" timestamp=1718198712
time=2024-06-12T13:25:12.458Z level=INFO source=server.go:572 msg="llama runner started in 2.02 seconds"
time=2024-06-12T13:25:12.458Z level=DEBUG source=sched.go:351 msg="finished setting up runner" model=/root/.ollama/models/blobs/sha256-6a0746a1ec1aef3e7ec53868f220ff6e389f6f8ef87a01d77c96807de94ca2aa
time=2024-06-12T13:25:12.458Z level=DEBUG source=prompt.go:172 msg="prompt now fits in context window" required=1 window=2048
[GIN] 2024/06/12 - 13:25:12 | 200 |  3.500021851s |       127.0.0.1 | POST     "/api/chat"
time=2024-06-12T13:25:12.458Z level=DEBUG source=sched.go:355 msg="context for request finished"
time=2024-06-12T13:25:12.458Z level=DEBUG source=sched.go:237 msg="runner with non-zero duration has gone idle, adding timer" modelPath=/root/.ollama/models/blobs/sha256-6a0746a1ec1aef3e7ec53868f220ff6e389f6f8ef87a01d77c96807de94ca2aa duration=5m0s
time=2024-06-12T13:25:12.458Z level=DEBUG source=sched.go:255 msg="after processing request finished event" modelPath=/root/.ollama/models/blobs/sha256-6a0746a1ec1aef3e7ec53868f220ff6e389f6f8ef87a01d77c96807de94ca2aa refCount=0
time=2024-06-12T13:25:23.515Z level=DEBUG source=sched.go:446 msg="evaluating already loaded" model=/root/.ollama/models/blobs/sha256-6a0746a1ec1aef3e7ec53868f220ff6e389f6f8ef87a01d77c96807de94ca2aa
DEBUG [process_single_task] slot data | n_idle_slots=1 n_processing_slots=0 task_id=1 tid="140359403939712" timestamp=1718198723
DEBUG [process_single_task] slot data | n_idle_slots=1 n_processing_slots=0 task_id=2 tid="140359403939712" timestamp=1718198723
DEBUG [log_server_request] request | method="POST" params={} path="/tokenize" remote_addr="127.0.0.1" remote_port=34890 status=200 tid="140359355057920" timestamp=1718198723
time=2024-06-12T13:25:23.606Z level=DEBUG source=prompt.go:172 msg="prompt now fits in context window" required=13 window=2048
time=2024-06-12T13:25:23.606Z level=DEBUG source=routes.go:1305 msg="chat handler" prompt="<|start_header_id|>user<|end_header_id|>\n\nHullo<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n" images=0
time=2024-06-12T13:25:23.606Z level=DEBUG source=server.go:668 msg="setting token limit to 10x num_ctx" num_ctx=2048 num_predict=20480
DEBUG [process_single_task] slot data | n_idle_slots=1 n_processing_slots=0 task_id=3 tid="140359403939712" timestamp=1718198723
DEBUG [launch_slot_with_data] slot is processing task | slot_id=0 task_id=4 tid="140359403939712" timestamp=1718198723
DEBUG [update_slots] slot progression | ga_i=0 n_past=0 n_past_se=0 n_prompt_tokens_processed=12 slot_id=0 task_id=4 tid="140359403939712" timestamp=1718198723
DEBUG [update_slots] kv cache rm [p0, end) | p0=0 slot_id=0 task_id=4 tid="140359403939712" timestamp=1718198723
DEBUG [print_timings] prompt eval time     =    1455.98 ms /    12 tokens (  121.33 ms per token,     8.24 tokens per second) | n_prompt_tokens_processed=12 n_tokens_second=8.241871454278218 slot_id=0 t_prompt_processing=1455.98 t_token=121.33166666666666 task_id=4 tid="140359403939712" timestamp=1718198731
DEBUG [print_timings] generation eval time =    6179.49 ms /    26 runs   (  237.67 ms per token,     4.21 tokens per second) | n_decoded=26 n_tokens_second=4.207464235744869 slot_id=0 t_token=237.67284615384614 t_token_generation=6179.494 task_id=4 tid="140359403939712" timestamp=1718198731
DEBUG [print_timings]           total time =    7635.47 ms | slot_id=0 t_prompt_processing=1455.98 t_token_generation=6179.494 t_total=7635.474 task_id=4 tid="140359403939712" timestamp=1718198731
DEBUG [update_slots] slot released | n_cache_tokens=38 n_ctx=2048 n_past=37 n_system_tokens=0 slot_id=0 task_id=4 tid="140359403939712" timestamp=1718198731 truncated=false
DEBUG [log_server_request] request | method="POST" params={} path="/completion" remote_addr="127.0.0.1" remote_port=34890 status=200 tid="140359355057920" timestamp=1718198731
[GIN] 2024/06/12 - 13:25:31 | 200 |  7.771807214s |       127.0.0.1 | POST     "/api/chat"
time=2024-06-12T13:25:31.286Z level=DEBUG source=sched.go:304 msg="context for request finished"
time=2024-06-12T13:25:31.286Z level=DEBUG source=sched.go:237 msg="runner with non-zero duration has gone idle, adding timer" modelPath=/root/.ollama/models/blobs/sha256-6a0746a1ec1aef3e7ec53868f220ff6e389f6f8ef87a01d77c96807de94ca2aa duration=5m0s
time=2024-06-12T13:25:31.286Z level=DEBUG source=sched.go:255 msg="after processing request finished event" modelPath=/root/.ollama/models/blobs/sha256-6a0746a1ec1aef3e7ec53868f220ff6e389f6f8ef87a01d77c96807de94ca2aa refCount=0

Edit: Increased the VRAM to 4GB. Ollama's logs reflect the change, but trying to run anything throws `Could not initialize Tensile host: No devices found`. Unlike before at 1GB VRAM, it doesn't even fall back to the CPU. I can only run on the CPU by setting an invalid HIP_VISIBLE_DEVICES.

$ podman exec -it ollama ollama run gemma:2b
Error: llama runner process has terminated: signal: aborted (core dumped) error:Could not initialize Tensile host: No devices found
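One thing worth double-checking: in the subprocess environment above, the override is passed as `HSA_OVERRIDE_GFX_VERSION=gfx1030`, but Ollama's troubleshooting docs show it in the numeric form `10.3.0`, which is what ROCm's HSA runtime actually parses. A sketch of the container invocation with that form (container name, volume, and image tag are assumptions, not taken from this thread):

```shell
# Hypothetical podman run, assuming the ollama/ollama:rocm image and a named
# volume; adjust to your setup. The key part is the numeric override, which
# maps the Steam Deck's unsupported gfx1033 onto the gfx1030 ROCm kernels.
podman run -d --name ollama \
  --device /dev/kfd --device /dev/dri \
  -e OLLAMA_DEBUG=1 \
  -e HSA_OVERRIDE_GFX_VERSION=10.3.0 \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  ollama/ollama:rocm
```

If the override is accepted, `podman exec -it ollama ollama run gemma:2b` should then load on the GPU instead of aborting in rocBLAS.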
bash ollama-oci.sh 
2024/06/12 15:00:56 routes.go:1011: INFO server config env="map[OLLAMA_DEBUG:true OLLAMA_FLASH_ATTENTION:false OLLAMA_HOST: OLLAMA_KEEP_ALIVE: OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:1 OLLAMA_MAX_QUEUE:512 OLLAMA_MAX_VRAM:0 OLLAMA_MODELS: OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://*] OLLAMA_RUNNERS_DIR: OLLAMA_TMPDIR:]"
time=2024-06-12T15:00:56.986Z level=INFO source=images.go:740 msg="total blobs: 10"
time=2024-06-12T15:00:56.987Z level=INFO source=images.go:747 msg="total unused blobs removed: 0"
time=2024-06-12T15:00:56.987Z level=INFO source=routes.go:1057 msg="Listening on [::]:11434 (version 0.1.43)"
time=2024-06-12T15:00:56.990Z level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama689615480/runners
time=2024-06-12T15:00:56.991Z level=DEBUG source=payload.go:180 msg=extracting variant=cpu file=build/linux/x86_64/cpu/bin/ollama_llama_server.gz
time=2024-06-12T15:00:56.991Z level=DEBUG source=payload.go:180 msg=extracting variant=cpu_avx file=build/linux/x86_64/cpu_avx/bin/ollama_llama_server.gz
time=2024-06-12T15:00:56.991Z level=DEBUG source=payload.go:180 msg=extracting variant=cpu_avx2 file=build/linux/x86_64/cpu_avx2/bin/ollama_llama_server.gz
time=2024-06-12T15:00:56.991Z level=DEBUG source=payload.go:180 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/libcublas.so.11.gz
time=2024-06-12T15:00:56.991Z level=DEBUG source=payload.go:180 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/libcublasLt.so.11.gz
time=2024-06-12T15:00:56.991Z level=DEBUG source=payload.go:180 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/libcudart.so.11.0.gz
time=2024-06-12T15:00:56.991Z level=DEBUG source=payload.go:180 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/ollama_llama_server.gz
time=2024-06-12T15:00:56.991Z level=DEBUG source=payload.go:180 msg=extracting variant=rocm_v60002 file=build/linux/x86_64/rocm_v60002/bin/deps.txt.gz
time=2024-06-12T15:00:56.991Z level=DEBUG source=payload.go:180 msg=extracting variant=rocm_v60002 file=build/linux/x86_64/rocm_v60002/bin/ollama_llama_server.gz
time=2024-06-12T15:01:01.992Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama689615480/runners/cpu
time=2024-06-12T15:01:01.992Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama689615480/runners/cpu_avx
time=2024-06-12T15:01:01.992Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama689615480/runners/cpu_avx2
time=2024-06-12T15:01:01.992Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama689615480/runners/cuda_v11
time=2024-06-12T15:01:01.992Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama689615480/runners/rocm_v60002
time=2024-06-12T15:01:01.992Z level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu_avx2 cuda_v11 rocm_v60002 cpu cpu_avx]"
time=2024-06-12T15:01:01.992Z level=DEBUG source=payload.go:45 msg="Override detection logic by setting OLLAMA_LLM_LIBRARY"
time=2024-06-12T15:01:01.992Z level=DEBUG source=sched.go:90 msg="starting llm scheduler"
time=2024-06-12T15:01:01.992Z level=DEBUG source=gpu.go:132 msg="Detecting GPUs"
time=2024-06-12T15:01:01.992Z level=DEBUG source=gpu.go:274 msg="Searching for GPU library" name=libcuda.so*
time=2024-06-12T15:01:01.992Z level=DEBUG source=gpu.go:293 msg="gpu library search" globs="[/opt/rocm/lib/libcuda.so** /usr/local/lib/libcuda.so** /opt/rh/devtoolset-7/root/libcuda.so** /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]"
time=2024-06-12T15:01:02.024Z level=DEBUG source=gpu.go:326 msg="discovered GPU libraries" paths=[]
time=2024-06-12T15:01:02.025Z level=DEBUG source=gpu.go:274 msg="Searching for GPU library" name=libcudart.so*
time=2024-06-12T15:01:02.025Z level=DEBUG source=gpu.go:293 msg="gpu library search" globs="[/opt/rocm/lib/libcudart.so** /usr/local/lib/libcudart.so** /opt/rh/devtoolset-7/root/libcudart.so** /tmp/ollama689615480/runners/cuda*/libcudart.so* /usr/local/cuda/lib64/libcudart.so* /usr/lib/x86_64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/x86_64-linux-gnu/libcudart.so* /usr/lib/wsl/lib/libcudart.so* /usr/lib/wsl/drivers/*/libcudart.so* /opt/cuda/lib64/libcudart.so* /usr/local/cuda*/targets/aarch64-linux/lib/libcudart.so* /usr/lib/aarch64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/aarch64-linux-gnu/libcudart.so* /usr/local/cuda/lib*/libcudart.so* /usr/lib*/libcudart.so* /usr/local/lib*/libcudart.so*]"
time=2024-06-12T15:01:02.026Z level=DEBUG source=gpu.go:326 msg="discovered GPU libraries" paths=[/tmp/ollama689615480/runners/cuda_v11/libcudart.so.11.0]
cudaSetDevice err: 35
time=2024-06-12T15:01:02.027Z level=DEBUG source=gpu.go:338 msg="Unable to load cudart" library=/tmp/ollama689615480/runners/cuda_v11/libcudart.so.11.0 error="your nvidia driver is too old or missing.  If you have a CUDA GPU please upgrade to run ollama"
time=2024-06-12T15:01:02.027Z level=DEBUG source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-06-12T15:01:02.027Z level=WARN source=amd_linux.go:48 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
time=2024-06-12T15:01:02.028Z level=DEBUG source=amd_linux.go:77 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/0/properties"
time=2024-06-12T15:01:02.028Z level=DEBUG source=amd_linux.go:102 msg="detected CPU /sys/class/kfd/kfd/topology/nodes/0/properties"
time=2024-06-12T15:01:02.028Z level=DEBUG source=amd_linux.go:77 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/1/properties"
time=2024-06-12T15:01:02.028Z level=DEBUG source=amd_linux.go:242 msg="amdgpu memory" gpu=0 total="4.0 GiB"
time=2024-06-12T15:01:02.028Z level=DEBUG source=amd_linux.go:243 msg="amdgpu memory" gpu=0 available="4.0 GiB"
time=2024-06-12T15:01:02.028Z level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /opt/rocm/lib"
time=2024-06-12T15:01:02.029Z level=INFO source=amd_linux.go:304 msg="skipping rocm gfx compatibility check" HSA_OVERRIDE_GFX_VERSION=gfx1030
time=2024-06-12T15:01:02.029Z level=INFO source=types.go:71 msg="inference compute" id=0 library=rocm compute=gfx1033 driver=0.0 name=1002:163f total="4.0 GiB" available="4.0 GiB"
[GIN] 2024/06/12 - 15:01:04 | 200 |     508.521µs |       127.0.0.1 | HEAD     "/"
[GIN] 2024/06/12 - 15:01:04 | 200 |    1.806745ms |       127.0.0.1 | POST     "/api/show"
[GIN] 2024/06/12 - 15:01:04 | 200 |     576.742µs |       127.0.0.1 | POST     "/api/show"
time=2024-06-12T15:01:04.227Z level=DEBUG source=gpu.go:132 msg="Detecting GPUs"
time=2024-06-12T15:01:04.227Z level=DEBUG source=gpu.go:274 msg="Searching for GPU library" name=libcuda.so*
time=2024-06-12T15:01:04.227Z level=DEBUG source=gpu.go:293 msg="gpu library search" globs="[/opt/rocm/lib/libcuda.so** /usr/local/lib/libcuda.so** /opt/rh/devtoolset-7/root/libcuda.so** /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]"
time=2024-06-12T15:01:04.229Z level=DEBUG source=gpu.go:326 msg="discovered GPU libraries" paths=[]
time=2024-06-12T15:01:04.229Z level=DEBUG source=gpu.go:274 msg="Searching for GPU library" name=libcudart.so*
time=2024-06-12T15:01:04.229Z level=DEBUG source=gpu.go:293 msg="gpu library search" globs="[/opt/rocm/lib/libcudart.so** /usr/local/lib/libcudart.so** /opt/rh/devtoolset-7/root/libcudart.so** /tmp/ollama689615480/runners/cuda*/libcudart.so* /usr/local/cuda/lib64/libcudart.so* /usr/lib/x86_64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/x86_64-linux-gnu/libcudart.so* /usr/lib/wsl/lib/libcudart.so* /usr/lib/wsl/drivers/*/libcudart.so* /opt/cuda/lib64/libcudart.so* /usr/local/cuda*/targets/aarch64-linux/lib/libcudart.so* /usr/lib/aarch64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/aarch64-linux-gnu/libcudart.so* /usr/local/cuda/lib*/libcudart.so* /usr/lib*/libcudart.so* /usr/local/lib*/libcudart.so*]"
time=2024-06-12T15:01:04.230Z level=DEBUG source=gpu.go:326 msg="discovered GPU libraries" paths=[/tmp/ollama689615480/runners/cuda_v11/libcudart.so.11.0]
cudaSetDevice err: 35
time=2024-06-12T15:01:04.231Z level=DEBUG source=gpu.go:338 msg="Unable to load cudart" library=/tmp/ollama689615480/runners/cuda_v11/libcudart.so.11.0 error="your nvidia driver is too old or missing.  If you have a CUDA GPU please upgrade to run ollama"
time=2024-06-12T15:01:04.231Z level=DEBUG source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-06-12T15:01:04.231Z level=WARN source=amd_linux.go:48 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
time=2024-06-12T15:01:04.231Z level=DEBUG source=amd_linux.go:77 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/0/properties"
time=2024-06-12T15:01:04.231Z level=DEBUG source=amd_linux.go:102 msg="detected CPU /sys/class/kfd/kfd/topology/nodes/0/properties"
time=2024-06-12T15:01:04.231Z level=DEBUG source=amd_linux.go:77 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/1/properties"
time=2024-06-12T15:01:04.231Z level=DEBUG source=amd_linux.go:242 msg="amdgpu memory" gpu=0 total="4.0 GiB"
time=2024-06-12T15:01:04.231Z level=DEBUG source=amd_linux.go:243 msg="amdgpu memory" gpu=0 available="4.0 GiB"
time=2024-06-12T15:01:04.231Z level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /opt/rocm/lib"
time=2024-06-12T15:01:04.232Z level=INFO source=amd_linux.go:304 msg="skipping rocm gfx compatibility check" HSA_OVERRIDE_GFX_VERSION=gfx1030
time=2024-06-12T15:01:04.232Z level=DEBUG source=gguf.go:57 msg="model = &llm.gguf{containerGGUF:(*llm.containerGGUF)(0xc000494780), kv:llm.KV{}, tensors:[]*llm.Tensor(nil), parameters:0x0}"
time=2024-06-12T15:01:05.537Z level=DEBUG source=sched.go:153 msg="loading first model" model=/root/.ollama/models/blobs/sha256-c1864a5eb19305c40519da12cc543519e48a0697ecd30e15d5ac228644957d12
time=2024-06-12T15:01:05.537Z level=DEBUG source=memory.go:44 msg=evaluating library=rocm gpu_count=1 available="4.0 GiB"
time=2024-06-12T15:01:05.537Z level=INFO source=memory.go:133 msg="offload to gpu" layers.requested=-1 layers.real=19 memory.available="4.0 GiB" memory.required.full="2.6 GiB" memory.required.partial="2.6 GiB" memory.required.kv="36.0 MiB" memory.weights.total="1.6 GiB" memory.weights.repeating="1.0 GiB" memory.weights.nonrepeating="531.5 MiB" memory.graph.full="504.2 MiB" memory.graph.partial="918.6 MiB"
time=2024-06-12T15:01:05.537Z level=DEBUG source=sched.go:563 msg="new model will fit in available VRAM in single GPU, loading" model=/root/.ollama/models/blobs/sha256-c1864a5eb19305c40519da12cc543519e48a0697ecd30e15d5ac228644957d12 gpu=0 available=4294967296 required="2.6 GiB"
time=2024-06-12T15:01:05.537Z level=DEBUG source=memory.go:44 msg=evaluating library=rocm gpu_count=1 available="4.0 GiB"
time=2024-06-12T15:01:05.538Z level=INFO source=memory.go:133 msg="offload to gpu" layers.requested=-1 layers.real=19 memory.available="4.0 GiB" memory.required.full="2.6 GiB" memory.required.partial="2.6 GiB" memory.required.kv="36.0 MiB" memory.weights.total="1.6 GiB" memory.weights.repeating="1.0 GiB" memory.weights.nonrepeating="531.5 MiB" memory.graph.full="504.2 MiB" memory.graph.partial="918.6 MiB"
time=2024-06-12T15:01:05.538Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama689615480/runners/cpu
time=2024-06-12T15:01:05.538Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama689615480/runners/cpu_avx
time=2024-06-12T15:01:05.538Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama689615480/runners/cpu_avx2
time=2024-06-12T15:01:05.538Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama689615480/runners/cuda_v11
time=2024-06-12T15:01:05.538Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama689615480/runners/rocm_v60002
time=2024-06-12T15:01:05.538Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama689615480/runners/cpu
time=2024-06-12T15:01:05.538Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama689615480/runners/cpu_avx
time=2024-06-12T15:01:05.538Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama689615480/runners/cpu_avx2
time=2024-06-12T15:01:05.538Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama689615480/runners/cuda_v11
time=2024-06-12T15:01:05.538Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama689615480/runners/rocm_v60002
time=2024-06-12T15:01:05.538Z level=DEBUG source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-06-12T15:01:05.544Z level=INFO source=server.go:341 msg="starting llama server" cmd="/tmp/ollama689615480/runners/rocm_v60002/ollama_llama_server --model /root/.ollama/models/blobs/sha256-c1864a5eb19305c40519da12cc543519e48a0697ecd30e15d5ac228644957d12 --ctx-size 2048 --batch-size 512 --embedding --log-disable --n-gpu-layers 19 --verbose --parallel 1 --port 40529"
time=2024-06-12T15:01:05.544Z level=DEBUG source=server.go:356 msg=subprocess environment="[PATH=/opt/rh/devtoolset-7/root/usr/bin:/opt/rocm/hcc/bin:/opt/rocm/hip/bin:/opt/rocm/bin:/opt/rocm/hcc/bin::/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin HSA_OVERRIDE_GFX_VERSION=gfx1030 LD_LIBRARY_PATH=/opt/rocm/lib:/tmp/ollama689615480/runners/rocm_v60002:/opt/rocm/lib:/usr/local/lib:/opt/rh/devtoolset-7/root HIP_VISIBLE_DEVICES=0]"
time=2024-06-12T15:01:05.544Z level=INFO source=sched.go:338 msg="loaded runners" count=1
time=2024-06-12T15:01:05.544Z level=INFO source=server.go:529 msg="waiting for llama runner to start responding"
time=2024-06-12T15:01:05.544Z level=INFO source=server.go:567 msg="waiting for server to become available" status="llm server error"
INFO [main] build info | build=1 commit="5921b8f" tid="139643683318848" timestamp=1718204465
INFO [main] system info | n_threads=4 n_threads_batch=-1 system_info="AVX = 1 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 0 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | " tid="139643683318848" timestamp=1718204465 total_threads=8
INFO [main] HTTP server listening | hostname="127.0.0.1" n_threads_http="7" port="40529" tid="139643683318848" timestamp=1718204465
llama_model_loader: loaded meta data with 21 key-value pairs and 164 tensors from /root/.ollama/models/blobs/sha256-c1864a5eb19305c40519da12cc543519e48a0697ecd30e15d5ac228644957d12 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = gemma
llama_model_loader: - kv   1:                               general.name str              = gemma-2b-it
llama_model_loader: - kv   2:                       gemma.context_length u32              = 8192
llama_model_loader: - kv   3:                          gemma.block_count u32              = 18
llama_model_loader: - kv   4:                     gemma.embedding_length u32              = 2048
llama_model_loader: - kv   5:                  gemma.feed_forward_length u32              = 16384
llama_model_loader: - kv   6:                 gemma.attention.head_count u32              = 8
llama_model_loader: - kv   7:              gemma.attention.head_count_kv u32              = 1
llama_model_loader: - kv   8:                 gemma.attention.key_length u32              = 256
llama_model_loader: - kv   9:               gemma.attention.value_length u32              = 256
llama_model_loader: - kv  10:     gemma.attention.layer_norm_rms_epsilon f32              = 0.000001
llama_model_loader: - kv  11:                       tokenizer.ggml.model str              = llama
llama_model_loader: - kv  12:                tokenizer.ggml.bos_token_id u32              = 2
llama_model_loader: - kv  13:                tokenizer.ggml.eos_token_id u32              = 1
llama_model_loader: - kv  14:            tokenizer.ggml.padding_token_id u32              = 0
llama_model_loader: - kv  15:            tokenizer.ggml.unknown_token_id u32              = 3
time=2024-06-12T15:01:05.796Z level=INFO source=server.go:567 msg="waiting for server to become available" status="llm server loading model"
llama_model_loader: - kv  16:                      tokenizer.ggml.tokens arr[str,256128]  = ["<pad>", "<eos>", "<bos>", "<unk>", ...
llama_model_loader: - kv  17:                      tokenizer.ggml.scores arr[f32,256128]  = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv  18:                  tokenizer.ggml.token_type arr[i32,256128]  = [3, 3, 3, 2, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  19:               general.quantization_version u32              = 2
llama_model_loader: - kv  20:                          general.file_type u32              = 2
llama_model_loader: - type  f32:   37 tensors
llama_model_loader: - type q4_0:  126 tensors
llama_model_loader: - type q8_0:    1 tensors
llm_load_vocab: special tokens cache size = 388
llm_load_vocab: token to piece cache size = 3.2028 MB
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = gemma
llm_load_print_meta: vocab type       = SPM
llm_load_print_meta: n_vocab          = 256128
llm_load_print_meta: n_merges         = 0
llm_load_print_meta: n_ctx_train      = 8192
llm_load_print_meta: n_embd           = 2048
llm_load_print_meta: n_head           = 8
llm_load_print_meta: n_head_kv        = 1
llm_load_print_meta: n_layer          = 18
llm_load_print_meta: n_rot            = 256
llm_load_print_meta: n_embd_head_k    = 256
llm_load_print_meta: n_embd_head_v    = 256
llm_load_print_meta: n_gqa            = 8
llm_load_print_meta: n_embd_k_gqa     = 256
llm_load_print_meta: n_embd_v_gqa     = 256
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-06
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale    = 0.0e+00
llm_load_print_meta: n_ff             = 16384
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: causal attn      = 1
llm_load_print_meta: pooling type     = 0
llm_load_print_meta: rope type        = 2
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx  = 8192
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: ssm_d_conv       = 0
llm_load_print_meta: ssm_d_inner      = 0
llm_load_print_meta: ssm_d_state      = 0
llm_load_print_meta: ssm_dt_rank      = 0
llm_load_print_meta: model type       = 2B
llm_load_print_meta: model ftype      = Q4_0
llm_load_print_meta: model params     = 2.51 B
llm_load_print_meta: model size       = 1.56 GiB (5.34 BPW) 
llm_load_print_meta: general.name     = gemma-2b-it
llm_load_print_meta: BOS token        = 2 '<bos>'
llm_load_print_meta: EOS token        = 1 '<eos>'
llm_load_print_meta: UNK token        = 3 '<unk>'
llm_load_print_meta: PAD token        = 0 '<pad>'
llm_load_print_meta: LF token         = 227 '<0x0A>'
llm_load_print_meta: EOT token        = 107 '<end_of_turn>'

rocBLAS error: Could not initialize Tensile host: No devices found
time=2024-06-12T15:01:06.999Z level=INFO source=server.go:567 msg="waiting for server to become available" status="llm server not responding"
time=2024-06-12T15:01:07.701Z level=ERROR source=sched.go:344 msg="error loading llama server" error="llama runner process has terminated: signal: aborted (core dumped) error:Could not initialize Tensile host: No devices found"
time=2024-06-12T15:01:07.701Z level=DEBUG source=sched.go:347 msg="triggering expiration for failed load" model=/root/.ollama/models/blobs/sha256-c1864a5eb19305c40519da12cc543519e48a0697ecd30e15d5ac228644957d12
time=2024-06-12T15:01:07.701Z level=DEBUG source=sched.go:258 msg="runner expired event received" modelPath=/root/.ollama/models/blobs/sha256-c1864a5eb19305c40519da12cc543519e48a0697ecd30e15d5ac228644957d12
time=2024-06-12T15:01:07.701Z level=DEBUG source=sched.go:274 msg="got lock to unload" modelPath=/root/.ollama/models/blobs/sha256-c1864a5eb19305c40519da12cc543519e48a0697ecd30e15d5ac228644957d12
time=2024-06-12T15:01:07.701Z level=DEBUG source=gpu.go:132 msg="Detecting GPUs"
time=2024-06-12T15:01:07.701Z level=DEBUG source=gpu.go:274 msg="Searching for GPU library" name=libcuda.so*
[GIN] 2024/06/12 - 15:01:07 | 500 |  3.474920334s |       127.0.0.1 | POST     "/api/chat"
time=2024-06-12T15:01:07.701Z level=DEBUG source=gpu.go:293 msg="gpu library search" globs="[/opt/rocm/lib/libcuda.so** /usr/local/lib/libcuda.so** /opt/rh/devtoolset-7/root/libcuda.so** /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]"
time=2024-06-12T15:01:07.703Z level=DEBUG source=gpu.go:326 msg="discovered GPU libraries" paths=[]
time=2024-06-12T15:01:07.703Z level=DEBUG source=gpu.go:274 msg="Searching for GPU library" name=libcudart.so*
time=2024-06-12T15:01:07.703Z level=DEBUG source=gpu.go:293 msg="gpu library search" globs="[/opt/rocm/lib/libcudart.so** /usr/local/lib/libcudart.so** /opt/rh/devtoolset-7/root/libcudart.so** /tmp/ollama689615480/runners/cuda*/libcudart.so* /usr/local/cuda/lib64/libcudart.so* /usr/lib/x86_64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/x86_64-linux-gnu/libcudart.so* /usr/lib/wsl/lib/libcudart.so* /usr/lib/wsl/drivers/*/libcudart.so* /opt/cuda/lib64/libcudart.so* /usr/local/cuda*/targets/aarch64-linux/lib/libcudart.so* /usr/lib/aarch64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/aarch64-linux-gnu/libcudart.so* /usr/local/cuda/lib*/libcudart.so* /usr/lib*/libcudart.so* /usr/local/lib*/libcudart.so*]"
time=2024-06-12T15:01:07.705Z level=DEBUG source=gpu.go:326 msg="discovered GPU libraries" paths=[/tmp/ollama689615480/runners/cuda_v11/libcudart.so.11.0]
cudaSetDevice err: 35
time=2024-06-12T15:01:07.705Z level=DEBUG source=gpu.go:338 msg="Unable to load cudart" library=/tmp/ollama689615480/runners/cuda_v11/libcudart.so.11.0 error="your nvidia driver is too old or missing.  If you have a CUDA GPU please upgrade to run ollama"
time=2024-06-12T15:01:07.705Z level=DEBUG source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-06-12T15:01:07.705Z level=WARN source=amd_linux.go:48 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
time=2024-06-12T15:01:07.705Z level=DEBUG source=amd_linux.go:77 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/0/properties"
time=2024-06-12T15:01:07.705Z level=DEBUG source=amd_linux.go:102 msg="detected CPU /sys/class/kfd/kfd/topology/nodes/0/properties"
time=2024-06-12T15:01:07.705Z level=DEBUG source=amd_linux.go:77 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/1/properties"
time=2024-06-12T15:01:07.706Z level=DEBUG source=amd_linux.go:242 msg="amdgpu memory" gpu=0 total="4.0 GiB"
time=2024-06-12T15:01:07.706Z level=DEBUG source=amd_linux.go:243 msg="amdgpu memory" gpu=0 available="4.0 GiB"
time=2024-06-12T15:01:07.706Z level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /opt/rocm/lib"
time=2024-06-12T15:01:07.706Z level=INFO source=amd_linux.go:304 msg="skipping rocm gfx compatibility check" HSA_OVERRIDE_GFX_VERSION=gfx1030
time=2024-06-12T15:01:07.706Z level=DEBUG source=server.go:990 msg="stopping llama server"
time=2024-06-12T15:01:07.706Z level=DEBUG source=sched.go:279 msg="runner released" modelPath=/root/.ollama/models/blobs/sha256-c1864a5eb19305c40519da12cc543519e48a0697ecd30e15d5ac228644957d12
[... the same GPU detection cycle (libcuda.so / libcudart.so search, cudaSetDevice err: 35, amdgpu node scan, "skipping rocm gfx compatibility check" HSA_OVERRIDE_GFX_VERSION=gfx1030) repeats unchanged every ~250 ms through 15:01:10 ...]
time=2024-06-12T15:01:10.209Z level=DEBUG source=gpu.go:326 msg="discovered GPU libraries" paths=[]
time=2024-06-12T15:01:10.209Z level=DEBUG source=gpu.go:274 msg="Searching for GPU library" name=libcudart.so*
time=2024-06-12T15:01:10.209Z level=DEBUG source=gpu.go:293 msg="gpu library search" globs="[/opt/rocm/lib/libcudart.so** /usr/local/lib/libcudart.so** /opt/rh/devtoolset-7/root/libcudart.so** /tmp/ollama689615480/runners/cuda*/libcudart.so* /usr/local/cuda/lib64/libcudart.so* /usr/lib/x86_64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/x86_64-linux-gnu/libcudart.so* /usr/lib/wsl/lib/libcudart.so* /usr/lib/wsl/drivers/*/libcudart.so* /opt/cuda/lib64/libcudart.so* /usr/local/cuda*/targets/aarch64-linux/lib/libcudart.so* /usr/lib/aarch64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/aarch64-linux-gnu/libcudart.so* /usr/local/cuda/lib*/libcudart.so* /usr/lib*/libcudart.so* /usr/local/lib*/libcudart.so*]"
time=2024-06-12T15:01:10.210Z level=DEBUG source=gpu.go:326 msg="discovered GPU libraries" paths=[/tmp/ollama689615480/runners/cuda_v11/libcudart.so.11.0]
cudaSetDevice err: 35
time=2024-06-12T15:01:10.211Z level=DEBUG source=gpu.go:338 msg="Unable to load cudart" library=/tmp/ollama689615480/runners/cuda_v11/libcudart.so.11.0 error="your nvidia driver is too old or missing.  If you have a CUDA GPU please upgrade to run ollama"
time=2024-06-12T15:01:10.211Z level=DEBUG source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-06-12T15:01:10.211Z level=WARN source=amd_linux.go:48 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
time=2024-06-12T15:01:10.211Z level=DEBUG source=amd_linux.go:77 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/0/properties"
time=2024-06-12T15:01:10.211Z level=DEBUG source=amd_linux.go:102 msg="detected CPU /sys/class/kfd/kfd/topology/nodes/0/properties"
time=2024-06-12T15:01:10.211Z level=DEBUG source=amd_linux.go:77 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/1/properties"
time=2024-06-12T15:01:10.211Z level=DEBUG source=amd_linux.go:242 msg="amdgpu memory" gpu=0 total="4.0 GiB"
time=2024-06-12T15:01:10.211Z level=DEBUG source=amd_linux.go:243 msg="amdgpu memory" gpu=0 available="4.0 GiB"
time=2024-06-12T15:01:10.211Z level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /opt/rocm/lib"
time=2024-06-12T15:01:10.211Z level=INFO source=amd_linux.go:304 msg="skipping rocm gfx compatibility check" HSA_OVERRIDE_GFX_VERSION=gfx1030
time=2024-06-12T15:01:10.457Z level=DEBUG source=gpu.go:132 msg="Detecting GPUs"
time=2024-06-12T15:01:10.457Z level=DEBUG source=gpu.go:274 msg="Searching for GPU library" name=libcuda.so*
time=2024-06-12T15:01:10.457Z level=DEBUG source=gpu.go:293 msg="gpu library search" globs="[/opt/rocm/lib/libcuda.so** /usr/local/lib/libcuda.so** /opt/rh/devtoolset-7/root/libcuda.so** /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]"
time=2024-06-12T15:01:10.458Z level=DEBUG source=gpu.go:326 msg="discovered GPU libraries" paths=[]
time=2024-06-12T15:01:10.458Z level=DEBUG source=gpu.go:274 msg="Searching for GPU library" name=libcudart.so*
time=2024-06-12T15:01:10.458Z level=DEBUG source=gpu.go:293 msg="gpu library search" globs="[/opt/rocm/lib/libcudart.so** /usr/local/lib/libcudart.so** /opt/rh/devtoolset-7/root/libcudart.so** /tmp/ollama689615480/runners/cuda*/libcudart.so* /usr/local/cuda/lib64/libcudart.so* /usr/lib/x86_64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/x86_64-linux-gnu/libcudart.so* /usr/lib/wsl/lib/libcudart.so* /usr/lib/wsl/drivers/*/libcudart.so* /opt/cuda/lib64/libcudart.so* /usr/local/cuda*/targets/aarch64-linux/lib/libcudart.so* /usr/lib/aarch64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/aarch64-linux-gnu/libcudart.so* /usr/local/cuda/lib*/libcudart.so* /usr/lib*/libcudart.so* /usr/local/lib*/libcudart.so*]"
time=2024-06-12T15:01:10.460Z level=DEBUG source=gpu.go:326 msg="discovered GPU libraries" paths=[/tmp/ollama689615480/runners/cuda_v11/libcudart.so.11.0]
cudaSetDevice err: 35
time=2024-06-12T15:01:10.460Z level=DEBUG source=gpu.go:338 msg="Unable to load cudart" library=/tmp/ollama689615480/runners/cuda_v11/libcudart.so.11.0 error="your nvidia driver is too old or missing.  If you have a CUDA GPU please upgrade to run ollama"
time=2024-06-12T15:01:10.460Z level=DEBUG source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-06-12T15:01:10.461Z level=WARN source=amd_linux.go:48 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
time=2024-06-12T15:01:10.461Z level=DEBUG source=amd_linux.go:77 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/0/properties"
time=2024-06-12T15:01:10.461Z level=DEBUG source=amd_linux.go:102 msg="detected CPU /sys/class/kfd/kfd/topology/nodes/0/properties"
time=2024-06-12T15:01:10.461Z level=DEBUG source=amd_linux.go:77 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/1/properties"
time=2024-06-12T15:01:10.461Z level=DEBUG source=amd_linux.go:242 msg="amdgpu memory" gpu=0 total="4.0 GiB"
time=2024-06-12T15:01:10.461Z level=DEBUG source=amd_linux.go:243 msg="amdgpu memory" gpu=0 available="4.0 GiB"
time=2024-06-12T15:01:10.461Z level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /opt/rocm/lib"
time=2024-06-12T15:01:10.461Z level=INFO source=amd_linux.go:304 msg="skipping rocm gfx compatibility check" HSA_OVERRIDE_GFX_VERSION=gfx1030
time=2024-06-12T15:01:10.706Z level=DEBUG source=gpu.go:132 msg="Detecting GPUs"
time=2024-06-12T15:01:10.706Z level=DEBUG source=gpu.go:274 msg="Searching for GPU library" name=libcuda.so*
time=2024-06-12T15:01:10.706Z level=DEBUG source=gpu.go:293 msg="gpu library search" globs="[/opt/rocm/lib/libcuda.so** /usr/local/lib/libcuda.so** /opt/rh/devtoolset-7/root/libcuda.so** /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]"
time=2024-06-12T15:01:10.708Z level=DEBUG source=gpu.go:326 msg="discovered GPU libraries" paths=[]
time=2024-06-12T15:01:10.709Z level=DEBUG source=gpu.go:274 msg="Searching for GPU library" name=libcudart.so*
time=2024-06-12T15:01:10.709Z level=DEBUG source=gpu.go:293 msg="gpu library search" globs="[/opt/rocm/lib/libcudart.so** /usr/local/lib/libcudart.so** /opt/rh/devtoolset-7/root/libcudart.so** /tmp/ollama689615480/runners/cuda*/libcudart.so* /usr/local/cuda/lib64/libcudart.so* /usr/lib/x86_64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/x86_64-linux-gnu/libcudart.so* /usr/lib/wsl/lib/libcudart.so* /usr/lib/wsl/drivers/*/libcudart.so* /opt/cuda/lib64/libcudart.so* /usr/local/cuda*/targets/aarch64-linux/lib/libcudart.so* /usr/lib/aarch64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/aarch64-linux-gnu/libcudart.so* /usr/local/cuda/lib*/libcudart.so* /usr/lib*/libcudart.so* /usr/local/lib*/libcudart.so*]"
time=2024-06-12T15:01:10.710Z level=DEBUG source=gpu.go:326 msg="discovered GPU libraries" paths=[/tmp/ollama689615480/runners/cuda_v11/libcudart.so.11.0]
cudaSetDevice err: 35
time=2024-06-12T15:01:10.710Z level=DEBUG source=gpu.go:338 msg="Unable to load cudart" library=/tmp/ollama689615480/runners/cuda_v11/libcudart.so.11.0 error="your nvidia driver is too old or missing.  If you have a CUDA GPU please upgrade to run ollama"
time=2024-06-12T15:01:10.710Z level=DEBUG source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-06-12T15:01:10.710Z level=WARN source=amd_linux.go:48 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
time=2024-06-12T15:01:10.711Z level=DEBUG source=amd_linux.go:77 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/0/properties"
time=2024-06-12T15:01:10.711Z level=DEBUG source=amd_linux.go:102 msg="detected CPU /sys/class/kfd/kfd/topology/nodes/0/properties"
time=2024-06-12T15:01:10.711Z level=DEBUG source=amd_linux.go:77 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/1/properties"
time=2024-06-12T15:01:10.711Z level=DEBUG source=amd_linux.go:242 msg="amdgpu memory" gpu=0 total="4.0 GiB"
time=2024-06-12T15:01:10.711Z level=DEBUG source=amd_linux.go:243 msg="amdgpu memory" gpu=0 available="4.0 GiB"
time=2024-06-12T15:01:10.711Z level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /opt/rocm/lib"
time=2024-06-12T15:01:10.711Z level=INFO source=amd_linux.go:304 msg="skipping rocm gfx compatibility check" HSA_OVERRIDE_GFX_VERSION=gfx1030
time=2024-06-12T15:01:10.956Z level=DEBUG source=gpu.go:132 msg="Detecting GPUs"
time=2024-06-12T15:01:10.956Z level=DEBUG source=gpu.go:274 msg="Searching for GPU library" name=libcuda.so*
time=2024-06-12T15:01:10.956Z level=DEBUG source=gpu.go:293 msg="gpu library search" globs="[/opt/rocm/lib/libcuda.so** /usr/local/lib/libcuda.so** /opt/rh/devtoolset-7/root/libcuda.so** /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]"
time=2024-06-12T15:01:10.958Z level=DEBUG source=gpu.go:326 msg="discovered GPU libraries" paths=[]
time=2024-06-12T15:01:10.958Z level=DEBUG source=gpu.go:274 msg="Searching for GPU library" name=libcudart.so*
time=2024-06-12T15:01:10.958Z level=DEBUG source=gpu.go:293 msg="gpu library search" globs="[/opt/rocm/lib/libcudart.so** /usr/local/lib/libcudart.so** /opt/rh/devtoolset-7/root/libcudart.so** /tmp/ollama689615480/runners/cuda*/libcudart.so* /usr/local/cuda/lib64/libcudart.so* /usr/lib/x86_64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/x86_64-linux-gnu/libcudart.so* /usr/lib/wsl/lib/libcudart.so* /usr/lib/wsl/drivers/*/libcudart.so* /opt/cuda/lib64/libcudart.so* /usr/local/cuda*/targets/aarch64-linux/lib/libcudart.so* /usr/lib/aarch64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/aarch64-linux-gnu/libcudart.so* /usr/local/cuda/lib*/libcudart.so* /usr/lib*/libcudart.so* /usr/local/lib*/libcudart.so*]"
time=2024-06-12T15:01:10.960Z level=DEBUG source=gpu.go:326 msg="discovered GPU libraries" paths=[/tmp/ollama689615480/runners/cuda_v11/libcudart.so.11.0]
cudaSetDevice err: 35
time=2024-06-12T15:01:10.960Z level=DEBUG source=gpu.go:338 msg="Unable to load cudart" library=/tmp/ollama689615480/runners/cuda_v11/libcudart.so.11.0 error="your nvidia driver is too old or missing.  If you have a CUDA GPU please upgrade to run ollama"
time=2024-06-12T15:01:10.960Z level=DEBUG source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-06-12T15:01:10.961Z level=WARN source=amd_linux.go:48 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
time=2024-06-12T15:01:10.961Z level=DEBUG source=amd_linux.go:77 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/0/properties"
time=2024-06-12T15:01:10.961Z level=DEBUG source=amd_linux.go:102 msg="detected CPU /sys/class/kfd/kfd/topology/nodes/0/properties"
time=2024-06-12T15:01:10.961Z level=DEBUG source=amd_linux.go:77 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/1/properties"
time=2024-06-12T15:01:10.961Z level=DEBUG source=amd_linux.go:242 msg="amdgpu memory" gpu=0 total="4.0 GiB"
time=2024-06-12T15:01:10.961Z level=DEBUG source=amd_linux.go:243 msg="amdgpu memory" gpu=0 available="4.0 GiB"
time=2024-06-12T15:01:10.961Z level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /opt/rocm/lib"
time=2024-06-12T15:01:10.962Z level=INFO source=amd_linux.go:304 msg="skipping rocm gfx compatibility check" HSA_OVERRIDE_GFX_VERSION=gfx1030
time=2024-06-12T15:01:11.206Z level=DEBUG source=gpu.go:132 msg="Detecting GPUs"
time=2024-06-12T15:01:11.207Z level=DEBUG source=gpu.go:274 msg="Searching for GPU library" name=libcuda.so*
time=2024-06-12T15:01:11.207Z level=DEBUG source=gpu.go:293 msg="gpu library search" globs="[/opt/rocm/lib/libcuda.so** /usr/local/lib/libcuda.so** /opt/rh/devtoolset-7/root/libcuda.so** /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]"
time=2024-06-12T15:01:11.209Z level=DEBUG source=gpu.go:326 msg="discovered GPU libraries" paths=[]
time=2024-06-12T15:01:11.209Z level=DEBUG source=gpu.go:274 msg="Searching for GPU library" name=libcudart.so*
time=2024-06-12T15:01:11.209Z level=DEBUG source=gpu.go:293 msg="gpu library search" globs="[/opt/rocm/lib/libcudart.so** /usr/local/lib/libcudart.so** /opt/rh/devtoolset-7/root/libcudart.so** /tmp/ollama689615480/runners/cuda*/libcudart.so* /usr/local/cuda/lib64/libcudart.so* /usr/lib/x86_64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/x86_64-linux-gnu/libcudart.so* /usr/lib/wsl/lib/libcudart.so* /usr/lib/wsl/drivers/*/libcudart.so* /opt/cuda/lib64/libcudart.so* /usr/local/cuda*/targets/aarch64-linux/lib/libcudart.so* /usr/lib/aarch64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/aarch64-linux-gnu/libcudart.so* /usr/local/cuda/lib*/libcudart.so* /usr/lib*/libcudart.so* /usr/local/lib*/libcudart.so*]"
time=2024-06-12T15:01:11.210Z level=DEBUG source=gpu.go:326 msg="discovered GPU libraries" paths=[/tmp/ollama689615480/runners/cuda_v11/libcudart.so.11.0]
cudaSetDevice err: 35
time=2024-06-12T15:01:11.211Z level=DEBUG source=gpu.go:338 msg="Unable to load cudart" library=/tmp/ollama689615480/runners/cuda_v11/libcudart.so.11.0 error="your nvidia driver is too old or missing.  If you have a CUDA GPU please upgrade to run ollama"
time=2024-06-12T15:01:11.211Z level=DEBUG source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-06-12T15:01:11.211Z level=WARN source=amd_linux.go:48 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
time=2024-06-12T15:01:11.211Z level=DEBUG source=amd_linux.go:77 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/0/properties"
time=2024-06-12T15:01:11.211Z level=DEBUG source=amd_linux.go:102 msg="detected CPU /sys/class/kfd/kfd/topology/nodes/0/properties"
time=2024-06-12T15:01:11.211Z level=DEBUG source=amd_linux.go:77 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/1/properties"
time=2024-06-12T15:01:11.212Z level=DEBUG source=amd_linux.go:242 msg="amdgpu memory" gpu=0 total="4.0 GiB"
time=2024-06-12T15:01:11.212Z level=DEBUG source=amd_linux.go:243 msg="amdgpu memory" gpu=0 available="4.0 GiB"
time=2024-06-12T15:01:11.212Z level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /opt/rocm/lib"
time=2024-06-12T15:01:11.212Z level=INFO source=amd_linux.go:304 msg="skipping rocm gfx compatibility check" HSA_OVERRIDE_GFX_VERSION=gfx1030
time=2024-06-12T15:01:11.457Z level=DEBUG source=gpu.go:132 msg="Detecting GPUs"
time=2024-06-12T15:01:11.457Z level=DEBUG source=gpu.go:274 msg="Searching for GPU library" name=libcuda.so*
time=2024-06-12T15:01:11.457Z level=DEBUG source=gpu.go:293 msg="gpu library search" globs="[/opt/rocm/lib/libcuda.so** /usr/local/lib/libcuda.so** /opt/rh/devtoolset-7/root/libcuda.so** /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]"
time=2024-06-12T15:01:11.459Z level=DEBUG source=gpu.go:326 msg="discovered GPU libraries" paths=[]
time=2024-06-12T15:01:11.459Z level=DEBUG source=gpu.go:274 msg="Searching for GPU library" name=libcudart.so*
time=2024-06-12T15:01:11.459Z level=DEBUG source=gpu.go:293 msg="gpu library search" globs="[/opt/rocm/lib/libcudart.so** /usr/local/lib/libcudart.so** /opt/rh/devtoolset-7/root/libcudart.so** /tmp/ollama689615480/runners/cuda*/libcudart.so* /usr/local/cuda/lib64/libcudart.so* /usr/lib/x86_64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/x86_64-linux-gnu/libcudart.so* /usr/lib/wsl/lib/libcudart.so* /usr/lib/wsl/drivers/*/libcudart.so* /opt/cuda/lib64/libcudart.so* /usr/local/cuda*/targets/aarch64-linux/lib/libcudart.so* /usr/lib/aarch64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/aarch64-linux-gnu/libcudart.so* /usr/local/cuda/lib*/libcudart.so* /usr/lib*/libcudart.so* /usr/local/lib*/libcudart.so*]"
time=2024-06-12T15:01:11.461Z level=DEBUG source=gpu.go:326 msg="discovered GPU libraries" paths=[/tmp/ollama689615480/runners/cuda_v11/libcudart.so.11.0]
cudaSetDevice err: 35
time=2024-06-12T15:01:11.461Z level=DEBUG source=gpu.go:338 msg="Unable to load cudart" library=/tmp/ollama689615480/runners/cuda_v11/libcudart.so.11.0 error="your nvidia driver is too old or missing.  If you have a CUDA GPU please upgrade to run ollama"
time=2024-06-12T15:01:11.461Z level=DEBUG source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-06-12T15:01:11.461Z level=WARN source=amd_linux.go:48 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
time=2024-06-12T15:01:11.462Z level=DEBUG source=amd_linux.go:77 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/0/properties"
time=2024-06-12T15:01:11.462Z level=DEBUG source=amd_linux.go:102 msg="detected CPU /sys/class/kfd/kfd/topology/nodes/0/properties"
time=2024-06-12T15:01:11.462Z level=DEBUG source=amd_linux.go:77 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/1/properties"
time=2024-06-12T15:01:11.462Z level=DEBUG source=amd_linux.go:242 msg="amdgpu memory" gpu=0 total="4.0 GiB"
time=2024-06-12T15:01:11.462Z level=DEBUG source=amd_linux.go:243 msg="amdgpu memory" gpu=0 available="4.0 GiB"
time=2024-06-12T15:01:11.462Z level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /opt/rocm/lib"
time=2024-06-12T15:01:11.463Z level=INFO source=amd_linux.go:304 msg="skipping rocm gfx compatibility check" HSA_OVERRIDE_GFX_VERSION=gfx1030
time=2024-06-12T15:01:11.706Z level=DEBUG source=gpu.go:132 msg="Detecting GPUs"
time=2024-06-12T15:01:11.706Z level=DEBUG source=gpu.go:274 msg="Searching for GPU library" name=libcuda.so*
time=2024-06-12T15:01:11.706Z level=DEBUG source=gpu.go:293 msg="gpu library search" globs="[/opt/rocm/lib/libcuda.so** /usr/local/lib/libcuda.so** /opt/rh/devtoolset-7/root/libcuda.so** /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]"
time=2024-06-12T15:01:11.709Z level=DEBUG source=gpu.go:326 msg="discovered GPU libraries" paths=[]
time=2024-06-12T15:01:11.709Z level=DEBUG source=gpu.go:274 msg="Searching for GPU library" name=libcudart.so*
time=2024-06-12T15:01:11.709Z level=DEBUG source=gpu.go:293 msg="gpu library search" globs="[/opt/rocm/lib/libcudart.so** /usr/local/lib/libcudart.so** /opt/rh/devtoolset-7/root/libcudart.so** /tmp/ollama689615480/runners/cuda*/libcudart.so* /usr/local/cuda/lib64/libcudart.so* /usr/lib/x86_64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/x86_64-linux-gnu/libcudart.so* /usr/lib/wsl/lib/libcudart.so* /usr/lib/wsl/drivers/*/libcudart.so* /opt/cuda/lib64/libcudart.so* /usr/local/cuda*/targets/aarch64-linux/lib/libcudart.so* /usr/lib/aarch64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/aarch64-linux-gnu/libcudart.so* /usr/local/cuda/lib*/libcudart.so* /usr/lib*/libcudart.so* /usr/local/lib*/libcudart.so*]"
time=2024-06-12T15:01:11.710Z level=DEBUG source=gpu.go:326 msg="discovered GPU libraries" paths=[/tmp/ollama689615480/runners/cuda_v11/libcudart.so.11.0]
cudaSetDevice err: 35
time=2024-06-12T15:01:11.711Z level=DEBUG source=gpu.go:338 msg="Unable to load cudart" library=/tmp/ollama689615480/runners/cuda_v11/libcudart.so.11.0 error="your nvidia driver is too old or missing.  If you have a CUDA GPU please upgrade to run ollama"
time=2024-06-12T15:01:11.711Z level=DEBUG source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-06-12T15:01:11.711Z level=WARN source=amd_linux.go:48 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
time=2024-06-12T15:01:11.711Z level=DEBUG source=amd_linux.go:77 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/0/properties"
time=2024-06-12T15:01:11.711Z level=DEBUG source=amd_linux.go:102 msg="detected CPU /sys/class/kfd/kfd/topology/nodes/0/properties"
time=2024-06-12T15:01:11.711Z level=DEBUG source=amd_linux.go:77 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/1/properties"
time=2024-06-12T15:01:11.712Z level=DEBUG source=amd_linux.go:242 msg="amdgpu memory" gpu=0 total="4.0 GiB"
time=2024-06-12T15:01:11.712Z level=DEBUG source=amd_linux.go:243 msg="amdgpu memory" gpu=0 available="4.0 GiB"
time=2024-06-12T15:01:11.712Z level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /opt/rocm/lib"
time=2024-06-12T15:01:11.712Z level=INFO source=amd_linux.go:304 msg="skipping rocm gfx compatibility check" HSA_OVERRIDE_GFX_VERSION=gfx1030
time=2024-06-12T15:01:11.957Z level=DEBUG source=gpu.go:132 msg="Detecting GPUs"
time=2024-06-12T15:01:11.957Z level=DEBUG source=gpu.go:274 msg="Searching for GPU library" name=libcuda.so*
time=2024-06-12T15:01:11.957Z level=DEBUG source=gpu.go:293 msg="gpu library search" globs="[/opt/rocm/lib/libcuda.so** /usr/local/lib/libcuda.so** /opt/rh/devtoolset-7/root/libcuda.so** /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]"
time=2024-06-12T15:01:11.959Z level=DEBUG source=gpu.go:326 msg="discovered GPU libraries" paths=[]
time=2024-06-12T15:01:11.959Z level=DEBUG source=gpu.go:274 msg="Searching for GPU library" name=libcudart.so*
time=2024-06-12T15:01:11.959Z level=DEBUG source=gpu.go:293 msg="gpu library search" globs="[/opt/rocm/lib/libcudart.so** /usr/local/lib/libcudart.so** /opt/rh/devtoolset-7/root/libcudart.so** /tmp/ollama689615480/runners/cuda*/libcudart.so* /usr/local/cuda/lib64/libcudart.so* /usr/lib/x86_64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/x86_64-linux-gnu/libcudart.so* /usr/lib/wsl/lib/libcudart.so* /usr/lib/wsl/drivers/*/libcudart.so* /opt/cuda/lib64/libcudart.so* /usr/local/cuda*/targets/aarch64-linux/lib/libcudart.so* /usr/lib/aarch64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/aarch64-linux-gnu/libcudart.so* /usr/local/cuda/lib*/libcudart.so* /usr/lib*/libcudart.so* /usr/local/lib*/libcudart.so*]"
time=2024-06-12T15:01:11.961Z level=DEBUG source=gpu.go:326 msg="discovered GPU libraries" paths=[/tmp/ollama689615480/runners/cuda_v11/libcudart.so.11.0]
cudaSetDevice err: 35
time=2024-06-12T15:01:11.962Z level=DEBUG source=gpu.go:338 msg="Unable to load cudart" library=/tmp/ollama689615480/runners/cuda_v11/libcudart.so.11.0 error="your nvidia driver is too old or missing.  If you have a CUDA GPU please upgrade to run ollama"
time=2024-06-12T15:01:11.962Z level=DEBUG source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-06-12T15:01:11.962Z level=WARN source=amd_linux.go:48 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
time=2024-06-12T15:01:11.962Z level=DEBUG source=amd_linux.go:77 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/0/properties"
time=2024-06-12T15:01:11.962Z level=DEBUG source=amd_linux.go:102 msg="detected CPU /sys/class/kfd/kfd/topology/nodes/0/properties"
time=2024-06-12T15:01:11.962Z level=DEBUG source=amd_linux.go:77 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/1/properties"
time=2024-06-12T15:01:11.963Z level=DEBUG source=amd_linux.go:242 msg="amdgpu memory" gpu=0 total="4.0 GiB"
time=2024-06-12T15:01:11.963Z level=DEBUG source=amd_linux.go:243 msg="amdgpu memory" gpu=0 available="4.0 GiB"
time=2024-06-12T15:01:11.963Z level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /opt/rocm/lib"
time=2024-06-12T15:01:11.963Z level=INFO source=amd_linux.go:304 msg="skipping rocm gfx compatibility check" HSA_OVERRIDE_GFX_VERSION=gfx1030
time=2024-06-12T15:01:12.207Z level=DEBUG source=gpu.go:132 msg="Detecting GPUs"
time=2024-06-12T15:01:12.207Z level=DEBUG source=gpu.go:274 msg="Searching for GPU library" name=libcuda.so*
time=2024-06-12T15:01:12.207Z level=DEBUG source=gpu.go:293 msg="gpu library search" globs="[/opt/rocm/lib/libcuda.so** /usr/local/lib/libcuda.so** /opt/rh/devtoolset-7/root/libcuda.so** /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]"
time=2024-06-12T15:01:12.209Z level=DEBUG source=gpu.go:326 msg="discovered GPU libraries" paths=[]
time=2024-06-12T15:01:12.209Z level=DEBUG source=gpu.go:274 msg="Searching for GPU library" name=libcudart.so*
time=2024-06-12T15:01:12.209Z level=DEBUG source=gpu.go:293 msg="gpu library search" globs="[/opt/rocm/lib/libcudart.so** /usr/local/lib/libcudart.so** /opt/rh/devtoolset-7/root/libcudart.so** /tmp/ollama689615480/runners/cuda*/libcudart.so* /usr/local/cuda/lib64/libcudart.so* /usr/lib/x86_64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/x86_64-linux-gnu/libcudart.so* /usr/lib/wsl/lib/libcudart.so* /usr/lib/wsl/drivers/*/libcudart.so* /opt/cuda/lib64/libcudart.so* /usr/local/cuda*/targets/aarch64-linux/lib/libcudart.so* /usr/lib/aarch64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/aarch64-linux-gnu/libcudart.so* /usr/local/cuda/lib*/libcudart.so* /usr/lib*/libcudart.so* /usr/local/lib*/libcudart.so*]"
time=2024-06-12T15:01:12.210Z level=DEBUG source=gpu.go:326 msg="discovered GPU libraries" paths=[/tmp/ollama689615480/runners/cuda_v11/libcudart.so.11.0]
cudaSetDevice err: 35
time=2024-06-12T15:01:12.211Z level=DEBUG source=gpu.go:338 msg="Unable to load cudart" library=/tmp/ollama689615480/runners/cuda_v11/libcudart.so.11.0 error="your nvidia driver is too old or missing.  If you have a CUDA GPU please upgrade to run ollama"
time=2024-06-12T15:01:12.211Z level=DEBUG source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-06-12T15:01:12.211Z level=WARN source=amd_linux.go:48 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
time=2024-06-12T15:01:12.211Z level=DEBUG source=amd_linux.go:77 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/0/properties"
time=2024-06-12T15:01:12.211Z level=DEBUG source=amd_linux.go:102 msg="detected CPU /sys/class/kfd/kfd/topology/nodes/0/properties"
time=2024-06-12T15:01:12.211Z level=DEBUG source=amd_linux.go:77 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/1/properties"
time=2024-06-12T15:01:12.212Z level=DEBUG source=amd_linux.go:242 msg="amdgpu memory" gpu=0 total="4.0 GiB"
time=2024-06-12T15:01:12.212Z level=DEBUG source=amd_linux.go:243 msg="amdgpu memory" gpu=0 available="4.0 GiB"
time=2024-06-12T15:01:12.212Z level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /opt/rocm/lib"
time=2024-06-12T15:01:12.212Z level=INFO source=amd_linux.go:304 msg="skipping rocm gfx compatibility check" HSA_OVERRIDE_GFX_VERSION=gfx1030
time=2024-06-12T15:01:12.457Z level=DEBUG source=gpu.go:132 msg="Detecting GPUs"
time=2024-06-12T15:01:12.457Z level=DEBUG source=gpu.go:274 msg="Searching for GPU library" name=libcuda.so*
time=2024-06-12T15:01:12.457Z level=DEBUG source=gpu.go:293 msg="gpu library search" globs="[/opt/rocm/lib/libcuda.so** /usr/local/lib/libcuda.so** /opt/rh/devtoolset-7/root/libcuda.so** /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]"
time=2024-06-12T15:01:12.459Z level=DEBUG source=gpu.go:326 msg="discovered GPU libraries" paths=[]
time=2024-06-12T15:01:12.459Z level=DEBUG source=gpu.go:274 msg="Searching for GPU library" name=libcudart.so*
time=2024-06-12T15:01:12.459Z level=DEBUG source=gpu.go:293 msg="gpu library search" globs="[/opt/rocm/lib/libcudart.so** /usr/local/lib/libcudart.so** /opt/rh/devtoolset-7/root/libcudart.so** /tmp/ollama689615480/runners/cuda*/libcudart.so* /usr/local/cuda/lib64/libcudart.so* /usr/lib/x86_64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/x86_64-linux-gnu/libcudart.so* /usr/lib/wsl/lib/libcudart.so* /usr/lib/wsl/drivers/*/libcudart.so* /opt/cuda/lib64/libcudart.so* /usr/local/cuda*/targets/aarch64-linux/lib/libcudart.so* /usr/lib/aarch64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/aarch64-linux-gnu/libcudart.so* /usr/local/cuda/lib*/libcudart.so* /usr/lib*/libcudart.so* /usr/local/lib*/libcudart.so*]"
time=2024-06-12T15:01:12.461Z level=DEBUG source=gpu.go:326 msg="discovered GPU libraries" paths=[/tmp/ollama689615480/runners/cuda_v11/libcudart.so.11.0]
cudaSetDevice err: 35
time=2024-06-12T15:01:12.461Z level=DEBUG source=gpu.go:338 msg="Unable to load cudart" library=/tmp/ollama689615480/runners/cuda_v11/libcudart.so.11.0 error="your nvidia driver is too old or missing.  If you have a CUDA GPU please upgrade to run ollama"
time=2024-06-12T15:01:12.462Z level=DEBUG source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-06-12T15:01:12.462Z level=WARN source=amd_linux.go:48 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
time=2024-06-12T15:01:12.462Z level=DEBUG source=amd_linux.go:77 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/0/properties"
time=2024-06-12T15:01:12.462Z level=DEBUG source=amd_linux.go:102 msg="detected CPU /sys/class/kfd/kfd/topology/nodes/0/properties"
time=2024-06-12T15:01:12.462Z level=DEBUG source=amd_linux.go:77 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/1/properties"
time=2024-06-12T15:01:12.462Z level=DEBUG source=amd_linux.go:242 msg="amdgpu memory" gpu=0 total="4.0 GiB"
time=2024-06-12T15:01:12.462Z level=DEBUG source=amd_linux.go:243 msg="amdgpu memory" gpu=0 available="4.0 GiB"
time=2024-06-12T15:01:12.462Z level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /opt/rocm/lib"
time=2024-06-12T15:01:12.463Z level=INFO source=amd_linux.go:304 msg="skipping rocm gfx compatibility check" HSA_OVERRIDE_GFX_VERSION=gfx1030
time=2024-06-12T15:01:12.707Z level=WARN source=sched.go:511 msg="gpu VRAM usage didn't recover within timeout" seconds=5.005717187
time=2024-06-12T15:01:12.707Z level=DEBUG source=gpu.go:132 msg="Detecting GPUs"
time=2024-06-12T15:01:12.707Z level=DEBUG source=gpu.go:274 msg="Searching for GPU library" name=libcuda.so*
time=2024-06-12T15:01:12.707Z level=DEBUG source=gpu.go:293 msg="gpu library search" globs="[/opt/rocm/lib/libcuda.so** /usr/local/lib/libcuda.so** /opt/rh/devtoolset-7/root/libcuda.so** /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]"
time=2024-06-12T15:01:12.707Z level=DEBUG source=sched.go:283 msg="sending an unloaded event" modelPath=/root/.ollama/models/blobs/sha256-c1864a5eb19305c40519da12cc543519e48a0697ecd30e15d5ac228644957d12
time=2024-06-12T15:01:12.707Z level=DEBUG source=sched.go:206 msg="ignoring unload event with no pending requests"
time=2024-06-12T15:01:12.709Z level=DEBUG source=gpu.go:326 msg="discovered GPU libraries" paths=[]
time=2024-06-12T15:01:12.709Z level=DEBUG source=gpu.go:274 msg="Searching for GPU library" name=libcudart.so*
time=2024-06-12T15:01:12.709Z level=DEBUG source=gpu.go:293 msg="gpu library search" globs="[/opt/rocm/lib/libcudart.so** /usr/local/lib/libcudart.so** /opt/rh/devtoolset-7/root/libcudart.so** /tmp/ollama689615480/runners/cuda*/libcudart.so* /usr/local/cuda/lib64/libcudart.so* /usr/lib/x86_64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/x86_64-linux-gnu/libcudart.so* /usr/lib/wsl/lib/libcudart.so* /usr/lib/wsl/drivers/*/libcudart.so* /opt/cuda/lib64/libcudart.so* /usr/local/cuda*/targets/aarch64-linux/lib/libcudart.so* /usr/lib/aarch64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/aarch64-linux-gnu/libcudart.so* /usr/local/cuda/lib*/libcudart.so* /usr/lib*/libcudart.so* /usr/local/lib*/libcudart.so*]"
time=2024-06-12T15:01:12.711Z level=DEBUG source=gpu.go:326 msg="discovered GPU libraries" paths=[/tmp/ollama689615480/runners/cuda_v11/libcudart.so.11.0]
cudaSetDevice err: 35
time=2024-06-12T15:01:12.711Z level=DEBUG source=gpu.go:338 msg="Unable to load cudart" library=/tmp/ollama689615480/runners/cuda_v11/libcudart.so.11.0 error="your nvidia driver is too old or missing.  If you have a CUDA GPU please upgrade to run ollama"
time=2024-06-12T15:01:12.711Z level=DEBUG source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-06-12T15:01:12.712Z level=WARN source=amd_linux.go:48 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
time=2024-06-12T15:01:12.712Z level=DEBUG source=amd_linux.go:77 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/0/properties"
time=2024-06-12T15:01:12.712Z level=DEBUG source=amd_linux.go:102 msg="detected CPU /sys/class/kfd/kfd/topology/nodes/0/properties"
time=2024-06-12T15:01:12.712Z level=DEBUG source=amd_linux.go:77 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/1/properties"
time=2024-06-12T15:01:12.712Z level=DEBUG source=amd_linux.go:242 msg="amdgpu memory" gpu=0 total="4.0 GiB"
time=2024-06-12T15:01:12.712Z level=DEBUG source=amd_linux.go:243 msg="amdgpu memory" gpu=0 available="4.0 GiB"
time=2024-06-12T15:01:12.712Z level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /opt/rocm/lib"
time=2024-06-12T15:01:12.713Z level=INFO source=amd_linux.go:304 msg="skipping rocm gfx compatibility check" HSA_OVERRIDE_GFX_VERSION=gfx1030
time=2024-06-12T15:01:12.957Z level=WARN source=sched.go:511 msg="gpu VRAM usage didn't recover within timeout" seconds=5.255826171
time=2024-06-12T15:01:12.957Z level=DEBUG source=gpu.go:132 msg="Detecting GPUs"
time=2024-06-12T15:01:12.957Z level=DEBUG source=gpu.go:274 msg="Searching for GPU library" name=libcuda.so*
time=2024-06-12T15:01:12.957Z level=DEBUG source=gpu.go:293 msg="gpu library search" globs="[/opt/rocm/lib/libcuda.so** /usr/local/lib/libcuda.so** /opt/rh/devtoolset-7/root/libcuda.so** /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]"
time=2024-06-12T15:01:12.959Z level=DEBUG source=gpu.go:326 msg="discovered GPU libraries" paths=[]
time=2024-06-12T15:01:12.959Z level=DEBUG source=gpu.go:274 msg="Searching for GPU library" name=libcudart.so*
time=2024-06-12T15:01:12.959Z level=DEBUG source=gpu.go:293 msg="gpu library search" globs="[/opt/rocm/lib/libcudart.so** /usr/local/lib/libcudart.so** /opt/rh/devtoolset-7/root/libcudart.so** /tmp/ollama689615480/runners/cuda*/libcudart.so* /usr/local/cuda/lib64/libcudart.so* /usr/lib/x86_64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/x86_64-linux-gnu/libcudart.so* /usr/lib/wsl/lib/libcudart.so* /usr/lib/wsl/drivers/*/libcudart.so* /opt/cuda/lib64/libcudart.so* /usr/local/cuda*/targets/aarch64-linux/lib/libcudart.so* /usr/lib/aarch64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/aarch64-linux-gnu/libcudart.so* /usr/local/cuda/lib*/libcudart.so* /usr/lib*/libcudart.so* /usr/local/lib*/libcudart.so*]"
time=2024-06-12T15:01:12.960Z level=DEBUG source=gpu.go:326 msg="discovered GPU libraries" paths=[/tmp/ollama689615480/runners/cuda_v11/libcudart.so.11.0]
cudaSetDevice err: 35
time=2024-06-12T15:01:12.961Z level=DEBUG source=gpu.go:338 msg="Unable to load cudart" library=/tmp/ollama689615480/runners/cuda_v11/libcudart.so.11.0 error="your nvidia driver is too old or missing.  If you have a CUDA GPU please upgrade to run ollama"
time=2024-06-12T15:01:12.961Z level=DEBUG source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-06-12T15:01:12.961Z level=WARN source=amd_linux.go:48 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
time=2024-06-12T15:01:12.961Z level=DEBUG source=amd_linux.go:77 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/0/properties"
time=2024-06-12T15:01:12.961Z level=DEBUG source=amd_linux.go:102 msg="detected CPU /sys/class/kfd/kfd/topology/nodes/0/properties"
time=2024-06-12T15:01:12.961Z level=DEBUG source=amd_linux.go:77 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/1/properties"
time=2024-06-12T15:01:12.961Z level=DEBUG source=amd_linux.go:242 msg="amdgpu memory" gpu=0 total="4.0 GiB"
time=2024-06-12T15:01:12.961Z level=DEBUG source=amd_linux.go:243 msg="amdgpu memory" gpu=0 available="4.0 GiB"
time=2024-06-12T15:01:12.961Z level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /opt/rocm/lib"
time=2024-06-12T15:01:12.962Z level=INFO source=amd_linux.go:304 msg="skipping rocm gfx compatibility check" HSA_OVERRIDE_GFX_VERSION=gfx1030
time=2024-06-12T15:01:13.207Z level=WARN source=sched.go:511 msg="gpu VRAM usage didn't recover within timeout" seconds=5.505716674
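The repeated "skipping rocm gfx compatibility check" lines above come from running the container with `HSA_OVERRIDE_GFX_VERSION` set. Note that ollama's troubleshooting doc gives the override as a dotted version string (e.g. `10.3.0` for the gfx103x family), not a `gfx*` name, so the `gfx1030` value in these logs may not be what the ROCm runtime expects. A sketch of the usual invocation, under that assumption (image tag, volume, and device mappings follow the standard ollama ROCm Docker instructions; adjust for Podman as needed):

```shell
# Map the ROCm devices into the container and override the reported
# gfx version so the runner treats the Deck's gfx1033 as gfx1030.
# The dotted "10.3.0" form is what the troubleshooting doc documents.
docker run -d \
  --device /dev/kfd --device /dev/dri \
  -e HSA_OVERRIDE_GFX_VERSION=10.3.0 \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama ollama/ollama:rocm
```

This only changes what architecture the ROCm runtime reports; whether gfx1030 kernels actually run correctly on gfx1033 is hardware-dependent.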
<!-- gh-comment-id:2163028123 --> @5310 commented on GitHub (Jun 12, 2024):

So, I ran this again: `llama3:8b` on the Deck with Docker (well, Podman, since SteamOS now has it preinstalled) with debug on. I wasn't paying enough attention, but I also get the line about ROCm, and it _should_ be 1GB for the Deck by default.

```
time=2024-06-12T13:25:08.950Z level=INFO source=types.go:71 msg="inference compute" id=0 library=rocm compute=gfx1033 driver=0.0 name=1002:163f total="1.0 GiB" available="1.0 GiB"
```

Only then did I realize my VRAM had reset back to 1GB from the 4GB I had set it to before. Neither `llama3:8b` nor `gemma:2b` would probably run on it anyway. While I reboot to increase my VRAM allocation again, here's my run info.

System information:

```
$ cat /etc/*release*|grep ID
DISTRIB_ID="SteamOS"
ID=steamos
ID_LIKE=arch
VARIANT_ID=steamdeck
VERSION_ID=3.5.19
BUILD_ID=20240422.1
```

```
$ sudo lspci| grep -i gpu
[sudo] password for deck:
04:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] VanGogh [AMD Custom GPU 0405] (rev ae)
```

```
$ rocminfo
ROCk module is loaded
=====================
HSA System Attributes
=====================
Runtime Version:         1.1
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE
System Endianness:       LITTLE
Mwaitx:                  DISABLED
DMAbuf Support:          YES

==========
HSA Agents
==========
*******
Agent 1
*******
  Name:                    AMD Custom APU 0405
  Uuid:                    CPU-XX
  Marketing Name:          AMD Custom APU 0405
  Vendor Name:             CPU
  Feature:                 None specified
  Profile:                 FULL_PROFILE
  Float Round Mode:        NEAR
  Max Queue Number:        0(0x0)
  Queue Min Size:          0(0x0)
  Queue Max Size:          0(0x0)
  Queue Type:              MULTI
  Node:                    0
  Device Type:             CPU
  Cache Info:
    L1:                      32768(0x8000) KB
  Chip ID:                 0(0x0)
  ASIC Revision:           0(0x0)
  Cacheline Size:          64(0x40)
  Max Clock Freq. (MHz):   2800
  BDFID:                   0
  Internal Node ID:        0
  Compute Unit:            8
  SIMDs per CU:            0
  Shader Engines:          0
  Shader Arrs. per Eng.:   0
  WatchPts on Addr. Ranges:1
  Features:                None
  Pool Info:
    Pool 1
      Segment:                 GLOBAL; FLAGS: FINE GRAINED
      Size:                    15169984(0xe779c0) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Alignment:         4KB
      Accessible by all:       TRUE
    Pool 2
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    15169984(0xe779c0) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Alignment:         4KB
      Accessible by all:       TRUE
    Pool 3
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED
      Size:                    15169984(0xe779c0) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Alignment:         4KB
      Accessible by all:       TRUE
  ISA Info:
*******
Agent 2
*******
  Name:                    gfx1033
  Uuid:                    GPU-XX
  Marketing Name:          AMD Custom GPU 0405
  Vendor Name:             AMD
  Feature:                 KERNEL_DISPATCH
  Profile:                 BASE_PROFILE
  Float Round Mode:        NEAR
  Max Queue Number:        128(0x80)
  Queue Min Size:          64(0x40)
  Queue Max Size:          131072(0x20000)
  Queue Type:              MULTI
  Node:                    1
  Device Type:             GPU
  Cache Info:
    L1:                      16(0x10) KB
    L2:                      1024(0x400) KB
  Chip ID:                 5695(0x163f)
  ASIC Revision:           0(0x0)
  Cacheline Size:          64(0x40)
  Max Clock Freq. (MHz):   0
  BDFID:                   1024
  Internal Node ID:        1
  Compute Unit:            8
  SIMDs per CU:            2
  Shader Engines:          1
  Shader Arrs. per Eng.:   1
  WatchPts on Addr. Ranges:4
  Coherent Host Access:    FALSE
  Features:                KERNEL_DISPATCH
  Fast F16 Operation:      TRUE
  Wavefront Size:          32(0x20)
  Workgroup Max Size:      1024(0x400)
  Workgroup Max Size per Dimension:
    x                        1024(0x400)
    y                        1024(0x400)
    z                        1024(0x400)
  Max Waves Per CU:        32(0x20)
  Max Work-item Per CU:    1024(0x400)
  Grid Max Size:           4294967295(0xffffffff)
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)
    y                        4294967295(0xffffffff)
    z                        4294967295(0xffffffff)
  Max fbarriers/Workgrp:   32
  Packet Processor uCode:: 111
  SDMA engine uCode::      70
  IOMMU Support::          None
  Pool Info:
    Pool 1
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED
      Size:                    1048576(0x100000) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Alignment:         4KB
      Accessible by all:       FALSE
    Pool 2
      Segment:                 GLOBAL; FLAGS: EXTENDED FINE GRAINED
      Size:                    1048576(0x100000) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Alignment:         4KB
      Accessible by all:       FALSE
    Pool 3
      Segment:                 GROUP
      Size:                    64(0x40) KB
      Allocatable:             FALSE
      Alloc Granule:           0KB
      Alloc Alignment:         0KB
      Accessible by all:       FALSE
  ISA Info:
    ISA 1
      Name:                    amdgcn-amd-amdhsa--gfx1033
      Machine Models:          HSA_MACHINE_MODEL_LARGE
      Profiles:                HSA_PROFILE_BASE
      Default Rounding Mode:   NEAR
      Default Rounding Mode:   NEAR
      Fast f16:                TRUE
      Workgroup Max Size:      1024(0x400)
      Workgroup Max Size per Dimension:
        x                        1024(0x400)
        y                        1024(0x400)
        z                        1024(0x400)
      Grid Max Size:           4294967295(0xffffffff)
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)
        y                        4294967295(0xffffffff)
        z                        4294967295(0xffffffff)
      FBarrier Max Size:       32
*** Done ***
```

Ollama log:

```
$ bash ollama-oci.sh
2024/06/12 13:25:04 routes.go:1011: INFO server config env="map[OLLAMA_DEBUG:true OLLAMA_FLASH_ATTENTION:false OLLAMA_HOST: OLLAMA_KEEP_ALIVE: OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:1 OLLAMA_MAX_QUEUE:512 OLLAMA_MAX_VRAM:0 OLLAMA_MODELS: OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://*] OLLAMA_RUNNERS_DIR: OLLAMA_TMPDIR:]"
time=2024-06-12T13:25:04.384Z level=INFO source=images.go:740 msg="total blobs: 5"
time=2024-06-12T13:25:04.388Z level=INFO source=images.go:747 msg="total unused blobs removed: 0"
time=2024-06-12T13:25:04.388Z level=INFO source=routes.go:1057 msg="Listening on [::]:11434 (version 0.1.43)"
time=2024-06-12T13:25:04.391Z level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama3572804540/runners
time=2024-06-12T13:25:04.391Z level=DEBUG source=payload.go:180 msg=extracting variant=cpu file=build/linux/x86_64/cpu/bin/ollama_llama_server.gz
time=2024-06-12T13:25:04.391Z level=DEBUG source=payload.go:180 msg=extracting variant=cpu_avx file=build/linux/x86_64/cpu_avx/bin/ollama_llama_server.gz
time=2024-06-12T13:25:04.391Z level=DEBUG source=payload.go:180 msg=extracting variant=cpu_avx2 file=build/linux/x86_64/cpu_avx2/bin/ollama_llama_server.gz
time=2024-06-12T13:25:04.391Z level=DEBUG source=payload.go:180 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/libcublas.so.11.gz
time=2024-06-12T13:25:04.391Z level=DEBUG source=payload.go:180 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/libcublasLt.so.11.gz
time=2024-06-12T13:25:04.391Z level=DEBUG source=payload.go:180 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/libcudart.so.11.0.gz
time=2024-06-12T13:25:04.391Z level=DEBUG source=payload.go:180 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/ollama_llama_server.gz
time=2024-06-12T13:25:04.391Z level=DEBUG source=payload.go:180 msg=extracting variant=rocm_v60002 file=build/linux/x86_64/rocm_v60002/bin/deps.txt.gz
time=2024-06-12T13:25:04.391Z level=DEBUG source=payload.go:180 msg=extracting variant=rocm_v60002 file=build/linux/x86_64/rocm_v60002/bin/ollama_llama_server.gz
time=2024-06-12T13:25:08.921Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3572804540/runners/cpu
time=2024-06-12T13:25:08.921Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3572804540/runners/cpu_avx
time=2024-06-12T13:25:08.921Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3572804540/runners/cpu_avx2
time=2024-06-12T13:25:08.921Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3572804540/runners/cuda_v11
time=2024-06-12T13:25:08.921Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3572804540/runners/rocm_v60002
time=2024-06-12T13:25:08.921Z level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu_avx cpu_avx2 cuda_v11 rocm_v60002 cpu]"
time=2024-06-12T13:25:08.921Z level=DEBUG source=payload.go:45 msg="Override detection logic by setting OLLAMA_LLM_LIBRARY"
time=2024-06-12T13:25:08.921Z level=DEBUG source=sched.go:90 msg="starting llm scheduler"
time=2024-06-12T13:25:08.921Z level=DEBUG source=gpu.go:132 msg="Detecting GPUs"
time=2024-06-12T13:25:08.923Z level=DEBUG source=gpu.go:274 msg="Searching for GPU library" name=libcuda.so*
time=2024-06-12T13:25:08.923Z level=DEBUG source=gpu.go:293 msg="gpu library search" globs="[/opt/rocm/lib/libcuda.so** /usr/local/lib/libcuda.so** /opt/rh/devtoolset-7/root/libcuda.so** /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]"
time=2024-06-12T13:25:08.945Z level=DEBUG source=gpu.go:326 msg="discovered GPU libraries" paths=[]
time=2024-06-12T13:25:08.945Z level=DEBUG source=gpu.go:274 msg="Searching for GPU library" name=libcudart.so*
time=2024-06-12T13:25:08.945Z level=DEBUG source=gpu.go:293 msg="gpu library search" globs="[/opt/rocm/lib/libcudart.so** /usr/local/lib/libcudart.so** /opt/rh/devtoolset-7/root/libcudart.so** /tmp/ollama3572804540/runners/cuda*/libcudart.so* /usr/local/cuda/lib64/libcudart.so* /usr/lib/x86_64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/x86_64-linux-gnu/libcudart.so* /usr/lib/wsl/lib/libcudart.so* /usr/lib/wsl/drivers/*/libcudart.so* /opt/cuda/lib64/libcudart.so* /usr/local/cuda*/targets/aarch64-linux/lib/libcudart.so* /usr/lib/aarch64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/aarch64-linux-gnu/libcudart.so* /usr/local/cuda/lib*/libcudart.so* /usr/lib*/libcudart.so* /usr/local/lib*/libcudart.so*]"
time=2024-06-12T13:25:08.947Z level=DEBUG source=gpu.go:326 msg="discovered GPU libraries" paths=[/tmp/ollama3572804540/runners/cuda_v11/libcudart.so.11.0]
cudaSetDevice err: 35
time=2024-06-12T13:25:08.949Z level=DEBUG source=gpu.go:338 msg="Unable to load cudart" library=/tmp/ollama3572804540/runners/cuda_v11/libcudart.so.11.0 error="your nvidia driver is too old or missing. If you have a CUDA GPU please upgrade to run ollama"
time=2024-06-12T13:25:08.949Z level=DEBUG source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-06-12T13:25:08.949Z level=WARN source=amd_linux.go:48 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
time=2024-06-12T13:25:08.949Z level=DEBUG source=amd_linux.go:77 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/0/properties"
time=2024-06-12T13:25:08.949Z level=DEBUG source=amd_linux.go:102 msg="detected CPU /sys/class/kfd/kfd/topology/nodes/0/properties"
time=2024-06-12T13:25:08.949Z level=DEBUG source=amd_linux.go:77 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/1/properties"
time=2024-06-12T13:25:08.950Z level=DEBUG source=amd_linux.go:242 msg="amdgpu memory" gpu=0 total="1.0 GiB"
time=2024-06-12T13:25:08.950Z level=DEBUG source=amd_linux.go:243 msg="amdgpu memory" gpu=0 available="1.0 GiB"
time=2024-06-12T13:25:08.950Z level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /opt/rocm/lib"
time=2024-06-12T13:25:08.950Z level=INFO source=amd_linux.go:304 msg="skipping rocm gfx compatibility check" HSA_OVERRIDE_GFX_VERSION=gfx1030
time=2024-06-12T13:25:08.950Z level=INFO source=types.go:71 msg="inference compute" id=0 library=rocm compute=gfx1033 driver=0.0 name=1002:163f total="1.0 GiB" available="1.0 GiB"
[GIN] 2024/06/12 - 13:25:08 | 200 | 546.235µs | 127.0.0.1 | HEAD "/"
[GIN] 2024/06/12 - 13:25:08 | 200 | 2.811165ms | 127.0.0.1 | POST "/api/show"
[GIN] 2024/06/12 - 13:25:08 | 200 | 816.463µs | 127.0.0.1 | POST "/api/show"
time=2024-06-12T13:25:08.959Z level=DEBUG source=gpu.go:132 msg="Detecting GPUs"
time=2024-06-12T13:25:08.959Z level=DEBUG source=gpu.go:274 msg="Searching for GPU library" name=libcuda.so*
time=2024-06-12T13:25:08.959Z level=DEBUG source=gpu.go:293 msg="gpu library search" globs="[/opt/rocm/lib/libcuda.so** /usr/local/lib/libcuda.so** /opt/rh/devtoolset-7/root/libcuda.so** /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]"
time=2024-06-12T13:25:08.960Z level=DEBUG source=gpu.go:326 msg="discovered GPU libraries" paths=[]
time=2024-06-12T13:25:08.960Z level=DEBUG source=gpu.go:274 msg="Searching for GPU library" name=libcudart.so*
time=2024-06-12T13:25:08.960Z level=DEBUG source=gpu.go:293 msg="gpu library search" globs="[/opt/rocm/lib/libcudart.so** /usr/local/lib/libcudart.so** /opt/rh/devtoolset-7/root/libcudart.so** /tmp/ollama3572804540/runners/cuda*/libcudart.so* /usr/local/cuda/lib64/libcudart.so* /usr/lib/x86_64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/x86_64-linux-gnu/libcudart.so* /usr/lib/wsl/lib/libcudart.so* /usr/lib/wsl/drivers/*/libcudart.so* /opt/cuda/lib64/libcudart.so* /usr/local/cuda*/targets/aarch64-linux/lib/libcudart.so* /usr/lib/aarch64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/aarch64-linux-gnu/libcudart.so* /usr/local/cuda/lib*/libcudart.so* /usr/lib*/libcudart.so* /usr/local/lib*/libcudart.so*]"
time=2024-06-12T13:25:08.962Z level=DEBUG source=gpu.go:326 msg="discovered GPU libraries" paths=[/tmp/ollama3572804540/runners/cuda_v11/libcudart.so.11.0]
cudaSetDevice err: 35
time=2024-06-12T13:25:08.963Z level=DEBUG source=gpu.go:338 msg="Unable to load cudart" library=/tmp/ollama3572804540/runners/cuda_v11/libcudart.so.11.0 error="your nvidia driver is too old or missing. If you have a CUDA GPU please upgrade to run ollama"
time=2024-06-12T13:25:08.963Z level=DEBUG source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-06-12T13:25:08.963Z level=WARN source=amd_linux.go:48 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
time=2024-06-12T13:25:08.963Z level=DEBUG source=amd_linux.go:77 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/0/properties"
time=2024-06-12T13:25:08.963Z level=DEBUG source=amd_linux.go:102 msg="detected CPU /sys/class/kfd/kfd/topology/nodes/0/properties"
time=2024-06-12T13:25:08.963Z level=DEBUG source=amd_linux.go:77 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/1/properties"
time=2024-06-12T13:25:08.963Z level=DEBUG source=amd_linux.go:242 msg="amdgpu memory" gpu=0 total="1.0 GiB"
time=2024-06-12T13:25:08.963Z level=DEBUG source=amd_linux.go:243 msg="amdgpu memory" gpu=0 available="1.0 GiB"
time=2024-06-12T13:25:08.963Z level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /opt/rocm/lib"
time=2024-06-12T13:25:08.964Z level=INFO source=amd_linux.go:304 msg="skipping rocm gfx compatibility check" HSA_OVERRIDE_GFX_VERSION=gfx1030
time=2024-06-12T13:25:08.964Z level=DEBUG source=gguf.go:57 msg="model = &llm.gguf{containerGGUF:(*llm.containerGGUF)(0xc0003bd780), kv:llm.KV{}, tensors:[]*llm.Tensor(nil), parameters:0x0}"
time=2024-06-12T13:25:10.432Z level=DEBUG source=sched.go:153 msg="loading first model" model=/root/.ollama/models/blobs/sha256-6a0746a1ec1aef3e7ec53868f220ff6e389f6f8ef87a01d77c96807de94ca2aa
time=2024-06-12T13:25:10.432Z level=DEBUG source=memory.go:44 msg=evaluating library=rocm gpu_count=1 available="1.0 GiB"
time=2024-06-12T13:25:10.432Z level=INFO source=memory.go:133 msg="offload to gpu" layers.requested=-1 layers.real=0 memory.available="1.0 GiB" memory.required.full="5.0 GiB" memory.required.partial="1.2 GiB" memory.required.kv="256.0 MiB" memory.weights.total="4.1 GiB" memory.weights.repeating="3.7 GiB" memory.weights.nonrepeating="411.0 MiB" memory.graph.full="164.0 MiB" memory.graph.partial="677.5 MiB"
time=2024-06-12T13:25:10.432Z level=DEBUG source=memory.go:177 msg="insufficient VRAM to load any model layers"
time=2024-06-12T13:25:10.432Z level=DEBUG source=memory.go:44 msg=evaluating library=rocm gpu_count=1 available="1.0 GiB"
time=2024-06-12T13:25:10.433Z level=INFO source=memory.go:133 msg="offload to gpu" layers.requested=-1 layers.real=0 memory.available="1.0 GiB" memory.required.full="5.0 GiB" memory.required.partial="1.2 GiB" memory.required.kv="256.0 MiB" memory.weights.total="4.1 GiB" memory.weights.repeating="3.7 GiB" memory.weights.nonrepeating="411.0 MiB" memory.graph.full="164.0 MiB" memory.graph.partial="677.5 MiB"
time=2024-06-12T13:25:10.433Z level=DEBUG source=memory.go:177 msg="insufficient VRAM to load any model layers"
time=2024-06-12T13:25:10.433Z level=DEBUG source=memory.go:44 msg=evaluating library=rocm gpu_count=1 available="1.0 GiB"
time=2024-06-12T13:25:10.433Z level=INFO source=memory.go:133 msg="offload to gpu" layers.requested=-1 layers.real=0 memory.available="1.0 GiB" memory.required.full="5.0 GiB" memory.required.partial="1.2 GiB" memory.required.kv="256.0 MiB" memory.weights.total="4.1 GiB" memory.weights.repeating="3.7 GiB" memory.weights.nonrepeating="411.0 MiB" memory.graph.full="164.0 MiB" memory.graph.partial="677.5 MiB"
time=2024-06-12T13:25:10.433Z level=DEBUG source=memory.go:177 msg="insufficient VRAM to load any model layers"
time=2024-06-12T13:25:10.433Z level=DEBUG source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-06-12T13:25:10.433Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3572804540/runners/cpu
time=2024-06-12T13:25:10.434Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3572804540/runners/cpu_avx
time=2024-06-12T13:25:10.434Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3572804540/runners/cpu_avx2
time=2024-06-12T13:25:10.434Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3572804540/runners/cuda_v11
time=2024-06-12T13:25:10.434Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3572804540/runners/rocm_v60002
time=2024-06-12T13:25:10.434Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3572804540/runners/cpu
time=2024-06-12T13:25:10.434Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3572804540/runners/cpu_avx
time=2024-06-12T13:25:10.434Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3572804540/runners/cpu_avx2
time=2024-06-12T13:25:10.434Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3572804540/runners/cuda_v11
time=2024-06-12T13:25:10.434Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3572804540/runners/rocm_v60002
time=2024-06-12T13:25:10.439Z level=INFO source=server.go:341 msg="starting llama server" cmd="/tmp/ollama3572804540/runners/cpu_avx2/ollama_llama_server --model /root/.ollama/models/blobs/sha256-6a0746a1ec1aef3e7ec53868f220ff6e389f6f8ef87a01d77c96807de94ca2aa --ctx-size 2048 --batch-size 512 --embedding --log-disable --verbose --parallel 1 --port 44047"
time=2024-06-12T13:25:10.439Z level=DEBUG source=server.go:356 msg=subprocess environment="[PATH=/opt/rh/devtoolset-7/root/usr/bin:/opt/rocm/hcc/bin:/opt/rocm/hip/bin:/opt/rocm/bin:/opt/rocm/hcc/bin::/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin HSA_OVERRIDE_GFX_VERSION=gfx1030 LD_LIBRARY_PATH=/opt/rocm/lib:/tmp/ollama3572804540/runners/cpu_avx2:/opt/rocm/lib:/usr/local/lib:/opt/rh/devtoolset-7/root HIP_VISIBLE_DEVICES=0]"
time=2024-06-12T13:25:10.440Z level=INFO source=sched.go:338 msg="loaded runners" count=1
time=2024-06-12T13:25:10.440Z level=INFO source=server.go:529 msg="waiting for llama runner to start responding"
time=2024-06-12T13:25:10.440Z level=INFO source=server.go:567 msg="waiting for server to become available" status="llm server error"
INFO [main] build info | build=1 commit="5921b8f" tid="140359403939712" timestamp=1718198710
INFO [main] system info | n_threads=4 n_threads_batch=-1 system_info="AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | " tid="140359403939712" timestamp=1718198710 total_threads=8
INFO [main] HTTP server listening | hostname="127.0.0.1" n_threads_http="7" port="44047" tid="140359403939712" timestamp=1718198710
llama_model_loader: loaded meta data with 22 key-value pairs and 291 tensors from /root/.ollama/models/blobs/sha256-6a0746a1ec1aef3e7ec53868f220ff6e389f6f8ef87a01d77c96807de94ca2aa (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0: general.architecture str = llama
llama_model_loader: - kv   1: general.name str = Meta-Llama-3-8B-Instruct
llama_model_loader: - kv   2: llama.block_count u32 = 32
llama_model_loader: - kv   3: llama.context_length u32 = 8192
llama_model_loader: - kv   4: llama.embedding_length u32 = 4096
llama_model_loader: - kv   5: llama.feed_forward_length u32 = 14336
llama_model_loader: - kv   6: llama.attention.head_count u32 = 32
llama_model_loader: - kv   7: llama.attention.head_count_kv u32 = 8
llama_model_loader: - kv   8: llama.rope.freq_base f32 = 500000.000000
llama_model_loader: - kv   9: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv  10: general.file_type u32 = 2
llama_model_loader: - kv  11: llama.vocab_size u32 = 128256
llama_model_loader: - kv  12: llama.rope.dimension_count u32 = 128
llama_model_loader: - kv  13: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv  14: tokenizer.ggml.pre str = llama-bpe
llama_model_loader: - kv  15: tokenizer.ggml.tokens arr[str,128256] = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  16: tokenizer.ggml.token_type arr[i32,128256] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
time=2024-06-12T13:25:10.692Z level=INFO source=server.go:567 msg="waiting for server to become available" status="llm server loading model"
llama_model_loader: - kv  17: tokenizer.ggml.merges arr[str,280147] = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv  18: tokenizer.ggml.bos_token_id u32 = 128000
llama_model_loader: - kv  19: tokenizer.ggml.eos_token_id u32 = 128009
llama_model_loader: - kv  20: tokenizer.chat_template str = {% set loop_messages = messages %}{% ...
llama_model_loader: - kv  21: general.quantization_version u32 = 2
llama_model_loader: - type  f32:   65 tensors
llama_model_loader: - type q4_0:  225 tensors
llama_model_loader: - type q6_K:    1 tensors
llm_load_vocab: special tokens cache size = 256
llm_load_vocab: token to piece cache size = 1.5928 MB
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = llama
llm_load_print_meta: vocab type       = BPE
llm_load_print_meta: n_vocab          = 128256
llm_load_print_meta: n_merges         = 280147
llm_load_print_meta: n_ctx_train      = 8192
llm_load_print_meta: n_embd           = 4096
llm_load_print_meta: n_head           = 32
llm_load_print_meta: n_head_kv        = 8
llm_load_print_meta: n_layer          = 32
llm_load_print_meta: n_rot            = 128
llm_load_print_meta: n_embd_head_k    = 128
llm_load_print_meta: n_embd_head_v    = 128
llm_load_print_meta: n_gqa            = 4
llm_load_print_meta: n_embd_k_gqa     = 1024
llm_load_print_meta: n_embd_v_gqa     = 1024
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale    = 0.0e+00
llm_load_print_meta: n_ff             = 14336
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: causal attn      = 1
llm_load_print_meta: pooling type     = 0
llm_load_print_meta: rope type        = 0
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 500000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx  = 8192
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: ssm_d_conv       = 0
llm_load_print_meta: ssm_d_inner      = 0
llm_load_print_meta: ssm_d_state      = 0
llm_load_print_meta: ssm_dt_rank      = 0
llm_load_print_meta: model type       = 8B
llm_load_print_meta: model ftype      = Q4_0
llm_load_print_meta: model params     = 8.03 B
llm_load_print_meta: model size       = 4.33 GiB (4.64 BPW)
llm_load_print_meta: general.name     = Meta-Llama-3-8B-Instruct
llm_load_print_meta: BOS token        = 128000 '<|begin_of_text|>'
llm_load_print_meta: EOS token        = 128009 '<|eot_id|>'
llm_load_print_meta: LF token         = 128 'Ä'
llm_load_print_meta: EOT token        = 128009 '<|eot_id|>'
llm_load_tensors: ggml ctx size = 0.15 MiB
llm_load_tensors: CPU buffer size = 4437.80 MiB
llama_new_context_with_model: n_ctx      = 2048
llama_new_context_with_model: n_batch    = 512
llama_new_context_with_model: n_ubatch   = 512
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: freq_base  = 500000.0
llama_new_context_with_model: freq_scale = 1
time=2024-06-12T13:25:11.948Z level=DEBUG source=server.go:578 msg="model load progress 1.00"
llama_kv_cache_init: CPU KV buffer size = 256.00 MiB
llama_new_context_with_model: KV self size = 256.00 MiB, K (f16): 128.00 MiB, V (f16): 128.00 MiB
llama_new_context_with_model: CPU output buffer size = 0.50 MiB
llama_new_context_with_model: CPU compute buffer size = 258.50 MiB
llama_new_context_with_model: graph nodes = 1030
llama_new_context_with_model: graph splits = 1
time=2024-06-12T13:25:12.199Z level=DEBUG source=server.go:581 msg="model load completed, waiting for server to become available" status="llm server loading model"
DEBUG [initialize] initializing slots | n_slots=1 tid="140359403939712" timestamp=1718198712
DEBUG [initialize] new slot | n_ctx_slot=2048 slot_id=0 tid="140359403939712" timestamp=1718198712
INFO [main] model loaded | tid="140359403939712" timestamp=1718198712
DEBUG [update_slots] all slots are idle and system prompt is empty, clear the KV cache | tid="140359403939712" timestamp=1718198712
DEBUG [process_single_task] slot data | n_idle_slots=1 n_processing_slots=0 task_id=0 tid="140359403939712" timestamp=1718198712
time=2024-06-12T13:25:12.458Z level=INFO source=server.go:572 msg="llama runner started in 2.02 seconds"
time=2024-06-12T13:25:12.458Z level=DEBUG source=sched.go:351 msg="finished setting up runner" model=/root/.ollama/models/blobs/sha256-6a0746a1ec1aef3e7ec53868f220ff6e389f6f8ef87a01d77c96807de94ca2aa
time=2024-06-12T13:25:12.458Z level=DEBUG source=prompt.go:172 msg="prompt now fits in context window" required=1 window=2048 [GIN] 2024/06/12 - 13:25:12 | 200 | 3.500021851s | 127.0.0.1 | POST "/api/chat" time=2024-06-12T13:25:12.458Z level=DEBUG source=sched.go:355 msg="context for request finished" time=2024-06-12T13:25:12.458Z level=DEBUG source=sched.go:237 msg="runner with non-zero duration has gone idle, adding timer" modelPath=/root/.ollama/models/blobs/sha256-6a0746a1ec1aef3e7ec53868f220ff6e389f6f8ef87a01d77c96807de94ca2aa duration=5m0s time=2024-06-12T13:25:12.458Z level=DEBUG source=sched.go:255 msg="after processing request finished event" modelPath=/root/.ollama/models/blobs/sha256-6a0746a1ec1aef3e7ec53868f220ff6e389f6f8ef87a01d77c96807de94ca2aa refCount=0 time=2024-06-12T13:25:23.515Z level=DEBUG source=sched.go:446 msg="evaluating already loaded" model=/root/.ollama/models/blobs/sha256-6a0746a1ec1aef3e7ec53868f220ff6e389f6f8ef87a01d77c96807de94ca2aa DEBUG [process_single_task] slot data | n_idle_slots=1 n_processing_slots=0 task_id=1 tid="140359403939712" timestamp=1718198723 DEBUG [process_single_task] slot data | n_idle_slots=1 n_processing_slots=0 task_id=2 tid="140359403939712" timestamp=1718198723 DEBUG [log_server_request] request | method="POST" params={} path="/tokenize" remote_addr="127.0.0.1" remote_port=34890 status=200 tid="140359355057920" timestamp=1718198723 time=2024-06-12T13:25:23.606Z level=DEBUG source=prompt.go:172 msg="prompt now fits in context window" required=13 window=2048 time=2024-06-12T13:25:23.606Z level=DEBUG source=routes.go:1305 msg="chat handler" prompt="<|start_header_id|>user<|end_header_id|>\n\nHullo<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n" images=0 time=2024-06-12T13:25:23.606Z level=DEBUG source=server.go:668 msg="setting token limit to 10x num_ctx" num_ctx=2048 num_predict=20480 DEBUG [process_single_task] slot data | n_idle_slots=1 n_processing_slots=0 task_id=3 tid="140359403939712" 
timestamp=1718198723 DEBUG [launch_slot_with_data] slot is processing task | slot_id=0 task_id=4 tid="140359403939712" timestamp=1718198723 DEBUG [update_slots] slot progression | ga_i=0 n_past=0 n_past_se=0 n_prompt_tokens_processed=12 slot_id=0 task_id=4 tid="140359403939712" timestamp=1718198723 DEBUG [update_slots] kv cache rm [p0, end) | p0=0 slot_id=0 task_id=4 tid="140359403939712" timestamp=1718198723 DEBUG [print_timings] prompt eval time = 1455.98 ms / 12 tokens ( 121.33 ms per token, 8.24 tokens per second) | n_prompt_tokens_processed=12 n_tokens_second=8.241871454278218 slot_id=0 t_prompt_processing=1455.98 t_token=121.33166666666666 task_id=4 tid="140359403939712" timestamp=1718198731 DEBUG [print_timings] generation eval time = 6179.49 ms / 26 runs ( 237.67 ms per token, 4.21 tokens per second) | n_decoded=26 n_tokens_second=4.207464235744869 slot_id=0 t_token=237.67284615384614 t_token_generation=6179.494 task_id=4 tid="140359403939712" timestamp=1718198731 DEBUG [print_timings] total time = 7635.47 ms | slot_id=0 t_prompt_processing=1455.98 t_token_generation=6179.494 t_total=7635.474 task_id=4 tid="140359403939712" timestamp=1718198731 DEBUG [update_slots] slot released | n_cache_tokens=38 n_ctx=2048 n_past=37 n_system_tokens=0 slot_id=0 task_id=4 tid="140359403939712" timestamp=1718198731 truncated=false DEBUG [log_server_request] request | method="POST" params={} path="/completion" remote_addr="127.0.0.1" remote_port=34890 status=200 tid="140359355057920" timestamp=1718198731 [GIN] 2024/06/12 - 13:25:31 | 200 | 7.771807214s | 127.0.0.1 | POST "/api/chat" time=2024-06-12T13:25:31.286Z level=DEBUG source=sched.go:304 msg="context for request finished" time=2024-06-12T13:25:31.286Z level=DEBUG source=sched.go:237 msg="runner with non-zero duration has gone idle, adding timer" modelPath=/root/.ollama/models/blobs/sha256-6a0746a1ec1aef3e7ec53868f220ff6e389f6f8ef87a01d77c96807de94ca2aa duration=5m0s time=2024-06-12T13:25:31.286Z level=DEBUG 
source=sched.go:255 msg="after processing request finished event" modelPath=/root/.ollama/models/blobs/sha256-6a0746a1ec1aef3e7ec53868f220ff6e389f6f8ef87a01d77c96807de94ca2aa refCount=0
```

Edit: Increased the VRAM to 4GB. The Ollama log reflects the change, but trying to run anything throws `Could not initialize Tensile host: No devices found`. Unlike before at 1GB VRAM, it doesn't even run on the CPU. I can only run it on the CPU by setting an invalid `HIP_VISIBLE_DEVICES`.

```
& podman exec -it ollama ollama run gemma:2b
Error: llama runner process has terminated: signal: aborted (core dumped) error:Could not initialize Tensile host: No devices found
```

```
bash ollama-oci.sh
2024/06/12 15:00:56 routes.go:1011: INFO server config env="map[OLLAMA_DEBUG:true OLLAMA_FLASH_ATTENTION:false OLLAMA_HOST: OLLAMA_KEEP_ALIVE: OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:1 OLLAMA_MAX_QUEUE:512 OLLAMA_MAX_VRAM:0 OLLAMA_MODELS: OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://*] OLLAMA_RUNNERS_DIR: OLLAMA_TMPDIR:]"
time=2024-06-12T15:00:56.986Z level=INFO source=images.go:740 msg="total blobs: 10"
time=2024-06-12T15:00:56.987Z level=INFO source=images.go:747 msg="total unused blobs removed: 0"
time=2024-06-12T15:00:56.987Z level=INFO source=routes.go:1057 msg="Listening on [::]:11434 (version 0.1.43)"
time=2024-06-12T15:00:56.990Z level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama689615480/runners
time=2024-06-12T15:00:56.991Z level=DEBUG source=payload.go:180 msg=extracting variant=cpu file=build/linux/x86_64/cpu/bin/ollama_llama_server.gz
time=2024-06-12T15:00:56.991Z level=DEBUG source=payload.go:180 msg=extracting variant=cpu_avx
file=build/linux/x86_64/cpu_avx/bin/ollama_llama_server.gz time=2024-06-12T15:00:56.991Z level=DEBUG source=payload.go:180 msg=extracting variant=cpu_avx2 file=build/linux/x86_64/cpu_avx2/bin/ollama_llama_server.gz time=2024-06-12T15:00:56.991Z level=DEBUG source=payload.go:180 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/libcublas.so.11.gz time=2024-06-12T15:00:56.991Z level=DEBUG source=payload.go:180 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/libcublasLt.so.11.gz time=2024-06-12T15:00:56.991Z level=DEBUG source=payload.go:180 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/libcudart.so.11.0.gz time=2024-06-12T15:00:56.991Z level=DEBUG source=payload.go:180 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/ollama_llama_server.gz time=2024-06-12T15:00:56.991Z level=DEBUG source=payload.go:180 msg=extracting variant=rocm_v60002 file=build/linux/x86_64/rocm_v60002/bin/deps.txt.gz time=2024-06-12T15:00:56.991Z level=DEBUG source=payload.go:180 msg=extracting variant=rocm_v60002 file=build/linux/x86_64/rocm_v60002/bin/ollama_llama_server.gz time=2024-06-12T15:01:01.992Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama689615480/runners/cpu time=2024-06-12T15:01:01.992Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama689615480/runners/cpu_avx time=2024-06-12T15:01:01.992Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama689615480/runners/cpu_avx2 time=2024-06-12T15:01:01.992Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama689615480/runners/cuda_v11 time=2024-06-12T15:01:01.992Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama689615480/runners/rocm_v60002 time=2024-06-12T15:01:01.992Z level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu_avx2 cuda_v11 rocm_v60002 cpu cpu_avx]" time=2024-06-12T15:01:01.992Z 
level=DEBUG source=payload.go:45 msg="Override detection logic by setting OLLAMA_LLM_LIBRARY" time=2024-06-12T15:01:01.992Z level=DEBUG source=sched.go:90 msg="starting llm scheduler" time=2024-06-12T15:01:01.992Z level=DEBUG source=gpu.go:132 msg="Detecting GPUs" time=2024-06-12T15:01:01.992Z level=DEBUG source=gpu.go:274 msg="Searching for GPU library" name=libcuda.so* time=2024-06-12T15:01:01.992Z level=DEBUG source=gpu.go:293 msg="gpu library search" globs="[/opt/rocm/lib/libcuda.so** /usr/local/lib/libcuda.so** /opt/rh/devtoolset-7/root/libcuda.so** /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]" time=2024-06-12T15:01:02.024Z level=DEBUG source=gpu.go:326 msg="discovered GPU libraries" paths=[] time=2024-06-12T15:01:02.025Z level=DEBUG source=gpu.go:274 msg="Searching for GPU library" name=libcudart.so* time=2024-06-12T15:01:02.025Z level=DEBUG source=gpu.go:293 msg="gpu library search" globs="[/opt/rocm/lib/libcudart.so** /usr/local/lib/libcudart.so** /opt/rh/devtoolset-7/root/libcudart.so** /tmp/ollama689615480/runners/cuda*/libcudart.so* /usr/local/cuda/lib64/libcudart.so* /usr/lib/x86_64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/x86_64-linux-gnu/libcudart.so* /usr/lib/wsl/lib/libcudart.so* /usr/lib/wsl/drivers/*/libcudart.so* /opt/cuda/lib64/libcudart.so* /usr/local/cuda*/targets/aarch64-linux/lib/libcudart.so* /usr/lib/aarch64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/aarch64-linux-gnu/libcudart.so* /usr/local/cuda/lib*/libcudart.so* /usr/lib*/libcudart.so* /usr/local/lib*/libcudart.so*]" time=2024-06-12T15:01:02.026Z level=DEBUG source=gpu.go:326 msg="discovered GPU libraries" paths=[/tmp/ollama689615480/runners/cuda_v11/libcudart.so.11.0] cudaSetDevice err: 35 time=2024-06-12T15:01:02.027Z 
level=DEBUG source=gpu.go:338 msg="Unable to load cudart" library=/tmp/ollama689615480/runners/cuda_v11/libcudart.so.11.0 error="your nvidia driver is too old or missing. If you have a CUDA GPU please upgrade to run ollama" time=2024-06-12T15:01:02.027Z level=DEBUG source=cpu_common.go:11 msg="CPU has AVX2" time=2024-06-12T15:01:02.027Z level=WARN source=amd_linux.go:48 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory" time=2024-06-12T15:01:02.028Z level=DEBUG source=amd_linux.go:77 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/0/properties" time=2024-06-12T15:01:02.028Z level=DEBUG source=amd_linux.go:102 msg="detected CPU /sys/class/kfd/kfd/topology/nodes/0/properties" time=2024-06-12T15:01:02.028Z level=DEBUG source=amd_linux.go:77 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/1/properties" time=2024-06-12T15:01:02.028Z level=DEBUG source=amd_linux.go:242 msg="amdgpu memory" gpu=0 total="4.0 GiB" time=2024-06-12T15:01:02.028Z level=DEBUG source=amd_linux.go:243 msg="amdgpu memory" gpu=0 available="4.0 GiB" time=2024-06-12T15:01:02.028Z level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /opt/rocm/lib" time=2024-06-12T15:01:02.029Z level=INFO source=amd_linux.go:304 msg="skipping rocm gfx compatibility check" HSA_OVERRIDE_GFX_VERSION=gfx1030 time=2024-06-12T15:01:02.029Z level=INFO source=types.go:71 msg="inference compute" id=0 library=rocm compute=gfx1033 driver=0.0 name=1002:163f total="4.0 GiB" available="4.0 GiB" [GIN] 2024/06/12 - 15:01:04 | 200 | 508.521µs | 127.0.0.1 | HEAD "/" [GIN] 2024/06/12 - 15:01:04 | 200 | 1.806745ms | 127.0.0.1 | POST "/api/show" [GIN] 2024/06/12 - 15:01:04 | 200 | 576.742µs | 127.0.0.1 | POST "/api/show" time=2024-06-12T15:01:04.227Z level=DEBUG source=gpu.go:132 msg="Detecting GPUs" time=2024-06-12T15:01:04.227Z level=DEBUG 
source=gpu.go:274 msg="Searching for GPU library" name=libcuda.so* time=2024-06-12T15:01:04.227Z level=DEBUG source=gpu.go:293 msg="gpu library search" globs="[/opt/rocm/lib/libcuda.so** /usr/local/lib/libcuda.so** /opt/rh/devtoolset-7/root/libcuda.so** /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]" time=2024-06-12T15:01:04.229Z level=DEBUG source=gpu.go:326 msg="discovered GPU libraries" paths=[] time=2024-06-12T15:01:04.229Z level=DEBUG source=gpu.go:274 msg="Searching for GPU library" name=libcudart.so* time=2024-06-12T15:01:04.229Z level=DEBUG source=gpu.go:293 msg="gpu library search" globs="[/opt/rocm/lib/libcudart.so** /usr/local/lib/libcudart.so** /opt/rh/devtoolset-7/root/libcudart.so** /tmp/ollama689615480/runners/cuda*/libcudart.so* /usr/local/cuda/lib64/libcudart.so* /usr/lib/x86_64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/x86_64-linux-gnu/libcudart.so* /usr/lib/wsl/lib/libcudart.so* /usr/lib/wsl/drivers/*/libcudart.so* /opt/cuda/lib64/libcudart.so* /usr/local/cuda*/targets/aarch64-linux/lib/libcudart.so* /usr/lib/aarch64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/aarch64-linux-gnu/libcudart.so* /usr/local/cuda/lib*/libcudart.so* /usr/lib*/libcudart.so* /usr/local/lib*/libcudart.so*]" time=2024-06-12T15:01:04.230Z level=DEBUG source=gpu.go:326 msg="discovered GPU libraries" paths=[/tmp/ollama689615480/runners/cuda_v11/libcudart.so.11.0] cudaSetDevice err: 35 time=2024-06-12T15:01:04.231Z level=DEBUG source=gpu.go:338 msg="Unable to load cudart" library=/tmp/ollama689615480/runners/cuda_v11/libcudart.so.11.0 error="your nvidia driver is too old or missing. 
If you have a CUDA GPU please upgrade to run ollama" time=2024-06-12T15:01:04.231Z level=DEBUG source=cpu_common.go:11 msg="CPU has AVX2" time=2024-06-12T15:01:04.231Z level=WARN source=amd_linux.go:48 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory" time=2024-06-12T15:01:04.231Z level=DEBUG source=amd_linux.go:77 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/0/properties" time=2024-06-12T15:01:04.231Z level=DEBUG source=amd_linux.go:102 msg="detected CPU /sys/class/kfd/kfd/topology/nodes/0/properties" time=2024-06-12T15:01:04.231Z level=DEBUG source=amd_linux.go:77 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/1/properties" time=2024-06-12T15:01:04.231Z level=DEBUG source=amd_linux.go:242 msg="amdgpu memory" gpu=0 total="4.0 GiB" time=2024-06-12T15:01:04.231Z level=DEBUG source=amd_linux.go:243 msg="amdgpu memory" gpu=0 available="4.0 GiB" time=2024-06-12T15:01:04.231Z level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /opt/rocm/lib" time=2024-06-12T15:01:04.232Z level=INFO source=amd_linux.go:304 msg="skipping rocm gfx compatibility check" HSA_OVERRIDE_GFX_VERSION=gfx1030 time=2024-06-12T15:01:04.232Z level=DEBUG source=gguf.go:57 msg="model = &llm.gguf{containerGGUF:(*llm.containerGGUF)(0xc000494780), kv:llm.KV{}, tensors:[]*llm.Tensor(nil), parameters:0x0}" time=2024-06-12T15:01:05.537Z level=DEBUG source=sched.go:153 msg="loading first model" model=/root/.ollama/models/blobs/sha256-c1864a5eb19305c40519da12cc543519e48a0697ecd30e15d5ac228644957d12 time=2024-06-12T15:01:05.537Z level=DEBUG source=memory.go:44 msg=evaluating library=rocm gpu_count=1 available="4.0 GiB" time=2024-06-12T15:01:05.537Z level=INFO source=memory.go:133 msg="offload to gpu" layers.requested=-1 layers.real=19 memory.available="4.0 GiB" memory.required.full="2.6 GiB" 
memory.required.partial="2.6 GiB" memory.required.kv="36.0 MiB" memory.weights.total="1.6 GiB" memory.weights.repeating="1.0 GiB" memory.weights.nonrepeating="531.5 MiB" memory.graph.full="504.2 MiB" memory.graph.partial="918.6 MiB" time=2024-06-12T15:01:05.537Z level=DEBUG source=sched.go:563 msg="new model will fit in available VRAM in single GPU, loading" model=/root/.ollama/models/blobs/sha256-c1864a5eb19305c40519da12cc543519e48a0697ecd30e15d5ac228644957d12 gpu=0 available=4294967296 required="2.6 GiB" time=2024-06-12T15:01:05.537Z level=DEBUG source=memory.go:44 msg=evaluating library=rocm gpu_count=1 available="4.0 GiB" time=2024-06-12T15:01:05.538Z level=INFO source=memory.go:133 msg="offload to gpu" layers.requested=-1 layers.real=19 memory.available="4.0 GiB" memory.required.full="2.6 GiB" memory.required.partial="2.6 GiB" memory.required.kv="36.0 MiB" memory.weights.total="1.6 GiB" memory.weights.repeating="1.0 GiB" memory.weights.nonrepeating="531.5 MiB" memory.graph.full="504.2 MiB" memory.graph.partial="918.6 MiB" time=2024-06-12T15:01:05.538Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama689615480/runners/cpu time=2024-06-12T15:01:05.538Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama689615480/runners/cpu_avx time=2024-06-12T15:01:05.538Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama689615480/runners/cpu_avx2 time=2024-06-12T15:01:05.538Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama689615480/runners/cuda_v11 time=2024-06-12T15:01:05.538Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama689615480/runners/rocm_v60002 time=2024-06-12T15:01:05.538Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama689615480/runners/cpu time=2024-06-12T15:01:05.538Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama689615480/runners/cpu_avx 
time=2024-06-12T15:01:05.538Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama689615480/runners/cpu_avx2 time=2024-06-12T15:01:05.538Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama689615480/runners/cuda_v11 time=2024-06-12T15:01:05.538Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama689615480/runners/rocm_v60002 time=2024-06-12T15:01:05.538Z level=DEBUG source=cpu_common.go:11 msg="CPU has AVX2" time=2024-06-12T15:01:05.544Z level=INFO source=server.go:341 msg="starting llama server" cmd="/tmp/ollama689615480/runners/rocm_v60002/ollama_llama_server --model /root/.ollama/models/blobs/sha256-c1864a5eb19305c40519da12cc543519e48a0697ecd30e15d5ac228644957d12 --ctx-size 2048 --batch-size 512 --embedding --log-disable --n-gpu-layers 19 --verbose --parallel 1 --port 40529" time=2024-06-12T15:01:05.544Z level=DEBUG source=server.go:356 msg=subprocess environment="[PATH=/opt/rh/devtoolset-7/root/usr/bin:/opt/rocm/hcc/bin:/opt/rocm/hip/bin:/opt/rocm/bin:/opt/rocm/hcc/bin::/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin HSA_OVERRIDE_GFX_VERSION=gfx1030 LD_LIBRARY_PATH=/opt/rocm/lib:/tmp/ollama689615480/runners/rocm_v60002:/opt/rocm/lib:/usr/local/lib:/opt/rh/devtoolset-7/root HIP_VISIBLE_DEVICES=0]" time=2024-06-12T15:01:05.544Z level=INFO source=sched.go:338 msg="loaded runners" count=1 time=2024-06-12T15:01:05.544Z level=INFO source=server.go:529 msg="waiting for llama runner to start responding" time=2024-06-12T15:01:05.544Z level=INFO source=server.go:567 msg="waiting for server to become available" status="llm server error" INFO [main] build info | build=1 commit="5921b8f" tid="139643683318848" timestamp=1718204465 INFO [main] system info | n_threads=4 n_threads_batch=-1 system_info="AVX = 1 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 0 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | 
WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | " tid="139643683318848" timestamp=1718204465 total_threads=8 INFO [main] HTTP server listening | hostname="127.0.0.1" n_threads_http="7" port="40529" tid="139643683318848" timestamp=1718204465 llama_model_loader: loaded meta data with 21 key-value pairs and 164 tensors from /root/.ollama/models/blobs/sha256-c1864a5eb19305c40519da12cc543519e48a0697ecd30e15d5ac228644957d12 (version GGUF V3 (latest)) llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output. llama_model_loader: - kv 0: general.architecture str = gemma llama_model_loader: - kv 1: general.name str = gemma-2b-it llama_model_loader: - kv 2: gemma.context_length u32 = 8192 llama_model_loader: - kv 3: gemma.block_count u32 = 18 llama_model_loader: - kv 4: gemma.embedding_length u32 = 2048 llama_model_loader: - kv 5: gemma.feed_forward_length u32 = 16384 llama_model_loader: - kv 6: gemma.attention.head_count u32 = 8 llama_model_loader: - kv 7: gemma.attention.head_count_kv u32 = 1 llama_model_loader: - kv 8: gemma.attention.key_length u32 = 256 llama_model_loader: - kv 9: gemma.attention.value_length u32 = 256 llama_model_loader: - kv 10: gemma.attention.layer_norm_rms_epsilon f32 = 0.000001 llama_model_loader: - kv 11: tokenizer.ggml.model str = llama llama_model_loader: - kv 12: tokenizer.ggml.bos_token_id u32 = 2 llama_model_loader: - kv 13: tokenizer.ggml.eos_token_id u32 = 1 llama_model_loader: - kv 14: tokenizer.ggml.padding_token_id u32 = 0 llama_model_loader: - kv 15: tokenizer.ggml.unknown_token_id u32 = 3 time=2024-06-12T15:01:05.796Z level=INFO source=server.go:567 msg="waiting for server to become available" status="llm server loading model" llama_model_loader: - kv 16: tokenizer.ggml.tokens arr[str,256128] = ["<pad>", "<eos>", "<bos>", "<unk>", ... llama_model_loader: - kv 17: tokenizer.ggml.scores arr[f32,256128] = [0.000000, 0.000000, 0.000000, 0.0000... 
llama_model_loader: - kv 18: tokenizer.ggml.token_type arr[i32,256128] = [3, 3, 3, 2, 1, 1, 1, 1, 1, 1, 1, 1, ... llama_model_loader: - kv 19: general.quantization_version u32 = 2 llama_model_loader: - kv 20: general.file_type u32 = 2 llama_model_loader: - type f32: 37 tensors llama_model_loader: - type q4_0: 126 tensors llama_model_loader: - type q8_0: 1 tensors llm_load_vocab: special tokens cache size = 388 llm_load_vocab: token to piece cache size = 3.2028 MB llm_load_print_meta: format = GGUF V3 (latest) llm_load_print_meta: arch = gemma llm_load_print_meta: vocab type = SPM llm_load_print_meta: n_vocab = 256128 llm_load_print_meta: n_merges = 0 llm_load_print_meta: n_ctx_train = 8192 llm_load_print_meta: n_embd = 2048 llm_load_print_meta: n_head = 8 llm_load_print_meta: n_head_kv = 1 llm_load_print_meta: n_layer = 18 llm_load_print_meta: n_rot = 256 llm_load_print_meta: n_embd_head_k = 256 llm_load_print_meta: n_embd_head_v = 256 llm_load_print_meta: n_gqa = 8 llm_load_print_meta: n_embd_k_gqa = 256 llm_load_print_meta: n_embd_v_gqa = 256 llm_load_print_meta: f_norm_eps = 0.0e+00 llm_load_print_meta: f_norm_rms_eps = 1.0e-06 llm_load_print_meta: f_clamp_kqv = 0.0e+00 llm_load_print_meta: f_max_alibi_bias = 0.0e+00 llm_load_print_meta: f_logit_scale = 0.0e+00 llm_load_print_meta: n_ff = 16384 llm_load_print_meta: n_expert = 0 llm_load_print_meta: n_expert_used = 0 llm_load_print_meta: causal attn = 1 llm_load_print_meta: pooling type = 0 llm_load_print_meta: rope type = 2 llm_load_print_meta: rope scaling = linear llm_load_print_meta: freq_base_train = 10000.0 llm_load_print_meta: freq_scale_train = 1 llm_load_print_meta: n_yarn_orig_ctx = 8192 llm_load_print_meta: rope_finetuned = unknown llm_load_print_meta: ssm_d_conv = 0 llm_load_print_meta: ssm_d_inner = 0 llm_load_print_meta: ssm_d_state = 0 llm_load_print_meta: ssm_dt_rank = 0 llm_load_print_meta: model type = 2B llm_load_print_meta: model ftype = Q4_0 llm_load_print_meta: model params = 2.51 B 
llm_load_print_meta: model size = 1.56 GiB (5.34 BPW)
llm_load_print_meta: general.name = gemma-2b-it
llm_load_print_meta: BOS token = 2 '<bos>'
llm_load_print_meta: EOS token = 1 '<eos>'
llm_load_print_meta: UNK token = 3 '<unk>'
llm_load_print_meta: PAD token = 0 '<pad>'
llm_load_print_meta: LF token = 227 '<0x0A>'
llm_load_print_meta: EOT token = 107 '<end_of_turn>'
rocBLAS error: Could not initialize Tensile host: No devices found
time=2024-06-12T15:01:06.999Z level=INFO source=server.go:567 msg="waiting for server to become available" status="llm server not responding"
time=2024-06-12T15:01:07.701Z level=ERROR source=sched.go:344 msg="error loading llama server" error="llama runner process has terminated: signal: aborted (core dumped) error:Could not initialize Tensile host: No devices found"
time=2024-06-12T15:01:07.701Z level=DEBUG source=sched.go:347 msg="triggering expiration for failed load" model=/root/.ollama/models/blobs/sha256-c1864a5eb19305c40519da12cc543519e48a0697ecd30e15d5ac228644957d12
time=2024-06-12T15:01:07.701Z level=DEBUG source=sched.go:258 msg="runner expired event received" modelPath=/root/.ollama/models/blobs/sha256-c1864a5eb19305c40519da12cc543519e48a0697ecd30e15d5ac228644957d12
time=2024-06-12T15:01:07.701Z level=DEBUG source=sched.go:274 msg="got lock to unload" modelPath=/root/.ollama/models/blobs/sha256-c1864a5eb19305c40519da12cc543519e48a0697ecd30e15d5ac228644957d12
time=2024-06-12T15:01:07.701Z level=DEBUG source=gpu.go:132 msg="Detecting GPUs"
time=2024-06-12T15:01:07.701Z level=DEBUG source=gpu.go:274 msg="Searching for GPU library" name=libcuda.so*
[GIN] 2024/06/12 - 15:01:07 | 500 | 3.474920334s | 127.0.0.1 | POST "/api/chat"
time=2024-06-12T15:01:07.701Z level=DEBUG source=gpu.go:293 msg="gpu library search" globs="[/opt/rocm/lib/libcuda.so** /usr/local/lib/libcuda.so** /opt/rh/devtoolset-7/root/libcuda.so** /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so*
/usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]" time=2024-06-12T15:01:07.703Z level=DEBUG source=gpu.go:326 msg="discovered GPU libraries" paths=[] time=2024-06-12T15:01:07.703Z level=DEBUG source=gpu.go:274 msg="Searching for GPU library" name=libcudart.so* time=2024-06-12T15:01:07.703Z level=DEBUG source=gpu.go:293 msg="gpu library search" globs="[/opt/rocm/lib/libcudart.so** /usr/local/lib/libcudart.so** /opt/rh/devtoolset-7/root/libcudart.so** /tmp/ollama689615480/runners/cuda*/libcudart.so* /usr/local/cuda/lib64/libcudart.so* /usr/lib/x86_64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/x86_64-linux-gnu/libcudart.so* /usr/lib/wsl/lib/libcudart.so* /usr/lib/wsl/drivers/*/libcudart.so* /opt/cuda/lib64/libcudart.so* /usr/local/cuda*/targets/aarch64-linux/lib/libcudart.so* /usr/lib/aarch64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/aarch64-linux-gnu/libcudart.so* /usr/local/cuda/lib*/libcudart.so* /usr/lib*/libcudart.so* /usr/local/lib*/libcudart.so*]" time=2024-06-12T15:01:07.705Z level=DEBUG source=gpu.go:326 msg="discovered GPU libraries" paths=[/tmp/ollama689615480/runners/cuda_v11/libcudart.so.11.0] cudaSetDevice err: 35 time=2024-06-12T15:01:07.705Z level=DEBUG source=gpu.go:338 msg="Unable to load cudart" library=/tmp/ollama689615480/runners/cuda_v11/libcudart.so.11.0 error="your nvidia driver is too old or missing. 
If you have a CUDA GPU please upgrade to run ollama" time=2024-06-12T15:01:07.705Z level=DEBUG source=cpu_common.go:11 msg="CPU has AVX2" time=2024-06-12T15:01:07.705Z level=WARN source=amd_linux.go:48 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory" time=2024-06-12T15:01:07.705Z level=DEBUG source=amd_linux.go:77 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/0/properties" time=2024-06-12T15:01:07.705Z level=DEBUG source=amd_linux.go:102 msg="detected CPU /sys/class/kfd/kfd/topology/nodes/0/properties" time=2024-06-12T15:01:07.705Z level=DEBUG source=amd_linux.go:77 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/1/properties" time=2024-06-12T15:01:07.706Z level=DEBUG source=amd_linux.go:242 msg="amdgpu memory" gpu=0 total="4.0 GiB" time=2024-06-12T15:01:07.706Z level=DEBUG source=amd_linux.go:243 msg="amdgpu memory" gpu=0 available="4.0 GiB" time=2024-06-12T15:01:07.706Z level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /opt/rocm/lib" time=2024-06-12T15:01:07.706Z level=INFO source=amd_linux.go:304 msg="skipping rocm gfx compatibility check" HSA_OVERRIDE_GFX_VERSION=gfx1030 time=2024-06-12T15:01:07.706Z level=DEBUG source=server.go:990 msg="stopping llama server" time=2024-06-12T15:01:07.706Z level=DEBUG source=sched.go:279 msg="runner released" modelPath=/root/.ollama/models/blobs/sha256-c1864a5eb19305c40519da12cc543519e48a0697ecd30e15d5ac228644957d12 time=2024-06-12T15:01:07.956Z level=DEBUG source=gpu.go:132 msg="Detecting GPUs" time=2024-06-12T15:01:07.956Z level=DEBUG source=gpu.go:274 msg="Searching for GPU library" name=libcuda.so* time=2024-06-12T15:01:07.956Z level=DEBUG source=gpu.go:293 msg="gpu library search" globs="[/opt/rocm/lib/libcuda.so** /usr/local/lib/libcuda.so** /opt/rh/devtoolset-7/root/libcuda.so** 
/usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]" time=2024-06-12T15:01:07.957Z level=DEBUG source=gpu.go:326 msg="discovered GPU libraries" paths=[] time=2024-06-12T15:01:07.958Z level=DEBUG source=gpu.go:274 msg="Searching for GPU library" name=libcudart.so* time=2024-06-12T15:01:07.958Z level=DEBUG source=gpu.go:293 msg="gpu library search" globs="[/opt/rocm/lib/libcudart.so** /usr/local/lib/libcudart.so** /opt/rh/devtoolset-7/root/libcudart.so** /tmp/ollama689615480/runners/cuda*/libcudart.so* /usr/local/cuda/lib64/libcudart.so* /usr/lib/x86_64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/x86_64-linux-gnu/libcudart.so* /usr/lib/wsl/lib/libcudart.so* /usr/lib/wsl/drivers/*/libcudart.so* /opt/cuda/lib64/libcudart.so* /usr/local/cuda*/targets/aarch64-linux/lib/libcudart.so* /usr/lib/aarch64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/aarch64-linux-gnu/libcudart.so* /usr/local/cuda/lib*/libcudart.so* /usr/lib*/libcudart.so* /usr/local/lib*/libcudart.so*]" time=2024-06-12T15:01:07.959Z level=DEBUG source=gpu.go:326 msg="discovered GPU libraries" paths=[/tmp/ollama689615480/runners/cuda_v11/libcudart.so.11.0] cudaSetDevice err: 35 time=2024-06-12T15:01:07.959Z level=DEBUG source=gpu.go:338 msg="Unable to load cudart" library=/tmp/ollama689615480/runners/cuda_v11/libcudart.so.11.0 error="your nvidia driver is too old or missing. 
If you have a CUDA GPU please upgrade to run ollama" time=2024-06-12T15:01:07.959Z level=DEBUG source=cpu_common.go:11 msg="CPU has AVX2" time=2024-06-12T15:01:07.959Z level=WARN source=amd_linux.go:48 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory" time=2024-06-12T15:01:07.959Z level=DEBUG source=amd_linux.go:77 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/0/properties" time=2024-06-12T15:01:07.959Z level=DEBUG source=amd_linux.go:102 msg="detected CPU /sys/class/kfd/kfd/topology/nodes/0/properties" time=2024-06-12T15:01:07.959Z level=DEBUG source=amd_linux.go:77 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/1/properties" time=2024-06-12T15:01:07.960Z level=DEBUG source=amd_linux.go:242 msg="amdgpu memory" gpu=0 total="4.0 GiB" time=2024-06-12T15:01:07.960Z level=DEBUG source=amd_linux.go:243 msg="amdgpu memory" gpu=0 available="4.0 GiB" time=2024-06-12T15:01:07.960Z level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /opt/rocm/lib" time=2024-06-12T15:01:07.960Z level=INFO source=amd_linux.go:304 msg="skipping rocm gfx compatibility check" HSA_OVERRIDE_GFX_VERSION=gfx1030 time=2024-06-12T15:01:08.207Z level=DEBUG source=gpu.go:132 msg="Detecting GPUs" time=2024-06-12T15:01:08.207Z level=DEBUG source=gpu.go:274 msg="Searching for GPU library" name=libcuda.so* time=2024-06-12T15:01:08.207Z level=DEBUG source=gpu.go:293 msg="gpu library search" globs="[/opt/rocm/lib/libcuda.so** /usr/local/lib/libcuda.so** /opt/rh/devtoolset-7/root/libcuda.so** /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]" 
time=2024-06-12T15:01:08.209Z level=DEBUG source=gpu.go:326 msg="discovered GPU libraries" paths=[] time=2024-06-12T15:01:08.209Z level=DEBUG source=gpu.go:274 msg="Searching for GPU library" name=libcudart.so* time=2024-06-12T15:01:08.209Z level=DEBUG source=gpu.go:293 msg="gpu library search" globs="[/opt/rocm/lib/libcudart.so** /usr/local/lib/libcudart.so** /opt/rh/devtoolset-7/root/libcudart.so** /tmp/ollama689615480/runners/cuda*/libcudart.so* /usr/local/cuda/lib64/libcudart.so* /usr/lib/x86_64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/x86_64-linux-gnu/libcudart.so* /usr/lib/wsl/lib/libcudart.so* /usr/lib/wsl/drivers/*/libcudart.so* /opt/cuda/lib64/libcudart.so* /usr/local/cuda*/targets/aarch64-linux/lib/libcudart.so* /usr/lib/aarch64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/aarch64-linux-gnu/libcudart.so* /usr/local/cuda/lib*/libcudart.so* /usr/lib*/libcudart.so* /usr/local/lib*/libcudart.so*]" time=2024-06-12T15:01:08.210Z level=DEBUG source=gpu.go:326 msg="discovered GPU libraries" paths=[/tmp/ollama689615480/runners/cuda_v11/libcudart.so.11.0] cudaSetDevice err: 35 time=2024-06-12T15:01:08.211Z level=DEBUG source=gpu.go:338 msg="Unable to load cudart" library=/tmp/ollama689615480/runners/cuda_v11/libcudart.so.11.0 error="your nvidia driver is too old or missing. 
If you have a CUDA GPU please upgrade to run ollama" time=2024-06-12T15:01:08.211Z level=DEBUG source=cpu_common.go:11 msg="CPU has AVX2" time=2024-06-12T15:01:08.211Z level=WARN source=amd_linux.go:48 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory" time=2024-06-12T15:01:08.211Z level=DEBUG source=amd_linux.go:77 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/0/properties" time=2024-06-12T15:01:08.211Z level=DEBUG source=amd_linux.go:102 msg="detected CPU /sys/class/kfd/kfd/topology/nodes/0/properties" time=2024-06-12T15:01:08.211Z level=DEBUG source=amd_linux.go:77 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/1/properties" time=2024-06-12T15:01:08.211Z level=DEBUG source=amd_linux.go:242 msg="amdgpu memory" gpu=0 total="4.0 GiB" time=2024-06-12T15:01:08.211Z level=DEBUG source=amd_linux.go:243 msg="amdgpu memory" gpu=0 available="4.0 GiB" time=2024-06-12T15:01:08.211Z level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /opt/rocm/lib" time=2024-06-12T15:01:08.211Z level=INFO source=amd_linux.go:304 msg="skipping rocm gfx compatibility check" HSA_OVERRIDE_GFX_VERSION=gfx1030 time=2024-06-12T15:01:08.457Z level=DEBUG source=gpu.go:132 msg="Detecting GPUs" time=2024-06-12T15:01:08.457Z level=DEBUG source=gpu.go:274 msg="Searching for GPU library" name=libcuda.so* time=2024-06-12T15:01:08.457Z level=DEBUG source=gpu.go:293 msg="gpu library search" globs="[/opt/rocm/lib/libcuda.so** /usr/local/lib/libcuda.so** /opt/rh/devtoolset-7/root/libcuda.so** /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]" 
time=2024-06-12T15:01:08.458Z level=DEBUG source=gpu.go:326 msg="discovered GPU libraries" paths=[] time=2024-06-12T15:01:08.458Z level=DEBUG source=gpu.go:274 msg="Searching for GPU library" name=libcudart.so* time=2024-06-12T15:01:08.458Z level=DEBUG source=gpu.go:293 msg="gpu library search" globs="[/opt/rocm/lib/libcudart.so** /usr/local/lib/libcudart.so** /opt/rh/devtoolset-7/root/libcudart.so** /tmp/ollama689615480/runners/cuda*/libcudart.so* /usr/local/cuda/lib64/libcudart.so* /usr/lib/x86_64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/x86_64-linux-gnu/libcudart.so* /usr/lib/wsl/lib/libcudart.so* /usr/lib/wsl/drivers/*/libcudart.so* /opt/cuda/lib64/libcudart.so* /usr/local/cuda*/targets/aarch64-linux/lib/libcudart.so* /usr/lib/aarch64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/aarch64-linux-gnu/libcudart.so* /usr/local/cuda/lib*/libcudart.so* /usr/lib*/libcudart.so* /usr/local/lib*/libcudart.so*]" time=2024-06-12T15:01:08.460Z level=DEBUG source=gpu.go:326 msg="discovered GPU libraries" paths=[/tmp/ollama689615480/runners/cuda_v11/libcudart.so.11.0] cudaSetDevice err: 35 time=2024-06-12T15:01:08.460Z level=DEBUG source=gpu.go:338 msg="Unable to load cudart" library=/tmp/ollama689615480/runners/cuda_v11/libcudart.so.11.0 error="your nvidia driver is too old or missing. 
If you have a CUDA GPU please upgrade to run ollama" time=2024-06-12T15:01:08.460Z level=DEBUG source=cpu_common.go:11 msg="CPU has AVX2" time=2024-06-12T15:01:08.460Z level=WARN source=amd_linux.go:48 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory" time=2024-06-12T15:01:08.460Z level=DEBUG source=amd_linux.go:77 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/0/properties" time=2024-06-12T15:01:08.461Z level=DEBUG source=amd_linux.go:102 msg="detected CPU /sys/class/kfd/kfd/topology/nodes/0/properties" time=2024-06-12T15:01:08.461Z level=DEBUG source=amd_linux.go:77 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/1/properties" time=2024-06-12T15:01:08.461Z level=DEBUG source=amd_linux.go:242 msg="amdgpu memory" gpu=0 total="4.0 GiB" time=2024-06-12T15:01:08.461Z level=DEBUG source=amd_linux.go:243 msg="amdgpu memory" gpu=0 available="4.0 GiB" time=2024-06-12T15:01:08.461Z level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /opt/rocm/lib" time=2024-06-12T15:01:08.461Z level=INFO source=amd_linux.go:304 msg="skipping rocm gfx compatibility check" HSA_OVERRIDE_GFX_VERSION=gfx1030 time=2024-06-12T15:01:08.706Z level=DEBUG source=gpu.go:132 msg="Detecting GPUs" time=2024-06-12T15:01:08.706Z level=DEBUG source=gpu.go:274 msg="Searching for GPU library" name=libcuda.so* time=2024-06-12T15:01:08.706Z level=DEBUG source=gpu.go:293 msg="gpu library search" globs="[/opt/rocm/lib/libcuda.so** /usr/local/lib/libcuda.so** /opt/rh/devtoolset-7/root/libcuda.so** /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]" 
time=2024-06-12T15:01:08.708Z level=DEBUG source=gpu.go:326 msg="discovered GPU libraries" paths=[] time=2024-06-12T15:01:08.708Z level=DEBUG source=gpu.go:274 msg="Searching for GPU library" name=libcudart.so* time=2024-06-12T15:01:08.708Z level=DEBUG source=gpu.go:293 msg="gpu library search" globs="[/opt/rocm/lib/libcudart.so** /usr/local/lib/libcudart.so** /opt/rh/devtoolset-7/root/libcudart.so** /tmp/ollama689615480/runners/cuda*/libcudart.so* /usr/local/cuda/lib64/libcudart.so* /usr/lib/x86_64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/x86_64-linux-gnu/libcudart.so* /usr/lib/wsl/lib/libcudart.so* /usr/lib/wsl/drivers/*/libcudart.so* /opt/cuda/lib64/libcudart.so* /usr/local/cuda*/targets/aarch64-linux/lib/libcudart.so* /usr/lib/aarch64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/aarch64-linux-gnu/libcudart.so* /usr/local/cuda/lib*/libcudart.so* /usr/lib*/libcudart.so* /usr/local/lib*/libcudart.so*]" time=2024-06-12T15:01:08.710Z level=DEBUG source=gpu.go:326 msg="discovered GPU libraries" paths=[/tmp/ollama689615480/runners/cuda_v11/libcudart.so.11.0] cudaSetDevice err: 35 time=2024-06-12T15:01:08.710Z level=DEBUG source=gpu.go:338 msg="Unable to load cudart" library=/tmp/ollama689615480/runners/cuda_v11/libcudart.so.11.0 error="your nvidia driver is too old or missing. 
If you have a CUDA GPU please upgrade to run ollama" time=2024-06-12T15:01:08.710Z level=DEBUG source=cpu_common.go:11 msg="CPU has AVX2" time=2024-06-12T15:01:08.710Z level=WARN source=amd_linux.go:48 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory" time=2024-06-12T15:01:08.710Z level=DEBUG source=amd_linux.go:77 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/0/properties" time=2024-06-12T15:01:08.710Z level=DEBUG source=amd_linux.go:102 msg="detected CPU /sys/class/kfd/kfd/topology/nodes/0/properties" time=2024-06-12T15:01:08.710Z level=DEBUG source=amd_linux.go:77 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/1/properties" time=2024-06-12T15:01:08.711Z level=DEBUG source=amd_linux.go:242 msg="amdgpu memory" gpu=0 total="4.0 GiB" time=2024-06-12T15:01:08.711Z level=DEBUG source=amd_linux.go:243 msg="amdgpu memory" gpu=0 available="4.0 GiB" time=2024-06-12T15:01:08.711Z level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /opt/rocm/lib" time=2024-06-12T15:01:08.711Z level=INFO source=amd_linux.go:304 msg="skipping rocm gfx compatibility check" HSA_OVERRIDE_GFX_VERSION=gfx1030 time=2024-06-12T15:01:08.957Z level=DEBUG source=gpu.go:132 msg="Detecting GPUs" time=2024-06-12T15:01:08.957Z level=DEBUG source=gpu.go:274 msg="Searching for GPU library" name=libcuda.so* time=2024-06-12T15:01:08.957Z level=DEBUG source=gpu.go:293 msg="gpu library search" globs="[/opt/rocm/lib/libcuda.so** /usr/local/lib/libcuda.so** /opt/rh/devtoolset-7/root/libcuda.so** /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]" 
time=2024-06-12T15:01:08.959Z level=DEBUG source=gpu.go:326 msg="discovered GPU libraries" paths=[] time=2024-06-12T15:01:08.959Z level=DEBUG source=gpu.go:274 msg="Searching for GPU library" name=libcudart.so* time=2024-06-12T15:01:08.959Z level=DEBUG source=gpu.go:293 msg="gpu library search" globs="[/opt/rocm/lib/libcudart.so** /usr/local/lib/libcudart.so** /opt/rh/devtoolset-7/root/libcudart.so** /tmp/ollama689615480/runners/cuda*/libcudart.so* /usr/local/cuda/lib64/libcudart.so* /usr/lib/x86_64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/x86_64-linux-gnu/libcudart.so* /usr/lib/wsl/lib/libcudart.so* /usr/lib/wsl/drivers/*/libcudart.so* /opt/cuda/lib64/libcudart.so* /usr/local/cuda*/targets/aarch64-linux/lib/libcudart.so* /usr/lib/aarch64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/aarch64-linux-gnu/libcudart.so* /usr/local/cuda/lib*/libcudart.so* /usr/lib*/libcudart.so* /usr/local/lib*/libcudart.so*]" time=2024-06-12T15:01:08.961Z level=DEBUG source=gpu.go:326 msg="discovered GPU libraries" paths=[/tmp/ollama689615480/runners/cuda_v11/libcudart.so.11.0] cudaSetDevice err: 35 time=2024-06-12T15:01:08.961Z level=DEBUG source=gpu.go:338 msg="Unable to load cudart" library=/tmp/ollama689615480/runners/cuda_v11/libcudart.so.11.0 error="your nvidia driver is too old or missing. 
If you have a CUDA GPU please upgrade to run ollama" time=2024-06-12T15:01:08.961Z level=DEBUG source=cpu_common.go:11 msg="CPU has AVX2" time=2024-06-12T15:01:08.961Z level=WARN source=amd_linux.go:48 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory" time=2024-06-12T15:01:08.961Z level=DEBUG source=amd_linux.go:77 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/0/properties" time=2024-06-12T15:01:08.961Z level=DEBUG source=amd_linux.go:102 msg="detected CPU /sys/class/kfd/kfd/topology/nodes/0/properties" time=2024-06-12T15:01:08.961Z level=DEBUG source=amd_linux.go:77 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/1/properties" time=2024-06-12T15:01:08.962Z level=DEBUG source=amd_linux.go:242 msg="amdgpu memory" gpu=0 total="4.0 GiB" time=2024-06-12T15:01:08.962Z level=DEBUG source=amd_linux.go:243 msg="amdgpu memory" gpu=0 available="4.0 GiB" time=2024-06-12T15:01:08.962Z level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /opt/rocm/lib" time=2024-06-12T15:01:08.962Z level=INFO source=amd_linux.go:304 msg="skipping rocm gfx compatibility check" HSA_OVERRIDE_GFX_VERSION=gfx1030 time=2024-06-12T15:01:09.207Z level=DEBUG source=gpu.go:132 msg="Detecting GPUs" time=2024-06-12T15:01:09.207Z level=DEBUG source=gpu.go:274 msg="Searching for GPU library" name=libcuda.so* time=2024-06-12T15:01:09.207Z level=DEBUG source=gpu.go:293 msg="gpu library search" globs="[/opt/rocm/lib/libcuda.so** /usr/local/lib/libcuda.so** /opt/rh/devtoolset-7/root/libcuda.so** /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]" 
time=2024-06-12T15:01:09.209Z level=DEBUG source=gpu.go:326 msg="discovered GPU libraries" paths=[] time=2024-06-12T15:01:09.210Z level=DEBUG source=gpu.go:274 msg="Searching for GPU library" name=libcudart.so* time=2024-06-12T15:01:09.210Z level=DEBUG source=gpu.go:293 msg="gpu library search" globs="[/opt/rocm/lib/libcudart.so** /usr/local/lib/libcudart.so** /opt/rh/devtoolset-7/root/libcudart.so** /tmp/ollama689615480/runners/cuda*/libcudart.so* /usr/local/cuda/lib64/libcudart.so* /usr/lib/x86_64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/x86_64-linux-gnu/libcudart.so* /usr/lib/wsl/lib/libcudart.so* /usr/lib/wsl/drivers/*/libcudart.so* /opt/cuda/lib64/libcudart.so* /usr/local/cuda*/targets/aarch64-linux/lib/libcudart.so* /usr/lib/aarch64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/aarch64-linux-gnu/libcudart.so* /usr/local/cuda/lib*/libcudart.so* /usr/lib*/libcudart.so* /usr/local/lib*/libcudart.so*]" time=2024-06-12T15:01:09.211Z level=DEBUG source=gpu.go:326 msg="discovered GPU libraries" paths=[/tmp/ollama689615480/runners/cuda_v11/libcudart.so.11.0] cudaSetDevice err: 35 time=2024-06-12T15:01:09.211Z level=DEBUG source=gpu.go:338 msg="Unable to load cudart" library=/tmp/ollama689615480/runners/cuda_v11/libcudart.so.11.0 error="your nvidia driver is too old or missing. 
If you have a CUDA GPU please upgrade to run ollama" time=2024-06-12T15:01:09.211Z level=DEBUG source=cpu_common.go:11 msg="CPU has AVX2" time=2024-06-12T15:01:09.211Z level=WARN source=amd_linux.go:48 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory" time=2024-06-12T15:01:09.212Z level=DEBUG source=amd_linux.go:77 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/0/properties" time=2024-06-12T15:01:09.212Z level=DEBUG source=amd_linux.go:102 msg="detected CPU /sys/class/kfd/kfd/topology/nodes/0/properties" time=2024-06-12T15:01:09.212Z level=DEBUG source=amd_linux.go:77 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/1/properties" time=2024-06-12T15:01:09.212Z level=DEBUG source=amd_linux.go:242 msg="amdgpu memory" gpu=0 total="4.0 GiB" time=2024-06-12T15:01:09.212Z level=DEBUG source=amd_linux.go:243 msg="amdgpu memory" gpu=0 available="4.0 GiB" time=2024-06-12T15:01:09.212Z level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /opt/rocm/lib" time=2024-06-12T15:01:09.212Z level=INFO source=amd_linux.go:304 msg="skipping rocm gfx compatibility check" HSA_OVERRIDE_GFX_VERSION=gfx1030 time=2024-06-12T15:01:09.456Z level=DEBUG source=gpu.go:132 msg="Detecting GPUs" time=2024-06-12T15:01:09.456Z level=DEBUG source=gpu.go:274 msg="Searching for GPU library" name=libcuda.so* time=2024-06-12T15:01:09.456Z level=DEBUG source=gpu.go:293 msg="gpu library search" globs="[/opt/rocm/lib/libcuda.so** /usr/local/lib/libcuda.so** /opt/rh/devtoolset-7/root/libcuda.so** /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]" 
time=2024-06-12T15:01:09.458Z level=DEBUG source=gpu.go:326 msg="discovered GPU libraries" paths=[] time=2024-06-12T15:01:09.458Z level=DEBUG source=gpu.go:274 msg="Searching for GPU library" name=libcudart.so* time=2024-06-12T15:01:09.458Z level=DEBUG source=gpu.go:293 msg="gpu library search" globs="[/opt/rocm/lib/libcudart.so** /usr/local/lib/libcudart.so** /opt/rh/devtoolset-7/root/libcudart.so** /tmp/ollama689615480/runners/cuda*/libcudart.so* /usr/local/cuda/lib64/libcudart.so* /usr/lib/x86_64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/x86_64-linux-gnu/libcudart.so* /usr/lib/wsl/lib/libcudart.so* /usr/lib/wsl/drivers/*/libcudart.so* /opt/cuda/lib64/libcudart.so* /usr/local/cuda*/targets/aarch64-linux/lib/libcudart.so* /usr/lib/aarch64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/aarch64-linux-gnu/libcudart.so* /usr/local/cuda/lib*/libcudart.so* /usr/lib*/libcudart.so* /usr/local/lib*/libcudart.so*]" time=2024-06-12T15:01:09.460Z level=DEBUG source=gpu.go:326 msg="discovered GPU libraries" paths=[/tmp/ollama689615480/runners/cuda_v11/libcudart.so.11.0] cudaSetDevice err: 35 time=2024-06-12T15:01:09.461Z level=DEBUG source=gpu.go:338 msg="Unable to load cudart" library=/tmp/ollama689615480/runners/cuda_v11/libcudart.so.11.0 error="your nvidia driver is too old or missing. 
If you have a CUDA GPU please upgrade to run ollama" time=2024-06-12T15:01:09.461Z level=DEBUG source=cpu_common.go:11 msg="CPU has AVX2" time=2024-06-12T15:01:09.461Z level=WARN source=amd_linux.go:48 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory" time=2024-06-12T15:01:09.461Z level=DEBUG source=amd_linux.go:77 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/0/properties" time=2024-06-12T15:01:09.461Z level=DEBUG source=amd_linux.go:102 msg="detected CPU /sys/class/kfd/kfd/topology/nodes/0/properties" time=2024-06-12T15:01:09.461Z level=DEBUG source=amd_linux.go:77 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/1/properties" time=2024-06-12T15:01:09.461Z level=DEBUG source=amd_linux.go:242 msg="amdgpu memory" gpu=0 total="4.0 GiB" time=2024-06-12T15:01:09.461Z level=DEBUG source=amd_linux.go:243 msg="amdgpu memory" gpu=0 available="4.0 GiB" time=2024-06-12T15:01:09.461Z level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /opt/rocm/lib" time=2024-06-12T15:01:09.461Z level=INFO source=amd_linux.go:304 msg="skipping rocm gfx compatibility check" HSA_OVERRIDE_GFX_VERSION=gfx1030 time=2024-06-12T15:01:09.707Z level=DEBUG source=gpu.go:132 msg="Detecting GPUs" time=2024-06-12T15:01:09.707Z level=DEBUG source=gpu.go:274 msg="Searching for GPU library" name=libcuda.so* time=2024-06-12T15:01:09.707Z level=DEBUG source=gpu.go:293 msg="gpu library search" globs="[/opt/rocm/lib/libcuda.so** /usr/local/lib/libcuda.so** /opt/rh/devtoolset-7/root/libcuda.so** /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]" 
time=2024-06-12T15:01:09.709Z level=DEBUG source=gpu.go:326 msg="discovered GPU libraries" paths=[] time=2024-06-12T15:01:09.709Z level=DEBUG source=gpu.go:274 msg="Searching for GPU library" name=libcudart.so* time=2024-06-12T15:01:09.709Z level=DEBUG source=gpu.go:293 msg="gpu library search" globs="[/opt/rocm/lib/libcudart.so** /usr/local/lib/libcudart.so** /opt/rh/devtoolset-7/root/libcudart.so** /tmp/ollama689615480/runners/cuda*/libcudart.so* /usr/local/cuda/lib64/libcudart.so* /usr/lib/x86_64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/x86_64-linux-gnu/libcudart.so* /usr/lib/wsl/lib/libcudart.so* /usr/lib/wsl/drivers/*/libcudart.so* /opt/cuda/lib64/libcudart.so* /usr/local/cuda*/targets/aarch64-linux/lib/libcudart.so* /usr/lib/aarch64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/aarch64-linux-gnu/libcudart.so* /usr/local/cuda/lib*/libcudart.so* /usr/lib*/libcudart.so* /usr/local/lib*/libcudart.so*]" time=2024-06-12T15:01:09.710Z level=DEBUG source=gpu.go:326 msg="discovered GPU libraries" paths=[/tmp/ollama689615480/runners/cuda_v11/libcudart.so.11.0] cudaSetDevice err: 35 time=2024-06-12T15:01:09.711Z level=DEBUG source=gpu.go:338 msg="Unable to load cudart" library=/tmp/ollama689615480/runners/cuda_v11/libcudart.so.11.0 error="your nvidia driver is too old or missing. 
If you have a CUDA GPU please upgrade to run ollama" time=2024-06-12T15:01:09.711Z level=DEBUG source=cpu_common.go:11 msg="CPU has AVX2" time=2024-06-12T15:01:09.711Z level=WARN source=amd_linux.go:48 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory" time=2024-06-12T15:01:09.711Z level=DEBUG source=amd_linux.go:77 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/0/properties" time=2024-06-12T15:01:09.711Z level=DEBUG source=amd_linux.go:102 msg="detected CPU /sys/class/kfd/kfd/topology/nodes/0/properties" time=2024-06-12T15:01:09.711Z level=DEBUG source=amd_linux.go:77 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/1/properties" time=2024-06-12T15:01:09.711Z level=DEBUG source=amd_linux.go:242 msg="amdgpu memory" gpu=0 total="4.0 GiB" time=2024-06-12T15:01:09.711Z level=DEBUG source=amd_linux.go:243 msg="amdgpu memory" gpu=0 available="4.0 GiB" time=2024-06-12T15:01:09.711Z level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /opt/rocm/lib" time=2024-06-12T15:01:09.712Z level=INFO source=amd_linux.go:304 msg="skipping rocm gfx compatibility check" HSA_OVERRIDE_GFX_VERSION=gfx1030 time=2024-06-12T15:01:09.957Z level=DEBUG source=gpu.go:132 msg="Detecting GPUs" time=2024-06-12T15:01:09.957Z level=DEBUG source=gpu.go:274 msg="Searching for GPU library" name=libcuda.so* time=2024-06-12T15:01:09.957Z level=DEBUG source=gpu.go:293 msg="gpu library search" globs="[/opt/rocm/lib/libcuda.so** /usr/local/lib/libcuda.so** /opt/rh/devtoolset-7/root/libcuda.so** /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]" 
time=2024-06-12T15:01:09.959Z level=DEBUG source=gpu.go:326 msg="discovered GPU libraries" paths=[] time=2024-06-12T15:01:09.959Z level=DEBUG source=gpu.go:274 msg="Searching for GPU library" name=libcudart.so* time=2024-06-12T15:01:09.959Z level=DEBUG source=gpu.go:293 msg="gpu library search" globs="[/opt/rocm/lib/libcudart.so** /usr/local/lib/libcudart.so** /opt/rh/devtoolset-7/root/libcudart.so** /tmp/ollama689615480/runners/cuda*/libcudart.so* /usr/local/cuda/lib64/libcudart.so* /usr/lib/x86_64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/x86_64-linux-gnu/libcudart.so* /usr/lib/wsl/lib/libcudart.so* /usr/lib/wsl/drivers/*/libcudart.so* /opt/cuda/lib64/libcudart.so* /usr/local/cuda*/targets/aarch64-linux/lib/libcudart.so* /usr/lib/aarch64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/aarch64-linux-gnu/libcudart.so* /usr/local/cuda/lib*/libcudart.so* /usr/lib*/libcudart.so* /usr/local/lib*/libcudart.so*]" time=2024-06-12T15:01:09.960Z level=DEBUG source=gpu.go:326 msg="discovered GPU libraries" paths=[/tmp/ollama689615480/runners/cuda_v11/libcudart.so.11.0] cudaSetDevice err: 35 time=2024-06-12T15:01:09.961Z level=DEBUG source=gpu.go:338 msg="Unable to load cudart" library=/tmp/ollama689615480/runners/cuda_v11/libcudart.so.11.0 error="your nvidia driver is too old or missing. 
If you have a CUDA GPU please upgrade to run ollama" time=2024-06-12T15:01:09.961Z level=DEBUG source=cpu_common.go:11 msg="CPU has AVX2" time=2024-06-12T15:01:09.961Z level=WARN source=amd_linux.go:48 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory" time=2024-06-12T15:01:09.961Z level=DEBUG source=amd_linux.go:77 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/0/properties" time=2024-06-12T15:01:09.961Z level=DEBUG source=amd_linux.go:102 msg="detected CPU /sys/class/kfd/kfd/topology/nodes/0/properties" time=2024-06-12T15:01:09.961Z level=DEBUG source=amd_linux.go:77 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/1/properties" time=2024-06-12T15:01:09.961Z level=DEBUG source=amd_linux.go:242 msg="amdgpu memory" gpu=0 total="4.0 GiB" time=2024-06-12T15:01:09.961Z level=DEBUG source=amd_linux.go:243 msg="amdgpu memory" gpu=0 available="4.0 GiB" time=2024-06-12T15:01:09.962Z level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /opt/rocm/lib" time=2024-06-12T15:01:09.962Z level=INFO source=amd_linux.go:304 msg="skipping rocm gfx compatibility check" HSA_OVERRIDE_GFX_VERSION=gfx1030 time=2024-06-12T15:01:10.207Z level=DEBUG source=gpu.go:132 msg="Detecting GPUs" time=2024-06-12T15:01:10.207Z level=DEBUG source=gpu.go:274 msg="Searching for GPU library" name=libcuda.so* time=2024-06-12T15:01:10.207Z level=DEBUG source=gpu.go:293 msg="gpu library search" globs="[/opt/rocm/lib/libcuda.so** /usr/local/lib/libcuda.so** /opt/rh/devtoolset-7/root/libcuda.so** /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]" 
time=2024-06-12T15:01:10.209Z level=DEBUG source=gpu.go:326 msg="discovered GPU libraries" paths=[] time=2024-06-12T15:01:10.209Z level=DEBUG source=gpu.go:274 msg="Searching for GPU library" name=libcudart.so* time=2024-06-12T15:01:10.209Z level=DEBUG source=gpu.go:293 msg="gpu library search" globs="[/opt/rocm/lib/libcudart.so** /usr/local/lib/libcudart.so** /opt/rh/devtoolset-7/root/libcudart.so** /tmp/ollama689615480/runners/cuda*/libcudart.so* /usr/local/cuda/lib64/libcudart.so* /usr/lib/x86_64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/x86_64-linux-gnu/libcudart.so* /usr/lib/wsl/lib/libcudart.so* /usr/lib/wsl/drivers/*/libcudart.so* /opt/cuda/lib64/libcudart.so* /usr/local/cuda*/targets/aarch64-linux/lib/libcudart.so* /usr/lib/aarch64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/aarch64-linux-gnu/libcudart.so* /usr/local/cuda/lib*/libcudart.so* /usr/lib*/libcudart.so* /usr/local/lib*/libcudart.so*]" time=2024-06-12T15:01:10.210Z level=DEBUG source=gpu.go:326 msg="discovered GPU libraries" paths=[/tmp/ollama689615480/runners/cuda_v11/libcudart.so.11.0] cudaSetDevice err: 35 time=2024-06-12T15:01:10.211Z level=DEBUG source=gpu.go:338 msg="Unable to load cudart" library=/tmp/ollama689615480/runners/cuda_v11/libcudart.so.11.0 error="your nvidia driver is too old or missing. 
If you have a CUDA GPU please upgrade to run ollama" time=2024-06-12T15:01:10.211Z level=DEBUG source=cpu_common.go:11 msg="CPU has AVX2" time=2024-06-12T15:01:10.211Z level=WARN source=amd_linux.go:48 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory" time=2024-06-12T15:01:10.211Z level=DEBUG source=amd_linux.go:77 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/0/properties" time=2024-06-12T15:01:10.211Z level=DEBUG source=amd_linux.go:102 msg="detected CPU /sys/class/kfd/kfd/topology/nodes/0/properties" time=2024-06-12T15:01:10.211Z level=DEBUG source=amd_linux.go:77 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/1/properties" time=2024-06-12T15:01:10.211Z level=DEBUG source=amd_linux.go:242 msg="amdgpu memory" gpu=0 total="4.0 GiB" time=2024-06-12T15:01:10.211Z level=DEBUG source=amd_linux.go:243 msg="amdgpu memory" gpu=0 available="4.0 GiB" time=2024-06-12T15:01:10.211Z level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /opt/rocm/lib" time=2024-06-12T15:01:10.211Z level=INFO source=amd_linux.go:304 msg="skipping rocm gfx compatibility check" HSA_OVERRIDE_GFX_VERSION=gfx1030 time=2024-06-12T15:01:10.457Z level=DEBUG source=gpu.go:132 msg="Detecting GPUs" time=2024-06-12T15:01:10.457Z level=DEBUG source=gpu.go:274 msg="Searching for GPU library" name=libcuda.so* time=2024-06-12T15:01:10.457Z level=DEBUG source=gpu.go:293 msg="gpu library search" globs="[/opt/rocm/lib/libcuda.so** /usr/local/lib/libcuda.so** /opt/rh/devtoolset-7/root/libcuda.so** /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]" 
...
time=2024-06-12T15:01:12.461Z level=DEBUG source=gpu.go:338 msg="Unable to load cudart" library=/tmp/ollama689615480/runners/cuda_v11/libcudart.so.11.0 error="your nvidia driver is too old or missing. 
If you have a CUDA GPU please upgrade to run ollama"
time=2024-06-12T15:01:12.462Z level=DEBUG source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-06-12T15:01:12.462Z level=WARN source=amd_linux.go:48 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
time=2024-06-12T15:01:12.462Z level=DEBUG source=amd_linux.go:77 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/0/properties"
time=2024-06-12T15:01:12.462Z level=DEBUG source=amd_linux.go:102 msg="detected CPU /sys/class/kfd/kfd/topology/nodes/0/properties"
time=2024-06-12T15:01:12.462Z level=DEBUG source=amd_linux.go:77 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/1/properties"
time=2024-06-12T15:01:12.462Z level=DEBUG source=amd_linux.go:242 msg="amdgpu memory" gpu=0 total="4.0 GiB"
time=2024-06-12T15:01:12.462Z level=DEBUG source=amd_linux.go:243 msg="amdgpu memory" gpu=0 available="4.0 GiB"
time=2024-06-12T15:01:12.462Z level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /opt/rocm/lib"
time=2024-06-12T15:01:12.463Z level=INFO source=amd_linux.go:304 msg="skipping rocm gfx compatibility check" HSA_OVERRIDE_GFX_VERSION=gfx1030
time=2024-06-12T15:01:12.707Z level=WARN source=sched.go:511 msg="gpu VRAM usage didn't recover within timeout" seconds=5.005717187
time=2024-06-12T15:01:12.707Z level=DEBUG source=gpu.go:132 msg="Detecting GPUs"
time=2024-06-12T15:01:12.707Z level=DEBUG source=sched.go:283 msg="sending an unloaded event" modelPath=/root/.ollama/models/blobs/sha256-c1864a5eb19305c40519da12cc543519e48a0697ecd30e15d5ac228644957d12
time=2024-06-12T15:01:12.707Z level=DEBUG source=sched.go:206 msg="ignoring unload event with no pending requests"
...
time=2024-06-12T15:01:12.957Z level=WARN source=sched.go:511 msg="gpu VRAM usage didn't recover within timeout" seconds=5.255826171
...
time=2024-06-12T15:01:13.207Z level=WARN source=sched.go:511 msg="gpu VRAM usage didn't recover within timeout" seconds=5.505716674
```

@sebastianlutter commented on GitHub (Jun 13, 2024):

I'll wait until SteamOS 3.6 is released; I'm pretty sure it will work then. My attempts to build the current ollama in a `rocm/dev-ubuntu-22.04:5.4.2` docker container failed.

Btw: I was able to run Llama3 8B on GPU using mlc.ai and vulkan:

- https://llm.mlc.ai/docs/install/mlc_llm
- https://github.com/mlc-ai/mlc-llm

I like the idea of using vulkan to access the GPU because this will also work with Intel GPUs. But I expect rocm to have better performance (and I really hate having to use anaconda to get it working).

@dhiltgen commented on GitHub (Jun 19, 2024):

Release 0.1.45 will have a ROCm version bump to v6.1.1 which might help compatibility between ROCm and the driver.


@Talleyrand-34 commented on GitHub (Jun 26, 2024):

Sorry if this is outside ollama's area, but when will this new SteamOS update make ollama work with ROCm?
[SteamOS 3.6.7](https://store.steampowered.com/news/app/1675200?emclan=103582791470414830&emgid=4194621169493696713) is going out now, so in which version of SteamOS can we expect ollama 0.1.45 to work?


@sebastianlutter commented on GitHub (Jul 3, 2024):

@Talleyrand-34 The current stable SteamOS is 3.5.x; 3.6.7 is currently in preview and 3.6.6 in beta. When 3.6.x reaches stable and ships to regular users, the current ollama build will most probably work. 3.5.x has a kernel and ROCm driver that are too old for the current ollama; the updated ROCm version in 3.6.x should do the trick. But this is guessing, not science ;)


@luckydonald commented on GitHub (Jul 18, 2024):

With podman installed by default (as claimed above), would it make sense to adapt the install script to handle installation on the Steam Deck via podman?
All it currently does is complain that /usr/local/bin/ollama is on a read-only file system.


@dhiltgen commented on GitHub (Jul 22, 2024):

@luckydonald there's already an official ollama ROCm container image, `ollama/ollama:rocm`. See https://hub.docker.com/r/ollama/ollama
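
For reference, a minimal (untested) command-line sketch of running that image on a Deck with GPU passthrough. The device nodes, volume, and port follow the image's documented usage; the `HSA_OVERRIDE_GFX_VERSION` value is the override from ollama's troubleshooting doc mentioned at the top of this issue, and whether it actually works on gfx1033 is exactly what this ticket is about:

```shell
# Hypothetical sketch: run the official ROCm image with the GPU device nodes exposed.
# HSA_OVERRIDE_GFX_VERSION=10.3.0 is the RDNA2 override suggested in ollama's
# troubleshooting doc; it has not been verified on a Steam Deck here.
docker run -d \
  -e HSA_OVERRIDE_GFX_VERSION=10.3.0 \
  --device /dev/kfd --device /dev/dri \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama \
  ollama/ollama:rocm
```

On SteamOS, `podman` (preinstalled) accepts the same flags in place of `docker`.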


@Mushoz commented on GitHub (Oct 25, 2024):

SteamOS 3.6 is now in stable with the newer kernel. Any updates on this ticket? Does it work properly now?


@sebastianlutter commented on GitHub (Oct 26, 2024):

https://www.gamingonlinux.com/2024/10/steam-deck-steamos-36-officially-out-with-improved-performance-mura-compensation-lots-more/

Hope I can find time to run a test in the next few days.


@sebastianlutter commented on GitHub (Oct 26, 2024):

First test:

* Using podman like this:

```
podman run --pull newer -d --replace -e "HSA_OVERRIDE_GFX_VERSION=gfx1030" -e OLLAMA_DEBUG=1 --device /dev/kfd --device /dev/dri -v $(pwd)/ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama:rocm
```

* Then loaded llama3.2:3b in the running container:

```
podman exec -it $(podman ps --filter "name=ollama" --format "{{.ID}}") ollama run llama3.2:3b
```

* Relevant logs:

```
time=2024-10-26T16:56:33.601Z level=INFO source=gpu.go:221 msg="looking for compatible GPUs"
[ . . . ]
time=2024-10-26T16:56:33.617Z level=DEBUG source=amd_linux.go:104 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/1/properties"
time=2024-10-26T16:56:33.617Z level=DEBUG source=amd_linux.go:215 msg="mapping amdgpu to drm sysfs nodes" amdgpu=/sys/class/kfd/kfd/topology/nodes/1/properties vendor=4098 device=5695 unique_id=0
time=2024-10-26T16:56:33.617Z level=DEBUG source=amd_linux.go:249 msg=matched amdgpu=/sys/class/kfd/kfd/topology/nodes/1/properties drm=/sys/class/drm/card0/device
time=2024-10-26T16:56:33.618Z level=DEBUG source=amd_linux.go:315 msg="amdgpu memory" gpu=0 total="1.0 GiB"
time=2024-10-26T16:56:33.618Z level=DEBUG source=amd_linux.go:316 msg="amdgpu memory" gpu=0 available="373.1 MiB"
time=2024-10-26T16:59:45.368Z level=DEBUG source=gpu.go:396 msg="updating system memory data" before.total="14.5 GiB" before.free="10.6 GiB" before.free_swap="8.2 GiB" now.total="14.5 GiB" now.free="10.5 GiB" now.free_swap="8.2 GiB"
time=2024-10-26T16:59:45.368Z level=DEBUG source=amd_linux.go:485 msg="updating rocm free memory" gpu=0 name=1002:163f before="373.1 MiB" now="376.1 MiB"
time=2024-10-26T16:59:45.368Z level=DEBUG source=sched.go:181 msg="updating default concurrency" OLLAMA_MAX_LOADED_MODELS=0x80fd00 gpu_count=1
time=2024-10-26T16:59:45.436Z level=DEBUG source=memory.go:103 msg=evaluating library=rocm gpu_count=1 available="[376.1 MiB]"
time=2024-10-26T16:59:45.368Z level=DEBUG source=sched.go:181 msg="updating default concurrency" OLLAMA_MAX_LOADED_MODELS=0x80fd00 gpu_count=1
time=2024-10-26T16:59:45.436Z level=DEBUG source=memory.go:103 msg=evaluating library=rocm gpu_count=1 available="[376.1 MiB]"
time=2024-10-26T16:59:45.437Z level=DEBUG source=memory.go:170 msg="gpu has too little memory to allocate any layers" id=0 library=rocm variant="" compute=gfx1033 driver=0.0 name=1002:163f total="1.0 GiB" available="376.1 MiB" minimum_memory=479199232 layer_size="93.0 MiB" gpu_zer_overhead="0 B" partial_offload="570.7 MiB" full_offload="424.0 MiB"
time=2024-10-26T16:59:45.437Z level=DEBUG source=memory.go:103 msg=evaluating library=rocm gpu_count=1 available="[376.1 MiB]"
time=2024-10-26T16:59:45.438Z level=DEBUG source=memory.go:170 msg="gpu has too little memory to allocate any layers" id=0 library=rocm variant="" compute=gfx1033 driver=0.0 name=1002:163f total="1.0 GiB" available="376.1 MiB" minimum_memory=479199232 layer_size="69.0 MiB" gpu_zer_overhead="0 B" partial_offload="570.7 MiB" full_offload="256.5 MiB"
time=2024-10-26T16:59:45.438Z level=DEBUG source=memory.go:103 msg=evaluating library=rocm gpu_count=1 available="[376.1 MiB]"
time=2024-10-26T16:59:45.438Z level=DEBUG source=memory.go:170 msg="gpu has too little memory to allocate any layers" id=0 library=rocm variant="" compute=gfx1033 driver=0.0 name=1002:163f total="1.0 GiB" available="376.1 MiB" minimum_memory=479199232 layer_size="93.0 MiB" gpu_zer_overhead="0 B" partial_offload="570.7 MiB" full_offload="424.0 MiB"
time=2024-10-26T16:59:45.438Z level=DEBUG source=memory.go:103 msg=evaluating library=rocm gpu_count=1 available="[376.1 MiB]"
time=2024-10-26T16:59:45.439Z level=DEBUG source=memory.go:170 msg="gpu has too little memory to allocate any layers" id=0 library=rocm variant="" compute=gfx1033 driver=0.0 name=1002:163f total="1.0 GiB" available="376.1 MiB" minimum_memory=479199232 layer_size="69.0 MiB" gpu_zer_overhead="0 B" partial_offload="570.7 MiB" full_offload="256.5 MiB"
time=2024-10-26T16:59:45.439Z level=DEBUG source=gpu.go:396 msg="updating system memory data" before.total="14.5 GiB" before.free="10.5 GiB" before.free_swap="8.2 GiB" now.total="14.5 GiB" now.free="10.5 GiB" now.free_swap="8.2 GiB"
time=2024-10-26T16:59:45.439Z level=DEBUG source=amd_linux.go:485 msg="updating rocm free memory" gpu=0 name=1002:163f before="376.1 MiB" now="376.1 MiB"
time=2024-10-26T16:59:45.439Z level=DEBUG source=memory.go:103 msg=evaluating library=rocm gpu_count=1 available="[376.1 MiB]"
time=2024-10-26T16:59:45.440Z level=DEBUG source=memory.go:170 msg="gpu has too little memory to allocate any layers" id=0 library=rocm variant="" compute=gfx1033 driver=0.0 name=1002:163f total="1.0 GiB" available="376.1 MiB" minimum_memory=479199232 layer_size="69.0 MiB" gpu_zer_overhead="0 B" partial_offload="570.7 MiB" full_offload="256.5 MiB"
time=2024-10-26T16:59:45.440Z level=INFO source=memory.go:326 msg="offload to rocm" layers.requested=-1 layers.model=29 layers.offload=0 layers.split="" memory.available="[376.1 MiB]" memory.gpu_overhead="0 B" memory.required.full="2.2 GiB" memory.required.partial="0 B" memory.required.kv="224.0 MiB" memory.required.allocations="[0 B]" memory.weights.total="1.8 GiB" memory.weights.repeating="1.5 GiB" memory.weights.nonrepeating="308.2 MiB" memory.graph.full="256.5 MiB" memory.graph.partial="570.7 MiB"
```

The GPU was found and ollama tried to use it, but there was too little memory available to put any layers into GPU memory. Promising so far . . .
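
The refusal to offload is plain arithmetic: the log's `minimum_memory=479199232` works out to 457 MiB, which exceeds the 376.1 MiB the iGPU reports free, so "gpu has too little memory to allocate any layers". A quick shell check of those numbers (values copied from the log above):

```shell
# Sanity-check ollama's layer-fit decision using the figures from the log.
available_mib=376                # available="376.1 MiB" (rounded down)
minimum_bytes=479199232          # minimum_memory=479199232
minimum_mib=$(( minimum_bytes / 1024 / 1024 ))
echo "minimum: ${minimum_mib} MiB, available: ${available_mib} MiB"
if [ "$available_mib" -lt "$minimum_mib" ]; then
  echo "no layers offloaded"     # matches layers.offload=0 in the log
fi
```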

Time to give the GPU more memory I guess: https://pimylifeup.com/steam-deck-bios/

To inspect GPU usage I found `nvtop` (installable via pacman) very handy.


@sebastianlutter commented on GitHub (Oct 26, 2024):

I increased the GPU memory to 4 GB, then tested again with llama3.2:1b.

* It decides to use the GPU but fails:

```
msg=evaluating library=rocm gpu_count=1 available="[3.4 GiB]"
time=2024-10-26T17:52:44.880Z level=INFO source=sched.go:714 msg="new model will fit in available VRAM in single GPU, loading" model=/root/.ollama/models/blobs/sha256-74701a8c35f6c8d9a4b91f3f3497643001d63e0c7a84e085bed452548fa88d45 gpu=0 parallel=4 available=3654434816 required="2.5 GiB"
[ . . . ]
/opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory

rocBLAS error: Could not initialize Tensile host: No devices found
```

* Full log:
[ . . . ]
time=2024-10-26T17:52:44.879Z level=DEBUG source=memory.go:103 msg=evaluating library=rocm gpu_count=1 available="[3.4 GiB]"
time=2024-10-26T17:52:44.880Z level=INFO source=sched.go:714 msg="new model will fit in available VRAM in single GPU, loading" model=/root/.ollama/models/blobs/sha256-74701a8c35f6c8d9a4b91f3f3497643001d63e0c7a84e085bed452548fa88d45 gpu=0 parallel=4 available=3654434816 required="2.5 GiB"
time=2024-10-26T17:52:44.881Z level=DEBUG source=gpu.go:396 msg="updating system memory data" before.total="11.5 GiB" before.free="8.0 GiB" before.free_swap="6.8 GiB" now.total="11.5 GiB" now.free="8.0 GiB" now.free_swap="6.8 GiB"
time=2024-10-26T17:52:44.881Z level=DEBUG source=amd_linux.go:485 msg="updating rocm free memory" gpu=0 name=1002:163f before="3.4 GiB" now="3.4 GiB"
time=2024-10-26T17:52:44.881Z level=INFO source=server.go:105 msg="system memory" total="11.5 GiB" free="8.0 GiB" free_swap="6.8 GiB"
time=2024-10-26T17:52:44.881Z level=DEBUG source=memory.go:103 msg=evaluating library=rocm gpu_count=1 available="[3.4 GiB]"
time=2024-10-26T17:52:44.881Z level=INFO source=memory.go:326 msg="offload to rocm" layers.requested=-1 layers.model=17 layers.offload=17 layers.split="" memory.available="[3.4 GiB]" memory.gpu_overhead="0 B" memory.required.full="2.5 GiB" memory.required.partial="2.5 GiB" memory.required.kv="256.0 MiB" memory.required.allocations="[2.5 GiB]" memory.weights.total="1.2 GiB" memory.weights.repeating="976.1 MiB" memory.weights.nonrepeating="266.2 MiB" memory.graph.full="544.0 MiB" memory.graph.partial="554.3 MiB"
time=2024-10-26T17:52:44.882Z level=DEBUG source=common.go:294 msg="availableServers : found" file=/usr/lib/ollama/runners/cpu/ollama_llama_server
time=2024-10-26T17:52:44.882Z level=DEBUG source=common.go:294 msg="availableServers : found" file=/usr/lib/ollama/runners/cpu_avx/ollama_llama_server
time=2024-10-26T17:52:44.882Z level=DEBUG source=common.go:294 msg="availableServers : found" file=/usr/lib/ollama/runners/cpu_avx2/ollama_llama_server
time=2024-10-26T17:52:44.882Z level=DEBUG source=common.go:294 msg="availableServers : found" file=/usr/lib/ollama/runners/rocm_v60102/ollama_llama_server
time=2024-10-26T17:52:44.882Z level=DEBUG source=common.go:294 msg="availableServers : found" file=/usr/lib/ollama/runners/cpu/ollama_llama_server
time=2024-10-26T17:52:44.882Z level=DEBUG source=common.go:294 msg="availableServers : found" file=/usr/lib/ollama/runners/cpu_avx/ollama_llama_server
time=2024-10-26T17:52:44.882Z level=DEBUG source=common.go:294 msg="availableServers : found" file=/usr/lib/ollama/runners/cpu_avx2/ollama_llama_server
time=2024-10-26T17:52:44.882Z level=DEBUG source=common.go:294 msg="availableServers : found" file=/usr/lib/ollama/runners/rocm_v60102/ollama_llama_server
time=2024-10-26T17:52:44.883Z level=INFO source=server.go:388 msg="starting llama server" cmd="/usr/lib/ollama/runners/rocm_v60102/ollama_llama_server --model /root/.ollama/models/blobs/sha256-74701a8c35f6c8d9a4b91f3f3497643001d63e0c7a84e085bed452548fa88d45 --ctx-size 8192 --batch-size 512 --embedding --n-gpu-layers 17 --verbose --threads 4 --parallel 4 --port 40561"
time=2024-10-26T17:52:44.883Z level=DEBUG source=server.go:405 msg=subprocess environment="[HSA_OVERRIDE_GFX_VERSION=gfx1030 PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin LD_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/runners/rocm_v60102 HIP_VISIBLE_DEVICES=0]"
time=2024-10-26T17:52:44.883Z level=INFO source=sched.go:449 msg="loaded runners" count=1
time=2024-10-26T17:52:44.883Z level=INFO source=server.go:587 msg="waiting for llama runner to start responding"
time=2024-10-26T17:52:44.884Z level=INFO source=server.go:621 msg="waiting for server to become available" status="llm server error"
INFO [main] starting c++ runner | tid="140419050369856" timestamp=1729965164
INFO [main] build info | build=10 commit="a04710e" tid="140419050369856" timestamp=1729965164
INFO [main] system info | n_threads=4 n_threads_batch=4 system_info="AVX = 1 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 0 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | RISCV_VECT = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | " tid="140419050369856" timestamp=1729965164 total_threads=8
INFO [main] HTTP server listening | hostname="127.0.0.1" n_threads_http="7" port="40561" tid="140419050369856" timestamp=1729965164
llama_model_loader: loaded meta data with 30 key-value pairs and 147 tensors from /root/.ollama/models/blobs/sha256-74701a8c35f6c8d9a4b91f3f3497643001d63e0c7a84e085bed452548fa88d45 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = Llama 3.2 1B Instruct
llama_model_loader: - kv   3:                           general.finetune str              = Instruct
llama_model_loader: - kv   4:                           general.basename str              = Llama-3.2
llama_model_loader: - kv   5:                         general.size_label str              = 1B
llama_model_loader: - kv   6:                               general.tags arr[str,6]       = ["facebook", "meta", "pytorch", "llam...
llama_model_loader: - kv   7:                          general.languages arr[str,8]       = ["en", "de", "fr", "it", "pt", "hi", ...
llama_model_loader: - kv   8:                          llama.block_count u32              = 16
llama_model_loader: - kv   9:                       llama.context_length u32              = 131072
llama_model_loader: - kv  10:                     llama.embedding_length u32              = 2048
llama_model_loader: - kv  11:                  llama.feed_forward_length u32              = 8192
llama_model_loader: - kv  12:                 llama.attention.head_count u32              = 32
llama_model_loader: - kv  13:              llama.attention.head_count_kv u32              = 8
llama_model_loader: - kv  14:                       llama.rope.freq_base f32              = 500000.000000
llama_model_loader: - kv  15:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  16:                 llama.attention.key_length u32              = 64
llama_model_loader: - kv  17:               llama.attention.value_length u32              = 64
llama_model_loader: - kv  18:                          general.file_type u32              = 7
llama_model_loader: - kv  19:                           llama.vocab_size u32              = 128256
llama_model_loader: - kv  20:                 llama.rope.dimension_count u32              = 64
llama_model_loader: - kv  21:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  22:                         tokenizer.ggml.pre str              = llama-bpe
llama_model_loader: - kv  23:                      tokenizer.ggml.tokens arr[str,128256]  = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  24:                  tokenizer.ggml.token_type arr[i32,128256]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
time=2024-10-26T17:52:45.135Z level=INFO source=server.go:621 msg="waiting for server to become available" status="llm server loading model"
llama_model_loader: - kv  25:                      tokenizer.ggml.merges arr[str,280147]  = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv  26:                tokenizer.ggml.bos_token_id u32              = 128000
llama_model_loader: - kv  27:                tokenizer.ggml.eos_token_id u32              = 128009
llama_model_loader: - kv  28:                    tokenizer.chat_template str              = {{- bos_token }}\n{%- if custom_tools ...
llama_model_loader: - kv  29:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:   34 tensors
llama_model_loader: - type q8_0:  113 tensors
llm_load_vocab: special tokens cache size = 256
llm_load_vocab: token to piece cache size = 0.7999 MB
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = llama
llm_load_print_meta: vocab type       = BPE
llm_load_print_meta: n_vocab          = 128256
llm_load_print_meta: n_merges         = 280147
llm_load_print_meta: vocab_only       = 0
llm_load_print_meta: n_ctx_train      = 131072
llm_load_print_meta: n_embd           = 2048
llm_load_print_meta: n_layer          = 16
llm_load_print_meta: n_head           = 32
llm_load_print_meta: n_head_kv        = 8
llm_load_print_meta: n_rot            = 64
llm_load_print_meta: n_swa            = 0
llm_load_print_meta: n_embd_head_k    = 64
llm_load_print_meta: n_embd_head_v    = 64
llm_load_print_meta: n_gqa            = 4
llm_load_print_meta: n_embd_k_gqa     = 512
llm_load_print_meta: n_embd_v_gqa     = 512
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale    = 0.0e+00
llm_load_print_meta: n_ff             = 8192
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: causal attn      = 1
llm_load_print_meta: pooling type     = 0
llm_load_print_meta: rope type        = 0
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 500000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_ctx_orig_yarn  = 131072
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: ssm_d_conv       = 0
llm_load_print_meta: ssm_d_inner      = 0
llm_load_print_meta: ssm_d_state      = 0
llm_load_print_meta: ssm_dt_rank      = 0
llm_load_print_meta: ssm_dt_b_c_rms   = 0
llm_load_print_meta: model type       = 1B
llm_load_print_meta: model ftype      = Q8_0
llm_load_print_meta: model params     = 1.24 B
llm_load_print_meta: model size       = 1.22 GiB (8.50 BPW) 
llm_load_print_meta: general.name     = Llama 3.2 1B Instruct
llm_load_print_meta: BOS token        = 128000 '<|begin_of_text|>'
llm_load_print_meta: EOS token        = 128009 '<|eot_id|>'
llm_load_print_meta: LF token         = 128 'Ä'
llm_load_print_meta: EOT token        = 128009 '<|eot_id|>'
llm_load_print_meta: EOM token        = 128008 '<|eom_id|>'
llm_load_print_meta: EOG token        = 128008 '<|eom_id|>'
llm_load_print_meta: EOG token        = 128009 '<|eot_id|>'
llm_load_print_meta: max token length = 256
/opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory

rocBLAS error: Could not initialize Tensile host: No devices found
time=2024-10-26T17:52:46.339Z level=INFO source=server.go:621 msg="waiting for server to become available" status="llm server not responding"
time=2024-10-26T17:52:46.809Z level=DEBUG source=server.go:428 msg="llama runner terminated" error="signal: aborted (core dumped)"
time=2024-10-26T17:52:47.041Z level=ERROR source=sched.go:455 msg="error loading llama server" error="llama runner process has terminated: error:Could not initialize Tensile host: No devices found"
time=2024-10-26T17:52:47.041Z level=DEBUG source=sched.go:458 msg="triggering expiration for failed load" model=/root/.ollama/models/blobs/sha256-74701a8c35f6c8d9a4b91f3f3497643001d63e0c7a84e085bed452548fa88d45
time=2024-10-26T17:52:47.041Z level=DEBUG source=sched.go:360 msg="runner expired event received" modelPath=/root/.ollama/models/blobs/sha256-74701a8c35f6c8d9a4b91f3f3497643001d63e0c7a84e085bed452548fa88d45
time=2024-10-26T17:52:47.041Z level=DEBUG source=sched.go:375 msg="got lock to unload" modelPath=/root/.ollama/models/blobs/sha256-74701a8c35f6c8d9a4b91f3f3497643001d63e0c7a84e085bed452548fa88d45
[GIN] 2024/10/26 - 17:52:47 | 500 |  2.264259318s |       127.0.0.1 | POST     "/api/generate"
time=2024-10-26T17:52:47.041Z level=DEBUG source=gpu.go:396 msg="updating system memory data" before.total="11.5 GiB" before.free="8.0 GiB" before.free_swap="6.8 GiB" now.total="11.5 GiB" now.free="8.0 GiB" now.free_swap="6.8 GiB"
time=2024-10-26T17:52:47.041Z level=DEBUG source=amd_linux.go:485 msg="updating rocm free memory" gpu=0 name=1002:163f before="3.4 GiB" now="3.4 GiB"
time=2024-10-26T17:52:47.041Z level=DEBUG source=server.go:1086 msg="stopping llama server"
time=2024-10-26T17:52:47.041Z level=DEBUG source=sched.go:380 msg="runner released" modelPath=/root/.ollama/models/blobs/sha256-74701a8c35f6c8d9a4b91f3f3497643001d63e0c7a84e085bed452548fa88d45
time=2024-10-26T17:52:47.292Z level=DEBUG source=gpu.go:396 msg="updating system memory data" before.total="11.5 GiB" before.free="8.0 GiB" before.free_swap="6.8 GiB" now.total="11.5 GiB" now.free="8.1 GiB" now.free_swap="6.8 GiB"
time=2024-10-26T17:52:47.292Z level=DEBUG source=amd_linux.go:485 msg="updating rocm free memory" gpu=0 name=1002:163f before="3.4 GiB" now="3.4 GiB"
time=2024-10-26T17:52:47.542Z level=DEBUG source=gpu.go:396 msg="updating system memory data" before.total="11.5 GiB" before.free="8.1 GiB" before.free_swap="6.8 GiB" now.total="11.5 GiB" now.free="8.1 GiB" now.free_swap="6.8 GiB"
time=2024-10-26T17:52:47.542Z level=DEBUG source=amd_linux.go:485 msg="updating rocm free memory" gpu=0 name=1002:163f before="3.4 GiB" now="3.4 GiB"
time=2024-10-26T17:52:47.792Z level=DEBUG source=gpu.go:396 msg="updating system memory data" before.total="11.5 GiB" before.free="8.1 GiB" before.free_swap="6.8 GiB" now.total="11.5 GiB" now.free="8.1 GiB" now.free_swap="6.8 GiB"
time=2024-10-26T17:52:47.792Z level=DEBUG source=amd_linux.go:485 msg="updating rocm free memory" gpu=0 name=1002:163f before="3.4 GiB" now="3.4 GiB"
time=2024-10-26T17:52:48.042Z level=DEBUG source=gpu.go:396 msg="updating system memory data" before.total="11.5 GiB" before.free="8.1 GiB" before.free_swap="6.8 GiB" now.total="11.5 GiB" now.free="8.1 GiB" now.free_swap="6.8 GiB"
time=2024-10-26T17:52:48.042Z level=DEBUG source=amd_linux.go:485 msg="updating rocm free memory" gpu=0 name=1002:163f before="3.4 GiB" now="3.4 GiB"
time=2024-10-26T17:52:48.292Z level=DEBUG source=gpu.go:396 msg="updating system memory data" before.total="11.5 GiB" before.free="8.1 GiB" before.free_swap="6.8 GiB" now.total="11.5 GiB" now.free="8.1 GiB" now.free_swap="6.8 GiB"
time=2024-10-26T17:52:48.292Z level=DEBUG source=amd_linux.go:485 msg="updating rocm free memory" gpu=0 name=1002:163f before="3.4 GiB" now="3.4 GiB"
time=2024-10-26T17:52:48.542Z level=DEBUG source=gpu.go:396 msg="updating system memory data" before.total="11.5 GiB" before.free="8.1 GiB" before.free_swap="6.8 GiB" now.total="11.5 GiB" now.free="8.1 GiB" now.free_swap="6.8 GiB"
time=2024-10-26T17:52:48.543Z level=DEBUG source=amd_linux.go:485 msg="updating rocm free memory" gpu=0 name=1002:163f before="3.4 GiB" now="3.4 GiB"
time=2024-10-26T17:52:48.792Z level=DEBUG source=gpu.go:396 msg="updating system memory data" before.total="11.5 GiB" before.free="8.1 GiB" before.free_swap="6.8 GiB" now.total="11.5 GiB" now.free="8.1 GiB" now.free_swap="6.8 GiB"
time=2024-10-26T17:52:48.792Z level=DEBUG source=amd_linux.go:485 msg="updating rocm free memory" gpu=0 name=1002:163f before="3.4 GiB" now="3.4 GiB"
time=2024-10-26T17:52:49.042Z level=DEBUG source=gpu.go:396 msg="updating system memory data" before.total="11.5 GiB" before.free="8.1 GiB" before.free_swap="6.8 GiB" now.total="11.5 GiB" now.free="8.1 GiB" now.free_swap="6.8 GiB"
time=2024-10-26T17:52:49.042Z level=DEBUG source=amd_linux.go:485 msg="updating rocm free memory" gpu=0 name=1002:163f before="3.4 GiB" now="3.4 GiB"
time=2024-10-26T17:52:49.292Z level=DEBUG source=gpu.go:396 msg="updating system memory data" before.total="11.5 GiB" before.free="8.1 GiB" before.free_swap="6.8 GiB" now.total="11.5 GiB" now.free="8.1 GiB" now.free_swap="6.8 GiB"
time=2024-10-26T17:52:49.292Z level=DEBUG source=amd_linux.go:485 msg="updating rocm free memory" gpu=0 name=1002:163f before="3.4 GiB" now="3.4 GiB"
time=2024-10-26T17:52:49.542Z level=DEBUG source=gpu.go:396 msg="updating system memory data" before.total="11.5 GiB" before.free="8.1 GiB" before.free_swap="6.8 GiB" now.total="11.5 GiB" now.free="8.1 GiB" now.free_swap="6.8 GiB"
time=2024-10-26T17:52:49.542Z level=DEBUG source=amd_linux.go:485 msg="updating rocm free memory" gpu=0 name=1002:163f before="3.4 GiB" now="3.4 GiB"
time=2024-10-26T17:52:49.792Z level=DEBUG source=gpu.go:396 msg="updating system memory data" before.total="11.5 GiB" before.free="8.1 GiB" before.free_swap="6.8 GiB" now.total="11.5 GiB" now.free="8.1 GiB" now.free_swap="6.8 GiB"
time=2024-10-26T17:52:49.792Z level=DEBUG source=amd_linux.go:485 msg="updating rocm free memory" gpu=0 name=1002:163f before="3.4 GiB" now="3.4 GiB"
time=2024-10-26T17:52:50.042Z level=DEBUG source=gpu.go:396 msg="updating system memory data" before.total="11.5 GiB" before.free="8.1 GiB" before.free_swap="6.8 GiB" now.total="11.5 GiB" now.free="8.1 GiB" now.free_swap="6.8 GiB"
time=2024-10-26T17:52:50.042Z level=DEBUG source=amd_linux.go:485 msg="updating rocm free memory" gpu=0 name=1002:163f before="3.4 GiB" now="3.4 GiB"
time=2024-10-26T17:52:50.292Z level=DEBUG source=gpu.go:396 msg="updating system memory data" before.total="11.5 GiB" before.free="8.1 GiB" before.free_swap="6.8 GiB" now.total="11.5 GiB" now.free="8.1 GiB" now.free_swap="6.8 GiB"
time=2024-10-26T17:52:50.292Z level=DEBUG source=amd_linux.go:485 msg="updating rocm free memory" gpu=0 name=1002:163f before="3.4 GiB" now="3.4 GiB"
time=2024-10-26T17:52:50.542Z level=DEBUG source=gpu.go:396 msg="updating system memory data" before.total="11.5 GiB" before.free="8.1 GiB" before.free_swap="6.8 GiB" now.total="11.5 GiB" now.free="8.1 GiB" now.free_swap="6.8 GiB"
time=2024-10-26T17:52:50.542Z level=DEBUG source=amd_linux.go:485 msg="updating rocm free memory" gpu=0 name=1002:163f before="3.4 GiB" now="3.4 GiB"
time=2024-10-26T17:52:50.792Z level=DEBUG source=gpu.go:396 msg="updating system memory data" before.total="11.5 GiB" before.free="8.1 GiB" before.free_swap="6.8 GiB" now.total="11.5 GiB" now.free="8.1 GiB" now.free_swap="6.8 GiB"
time=2024-10-26T17:52:50.792Z level=DEBUG source=amd_linux.go:485 msg="updating rocm free memory" gpu=0 name=1002:163f before="3.4 GiB" now="3.4 GiB"
time=2024-10-26T17:52:51.042Z level=DEBUG source=gpu.go:396 msg="updating system memory data" before.total="11.5 GiB" before.free="8.1 GiB" before.free_swap="6.8 GiB" now.total="11.5 GiB" now.free="8.1 GiB" now.free_swap="6.8 GiB"
time=2024-10-26T17:52:51.042Z level=DEBUG source=amd_linux.go:485 msg="updating rocm free memory" gpu=0 name=1002:163f before="3.4 GiB" now="3.4 GiB"
time=2024-10-26T17:52:51.293Z level=DEBUG source=gpu.go:396 msg="updating system memory data" before.total="11.5 GiB" before.free="8.1 GiB" before.free_swap="6.8 GiB" now.total="11.5 GiB" now.free="8.1 GiB" now.free_swap="6.8 GiB"
time=2024-10-26T17:52:51.293Z level=DEBUG source=amd_linux.go:485 msg="updating rocm free memory" gpu=0 name=1002:163f before="3.4 GiB" now="3.4 GiB"
time=2024-10-26T17:52:51.542Z level=DEBUG source=gpu.go:396 msg="updating system memory data" before.total="11.5 GiB" before.free="8.1 GiB" before.free_swap="6.8 GiB" now.total="11.5 GiB" now.free="8.1 GiB" now.free_swap="6.8 GiB"
time=2024-10-26T17:52:51.543Z level=DEBUG source=amd_linux.go:485 msg="updating rocm free memory" gpu=0 name=1002:163f before="3.4 GiB" now="3.4 GiB"
time=2024-10-26T17:52:51.792Z level=DEBUG source=gpu.go:396 msg="updating system memory data" before.total="11.5 GiB" before.free="8.1 GiB" before.free_swap="6.8 GiB" now.total="11.5 GiB" now.free="8.1 GiB" now.free_swap="6.8 GiB"
time=2024-10-26T17:52:51.792Z level=DEBUG source=amd_linux.go:485 msg="updating rocm free memory" gpu=0 name=1002:163f before="3.4 GiB" now="3.4 GiB"
time=2024-10-26T17:52:52.042Z level=WARN source=sched.go:646 msg="gpu VRAM usage didn't recover within timeout" seconds=5.000772676 model=/root/.ollama/models/blobs/sha256-74701a8c35f6c8d9a4b91f3f3497643001d63e0c7a84e085bed452548fa88d45
time=2024-10-26T17:52:52.042Z level=DEBUG source=sched.go:384 msg="sending an unloaded event" modelPath=/root/.ollama/models/blobs/sha256-74701a8c35f6c8d9a4b91f3f3497643001d63e0c7a84e085bed452548fa88d45
time=2024-10-26T17:52:52.042Z level=DEBUG source=sched.go:308 msg="ignoring unload event with no pending requests"
time=2024-10-26T17:52:52.042Z level=DEBUG source=gpu.go:396 msg="updating system memory data" before.total="11.5 GiB" before.free="8.1 GiB" before.free_swap="6.8 GiB" now.total="11.5 GiB" now.free="8.1 GiB" now.free_swap="6.8 GiB"
time=2024-10-26T17:52:52.042Z level=DEBUG source=amd_linux.go:485 msg="updating rocm free memory" gpu=0 name=1002:163f before="3.4 GiB" now="3.4 GiB"
time=2024-10-26T17:52:52.292Z level=WARN source=sched.go:646 msg="gpu VRAM usage didn't recover within timeout" seconds=5.250770057 model=/root/.ollama/models/blobs/sha256-74701a8c35f6c8d9a4b91f3f3497643001d63e0c7a84e085bed452548fa88d45
<!-- gh-comment-id:2439676420 -->

@sebastianlutter commented on GitHub (Oct 26, 2024):

I increased the GPU memory to 4 GB, then tested again with llama3.2:1b:

  • it decides to use the GPU but fails:

```
msg=evaluating library=rocm gpu_count=1 available="[3.4 GiB]"
time=2024-10-26T17:52:44.880Z level=INFO source=sched.go:714 msg="new model will fit in available VRAM in single GPU, loading" model=/root/.ollama/models/blobs/sha256-74701a8c35f6c8d9a4b91f3f3497643001d63e0c7a84e085bed452548fa88d45 gpu=0 parallel=4 available=3654434816 required="2.5 GiB"
[ . . . ]
/opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory
rocBLAS error: Could not initialize Tensile host: No devices found
[ . . . ]
```

  • maybe related: https://github.com/ollama/ollama/issues/6685
  • more logs:

```
[ . . . ]
time=2024-10-26T17:52:44.879Z level=DEBUG source=memory.go:103 msg=evaluating library=rocm gpu_count=1 available="[3.4 GiB]"
time=2024-10-26T17:52:44.880Z level=INFO source=sched.go:714 msg="new model will fit in available VRAM in single GPU, loading" model=/root/.ollama/models/blobs/sha256-74701a8c35f6c8d9a4b91f3f3497643001d63e0c7a84e085bed452548fa88d45 gpu=0 parallel=4 available=3654434816 required="2.5 GiB"
time=2024-10-26T17:52:44.881Z level=DEBUG source=gpu.go:396 msg="updating system memory data" before.total="11.5 GiB" before.free="8.0 GiB" before.free_swap="6.8 GiB" now.total="11.5 GiB" now.free="8.0 GiB" now.free_swap="6.8 GiB"
time=2024-10-26T17:52:44.881Z level=DEBUG source=amd_linux.go:485 msg="updating rocm free memory" gpu=0 name=1002:163f before="3.4 GiB" now="3.4 GiB"
time=2024-10-26T17:52:44.881Z level=INFO source=server.go:105 msg="system memory" total="11.5 GiB" free="8.0 GiB" free_swap="6.8 GiB"
time=2024-10-26T17:52:44.881Z level=DEBUG source=memory.go:103 msg=evaluating library=rocm gpu_count=1 available="[3.4 GiB]"
time=2024-10-26T17:52:44.881Z level=INFO source=memory.go:326 msg="offload to rocm" layers.requested=-1 layers.model=17 layers.offload=17 layers.split="" memory.available="[3.4 GiB]" memory.gpu_overhead="0 B" memory.required.full="2.5 GiB" memory.required.partial="2.5 GiB" memory.required.kv="256.0 MiB" memory.required.allocations="[2.5 GiB]" memory.weights.total="1.2 GiB" memory.weights.repeating="976.1 MiB" memory.weights.nonrepeating="266.2 MiB" memory.graph.full="544.0 MiB" memory.graph.partial="554.3 MiB"
time=2024-10-26T17:52:44.882Z level=DEBUG source=common.go:294 msg="availableServers : found" file=/usr/lib/ollama/runners/cpu/ollama_llama_server
time=2024-10-26T17:52:44.882Z level=DEBUG source=common.go:294 msg="availableServers : found" file=/usr/lib/ollama/runners/cpu_avx/ollama_llama_server
time=2024-10-26T17:52:44.882Z level=DEBUG source=common.go:294 msg="availableServers : found" file=/usr/lib/ollama/runners/cpu_avx2/ollama_llama_server
time=2024-10-26T17:52:44.882Z level=DEBUG source=common.go:294 msg="availableServers : found" file=/usr/lib/ollama/runners/rocm_v60102/ollama_llama_server
time=2024-10-26T17:52:44.882Z level=DEBUG source=common.go:294 msg="availableServers : found" file=/usr/lib/ollama/runners/cpu/ollama_llama_server
time=2024-10-26T17:52:44.882Z level=DEBUG source=common.go:294 msg="availableServers : found" file=/usr/lib/ollama/runners/cpu_avx/ollama_llama_server
time=2024-10-26T17:52:44.882Z level=DEBUG source=common.go:294 msg="availableServers : found" file=/usr/lib/ollama/runners/cpu_avx2/ollama_llama_server
time=2024-10-26T17:52:44.882Z level=DEBUG source=common.go:294 msg="availableServers : found" file=/usr/lib/ollama/runners/rocm_v60102/ollama_llama_server
time=2024-10-26T17:52:44.883Z level=INFO source=server.go:388 msg="starting llama server" cmd="/usr/lib/ollama/runners/rocm_v60102/ollama_llama_server --model /root/.ollama/models/blobs/sha256-74701a8c35f6c8d9a4b91f3f3497643001d63e0c7a84e085bed452548fa88d45 --ctx-size 8192 --batch-size 512 --embedding --n-gpu-layers 17 --verbose --threads 4 --parallel 4 --port 40561"
time=2024-10-26T17:52:44.883Z level=DEBUG source=server.go:405 msg=subprocess environment="[HSA_OVERRIDE_GFX_VERSION=gfx1030 PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin LD_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/runners/rocm_v60102 HIP_VISIBLE_DEVICES=0]"
time=2024-10-26T17:52:44.883Z level=INFO source=sched.go:449 msg="loaded runners" count=1
time=2024-10-26T17:52:44.883Z level=INFO source=server.go:587 msg="waiting for llama runner to start responding"
time=2024-10-26T17:52:44.884Z level=INFO source=server.go:621 msg="waiting for server to become available" status="llm server error"
INFO [main] starting c++ runner | tid="140419050369856" timestamp=1729965164
INFO [main] build info | build=10 commit="a04710e" tid="140419050369856" timestamp=1729965164
INFO [main] system info | n_threads=4 n_threads_batch=4 system_info="AVX = 1 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 0 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | RISCV_VECT = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | " tid="140419050369856" timestamp=1729965164 total_threads=8
INFO [main] HTTP server listening | hostname="127.0.0.1" n_threads_http="7" port="40561" tid="140419050369856" timestamp=1729965164
llama_model_loader: loaded meta data with 30 key-value pairs and 147 tensors from /root/.ollama/models/blobs/sha256-74701a8c35f6c8d9a4b91f3f3497643001d63e0c7a84e085bed452548fa88d45 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0: general.architecture str = llama
llama_model_loader: - kv   1: general.type str = model
llama_model_loader: - kv   2: general.name str = Llama 3.2 1B Instruct
llama_model_loader: - kv   3: general.finetune str = Instruct
llama_model_loader: - kv   4: general.basename str = Llama-3.2
llama_model_loader: - kv   5: general.size_label str = 1B
llama_model_loader: - kv   6: general.tags arr[str,6] = ["facebook", "meta", "pytorch", "llam...
llama_model_loader: - kv   7: general.languages arr[str,8] = ["en", "de", "fr", "it", "pt", "hi", ...
llama_model_loader: - kv   8: llama.block_count u32 = 16
llama_model_loader: - kv   9: llama.context_length u32 = 131072
llama_model_loader: - kv  10: llama.embedding_length u32 = 2048
llama_model_loader: - kv  11: llama.feed_forward_length u32 = 8192
llama_model_loader: - kv  12: llama.attention.head_count u32 = 32
llama_model_loader: - kv  13: llama.attention.head_count_kv u32 = 8
llama_model_loader: - kv  14: llama.rope.freq_base f32 = 500000.000000
llama_model_loader: - kv  15: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv  16: llama.attention.key_length u32 = 64
llama_model_loader: - kv  17: llama.attention.value_length u32 = 64
llama_model_loader: - kv  18: general.file_type u32 = 7
llama_model_loader: - kv  19: llama.vocab_size u32 = 128256
llama_model_loader: - kv  20: llama.rope.dimension_count u32 = 64
llama_model_loader: - kv  21: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv  22: tokenizer.ggml.pre str = llama-bpe
llama_model_loader: - kv  23: tokenizer.ggml.tokens arr[str,128256] = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  24: tokenizer.ggml.token_type arr[i32,128256] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
time=2024-10-26T17:52:45.135Z level=INFO source=server.go:621 msg="waiting for server to become available" status="llm server loading model"
llama_model_loader: - kv  25: tokenizer.ggml.merges arr[str,280147] = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv  26: tokenizer.ggml.bos_token_id u32 = 128000
llama_model_loader: - kv  27: tokenizer.ggml.eos_token_id u32 = 128009
llama_model_loader: - kv  28: tokenizer.chat_template str = {{- bos_token }}\n{%- if custom_tools ...
llama_model_loader: - kv  29: general.quantization_version u32 = 2
llama_model_loader: - type  f32:   34 tensors
llama_model_loader: - type q8_0:  113 tensors
llm_load_vocab: special tokens cache size = 256
llm_load_vocab: token to piece cache size = 0.7999 MB
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = llama
llm_load_print_meta: vocab type       = BPE
llm_load_print_meta: n_vocab          = 128256
llm_load_print_meta: n_merges         = 280147
llm_load_print_meta: vocab_only       = 0
llm_load_print_meta: n_ctx_train      = 131072
llm_load_print_meta: n_embd           = 2048
llm_load_print_meta: n_layer          = 16
llm_load_print_meta: n_head           = 32
llm_load_print_meta: n_head_kv        = 8
llm_load_print_meta: n_rot            = 64
llm_load_print_meta: n_swa            = 0
llm_load_print_meta: n_embd_head_k    = 64
llm_load_print_meta: n_embd_head_v    = 64
llm_load_print_meta: n_gqa            = 4
llm_load_print_meta: n_embd_k_gqa     = 512
llm_load_print_meta: n_embd_v_gqa     = 512
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale    = 0.0e+00
llm_load_print_meta: n_ff             = 8192
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: causal attn      = 1
llm_load_print_meta: pooling type     = 0
llm_load_print_meta: rope type        = 0
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 500000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_ctx_orig_yarn  = 131072
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: ssm_d_conv       = 0
llm_load_print_meta: ssm_d_inner      = 0
llm_load_print_meta: ssm_d_state      = 0
llm_load_print_meta: ssm_dt_rank      = 0
llm_load_print_meta: ssm_dt_b_c_rms   = 0
llm_load_print_meta: model type       = 1B
llm_load_print_meta: model ftype      = Q8_0
llm_load_print_meta: model params     = 1.24 B
llm_load_print_meta: model size       = 1.22 GiB (8.50 BPW)
llm_load_print_meta: general.name     = Llama 3.2 1B Instruct
[ . . . ]
/opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory

rocBLAS error: Could not initialize Tensile host: No devices found
time=2024-10-26T17:52:46.809Z level=DEBUG source=server.go:428 msg="llama runner terminated" error="signal: aborted (core dumped)"
time=2024-10-26T17:52:47.041Z level=ERROR source=sched.go:455 msg="error loading llama server" error="llama runner process has terminated: error:Could not initialize Tensile host: No devices found"
[ . . . ]
time=2024-10-26T17:52:52.292Z level=WARN source=sched.go:646 msg="gpu VRAM usage didn't recover within timeout" seconds=5.250770057 model=/root/.ollama/models/blobs/sha256-74701a8c35f6c8d9a4b91f3f3497643001d63e0c7a84e085bed452548fa88d45
```

@7h145 commented on GitHub (Oct 27, 2024):

I can confirm the "won't work" on a Steam Deck (SteamOS 3.6.19, desktop mode), running podman in rootless mode with either of:

  • docker.io/ollama/ollama:0.3.14-rocm
  • docker.io/ollama/ollama:0.4.0-rc5-rocm

My logs below are from ollama:0.4.0-rc5-rocm.
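For context, the container would have been started along these lines. This is a sketch only: the `--device` flags, volume name, and `OLLAMA_DEBUG` setting are my assumptions about a typical rootless-podman ROCm setup; only the image tag and the `HSA_OVERRIDE_GFX_VERSION` values come from this report.

```shell
# Sketch of a rootless podman invocation for the ROCm image named above.
# /dev/kfd and /dev/dri expose the AMD GPU to the container; swap the
# override value for gfx1030 to reproduce the second variant.
podman run -d --name ollama \
  --device /dev/kfd --device /dev/dri \
  -e HSA_OVERRIDE_GFX_VERSION=10.3.0 \
  -e OLLAMA_DEBUG=1 \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  docker.io/ollama/ollama:0.4.0-rc5-rocm
```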

The exact mode of "won't work" differs between HSA_OVERRIDE_GFX_VERSION=10.3.0 and HSA_OVERRIDE_GFX_VERSION=gfx1030. The ollama server does start with either one (see below), but both fail on inference: with 10.3.0 the model loads, very slowly, and fails afterwards, while gfx1030 fails immediately with Could not initialize Tensile host.
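The two spellings are not interchangeable: ROCm documents HSA_OVERRIDE_GFX_VERSION as taking the numeric major.minor.step form, and "10.3.0" is exactly the numeric spelling of the gfx1030 target (the minor and step fields are hex digits in the gfx name, so the Deck's gfx1033 would be 10.3.3). A small illustration of that mapping; `gfx_target` is a hypothetical helper, not part of ollama or ROCm:

```shell
# Convert a numeric HSA_OVERRIDE_GFX_VERSION value to its gfx target name.
# Minor and step are rendered as hex digits (e.g. 9.0.10 -> gfx90a).
gfx_target() {
  IFS=. read -r major minor step <<< "$1"
  printf 'gfx%d%x%x\n' "$major" "$minor" "$step"
}

gfx_target 10.3.0   # gfx1030 (the value the override masks the Deck as)
gfx_target 10.3.3   # gfx1033 (the Steam Deck's real target)
```

This suggests why the gfx1030 spelling fails outright: the ROCm runtime never parses it as a version, so no device passes the check.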

In both cases the ollama server starts up "more or less fine":

  1. with HSA_OVERRIDE_GFX_VERSION=10.3.0
  2. with HSA_OVERRIDE_GFX_VERSION=gfx1030

The diff (with the timestamps stripped out) is:

#> diff <(sed -E 's/time=[^ ]+//' HSA_OVERRIDE_GFX_VERSION-10.3.0) <(sed -E 's/time=[^ ]+//' HSA_OVERRIDE_GFX_VERSION-gfx1030)
1c1
< 2024/10/27 09:47:49 routes.go:1170: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION:10.3.0 HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_DEBUG:true OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://*] OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
---
> 2024/10/27 09:48:04 routes.go:1170: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION:gfx1030 HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_DEBUG:true OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://*] OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
9c9
<  level=INFO source=common.go:82 msg="Dynamic LLM libraries" runners="[cpu cpu_avx cpu_avx2 rocm]"
---
>  level=INFO source=common.go:82 msg="Dynamic LLM libraries" runners="[rocm cpu cpu_avx cpu_avx2]"
30c30
<  level=INFO source=amd_linux.go:386 msg="skipping rocm gfx compatibility check" HSA_OVERRIDE_GFX_VERSION=10.3.0
---
>  level=INFO source=amd_linux.go:386 msg="skipping rocm gfx compatibility check" HSA_OVERRIDE_GFX_VERSION=gfx1030

As in "nothing to see here"? Maybe this "loads more or less fine" is what people reported as success earlier?

But on actual inference (e.g. podman exec -it ollama ollama run llama3.2:1b), these two variants do indeed behave differently:

The HSA_OVERRIDE_GFX_VERSION=10.3.0 does find the device and loads the model, but does so very, very slowly, then hangs (indefinitely?)

ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon Graphics, compute capability 10.3, VMM: no
llm_load_tensors: ggml ctx size =    0.14 MiB
llm_load_tensors: offloading 16 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 17/17 layers to GPU
llm_load_tensors:      ROCm0 buffer size =  1252.42 MiB
llm_load_tensors:        CPU buffer size =   266.16 MiB
time=2024-10-27T10:03:03.603Z level=DEBUG source=llama-server.go:579 msg="model load progress 0.00"
time=2024-10-27T10:04:03.641Z level=DEBUG source=llama-server.go:579 msg="model load progress 0.18"
[...]
time=2024-10-27T10:29:45.402Z level=DEBUG source=llama-server.go:579 msg="model load progress 0.49"
time=2024-10-27T10:30:02.472Z level=DEBUG source=llama-server.go:579 msg="model load progress 0.50"
[...]
time=2024-10-27T10:30:46.157Z level=DEBUG source=llama-server.go:579 msg="model load progress 0.75"
time=2024-10-27T10:30:46.408Z level=DEBUG source=llama-server.go:579 msg="model load progress 0.82"
llama_new_context_with_model: n_ctx      = 8192
llama_new_context_with_model: n_batch    = 2048
llama_new_context_with_model: n_ubatch   = 512
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: freq_base  = 500000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init:      ROCm0 KV buffer size =   256.00 MiB
llama_new_context_with_model: KV self size  =  256.00 MiB, K (f16):  128.00 MiB, V (f16):  128.00 MiB
llama_new_context_with_model:  ROCm_Host  output buffer size =     1.99 MiB
llama_new_context_with_model:      ROCm0 compute buffer size =   544.00 MiB
llama_new_context_with_model:  ROCm_Host compute buffer size =    20.01 MiB
llama_new_context_with_model: graph nodes  = 518
llama_new_context_with_model: graph splits = 2
time=2024-10-27T10:30:46.659Z level=INFO source=llama-server.go:573 msg="llama runner started in 1666.82 seconds"

This seems to be a memory allocation thing; the actual slowness of the loading seems to differ depending on the previously allocated UMA frame buffer "VRAM". The example above is on "cold" VRAM, taking 30 minutes for a 1b model.
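As a quick check of how much UMA "VRAM" the APU currently reports, the amdgpu sysfs counters can be read directly on the host. A sketch; the card0 path is an assumption and may be a different cardN on other machines:

```shell
# Print the amdgpu-reported VRAM and GTT totals, if present. The sysfs paths
# are an assumption; on some hosts the device sits under card1, not card0.
show_gpu_mem() {
  for f in /sys/class/drm/card0/device/mem_info_vram_total \
           /sys/class/drm/card0/device/mem_info_gtt_total; do
    if [ -r "$f" ]; then
      printf '%s: %s MiB\n' "${f##*/}" "$(( $(cat "$f") / 1024 / 1024 ))"
    else
      printf '%s: not available\n' "${f##*/}"
    fi
  done
}
show_gpu_mem
```

Comparing these values on a "cold" vs. "warm" frame buffer might help correlate the load-time differences described above.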

But after all this tedious loading, the inference just hangs indefinitely with

time=2024-10-27T10:33:40.238Z level=DEBUG source=routes.go:1434 msg="chat request" images=0 prompt="<|start_header_id|>system<|end_header_id|>\n\nCutting Knowledge Date: December 2023\n\n<|eot_id|><|start_header_id|>user<|end_header_id|>\n\ntest<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
time=2024-10-27T10:33:40.239Z level=DEBUG source=cache.go:105 msg="loading cache slot" id=0 cache=0 prompt=26 used=0 remaining=26

The HSA_OVERRIDE_GFX_VERSION=gfx1030 fails immediately

 rocBLAS error: Could not initialize Tensile host: No devices found
time=2024-10-27T10:00:17.671Z level=INFO source=llama-server.go:568 msg="waiting for server to become available" status="llm server not responding"
time=2024-10-27T10:00:18.296Z level=DEBUG source=llama-server.go:395 msg="llama runner terminated" error="signal: aborted (core dumped)"
time=2024-10-27T10:00:18.372Z level=ERROR source=sched.go:456 msg="error loading llama server" error="llama runner process has terminated: error:Could not initialize Tensile host: No devices found"

This error has been reported previously; it appears to me that gfx1030 is not helpful here, and the documentation (https://github.com/ollama/ollama/blob/main/docs/gpu.md#overrides-on-linux) indicates that this is not even a recognized value for HSA_OVERRIDE_GFX_VERSION. I'm pretty sure that this is not a podman permission issue as speculated above (at least in my case).
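For reference, the documented override format is the dotted major.minor.stepping triple, not the gfx target name. A throwaway helper (my own sketch, not part of ollama) to derive the nearest supported base override from a 4-digit RDNA-style target could look like:

```shell
# Hypothetical helper (not part of ollama): map a 4-digit RDNA-style gfx
# target to the dotted HSA_OVERRIDE_GFX_VERSION form, forcing the stepping
# to 0 so it lands on a supported base ISA (gfx1033 -> 10.3.0).
gfx_to_override() {
  id="${1#gfx}"                           # gfx1033 -> 1033
  major=$(printf '%s' "$id" | cut -c1-2)  # -> 10
  minor=$(printf '%s' "$id" | cut -c3)    # -> 3
  printf '%s.%s.0\n' "$major" "$minor"
}

gfx_to_override gfx1033   # prints 10.3.0
```

This only covers 4-digit ids like gfx1033; older targets such as gfx900 or gfx90a use a different layout and are not handled here.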

I can provide more logs, but this gets byzantine fast, especially with AMD_LOG_LEVEL=3.

It seems to me that this is an incompatibility between current ROCm and the "AMD Custom APU 0405" in the Steam Deck?

PS: Suggestion for easier debugging: maybe include rocminfo and rocm-smi in the ollama:rocm images?
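Until then, the tools can be added ad hoc. A sketch, assuming the rocm image is apt-based and has a package source that provides rocminfo (both unverified assumptions), and guarded so it is a no-op on hosts without podman:

```shell
# Install and run rocminfo inside the running "ollama" container.
# Assumes an apt-based image with a package source providing rocminfo;
# both are assumptions about the ollama:rocm image, not verified.
if command -v podman >/dev/null 2>&1; then
  podman exec -it ollama bash -c \
    'apt-get update -qq && apt-get install -y -qq rocminfo && rocminfo | grep -i gfx'
else
  echo 'podman not available on this host'
fi
```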

@7h145 commented on GitHub (Oct 27, 2024):

Addendum ... Logs with OLLAMA_DEBUG=1 and AMD_LOG_LEVEL=3.

Same setup as before: Steam Deck, SteamOS 3.6.19 in "desktop mode", podman rootless. Using only HSA_OVERRIDE_GFX_VERSION=10.3.0 here.

Running ollama (at 2024-10-27T12:51:19) with

podman run \
  --name ollama \
  --device /dev/kfd --device /dev/dri \
  --volume 'ollama:/root/.ollama' \
  --publish '11434:11434' \
  --env 'HSA_OVERRIDE_GFX_VERSION=10.3.0' \
  --env 'OLLAMA_DEBUG=1' \
  --env 'AMD_LOG_LEVEL=3' \
  docker.io/ollama/ollama:0.4.0-rc5-rocm

and then running inference (at 2024-10-27T12:51:26) with

podman exec -it ollama ollama run llama3.2:1b test --keepalive -1m --verbose 

yields the following rather lengthy (300 kB) log (https://gist.github.com/7h145/95ec34850def90fc209ef9bbbf68e56c); the llama3.2:1b model needs about 55 minutes to load, and the inference hangs indefinitely after that.

The server startup is complete at 2024-10-27T12:51:19

time=2024-10-27T12:51:19.940Z level=INFO source=types.go:123 msg="inference compute" id=0 library=rocm variant="" compute=gfx1033 driver=0.0 name=1002:163f total="4.0 GiB" available="3.6 GiB"

The inference starts at 2024-10-27T12:51:26

time=2024-10-27T12:51:26.668Z level=DEBUG source=gpu.go:396 msg="updating system memory data" before.total="11.5 GiB" before.free="8.9 GiB" before.free_swap="13.8 GiB" now.total="11.5 GiB" now.free="8.8 GiB" now.free_swap="13.8 GiB"

The model is successfully loaded at 2024-10-27T13:47:01, after about 55 minutes

:3:hip_stream.cpp           :460 : 15635089234 us: [pid:22    tid:0x7f4e72bfd640] hipStreamSynchronize: Returned hipSuccess :
llama_new_context_with_model:      ROCm0 compute buffer size =   544.00 MiB
llama_new_context_with_model:  ROCm_Host compute buffer size =    20.01 MiB
llama_new_context_with_model: graph nodes  = 518
llama_new_context_with_model: graph splits = 2
time=2024-10-27T13:47:01.670Z level=INFO source=llama-server.go:573 msg="llama runner started in 3334.93 seconds"

At this point, the inference just hangs/does "nothing".

Stopping the inference via ^C at 2024-10-27T13:51:50 yields

time=2024-10-27T13:51:50.211Z level=DEBUG source=sched.go:340 msg="runner with non-zero duration has gone idle, adding timer" modelPath=/root/.ollama/models/blobs/sha256-74701a8c35f6c8d9a4b91f3f3497643001d63e0c7a84e085bed452548fa88d45 duration=2562047h47m16.854775807s
time=2024-10-27T13:51:50.211Z level=DEBUG source=sched.go:358 msg="after processing request finished event" modelPath=/root/.ollama/models/blobs/sha256-74701a8c35f6c8d9a4b91f3f3497643001d63e0c7a84e085bed452548fa88d45 refCount=0
[GIN] 2024/10/27 - 13:51:50 | 200 |       1h0m23s |       127.0.0.1 | POST     "/api/generate"

Remark: Kind of a toy problem as long as the "VRAM" on the Deck is limited to 4 GB (and ultra-slow). This is all somewhat interesting and "should work" IMHO, but the real-world use cases for something like this are limited. The up to 4 GB of UMA frame buffer "VRAM" the Steam Deck APU provides can offload only small models, which will in most cases run more or less acceptably on the Steam Deck CPU anyway.
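Given that, a pragmatic workaround is to pin ollama to a CPU runner and bypass ROCm entirely. A sketch reusing only names visible in this thread's logs (OLLAMA_LLM_LIBRARY appears in the server config dump; cpu_avx2 is one of the runners in the startup log); guarded so it is a no-op without podman, and the container name ollama-cpu is my own:

```shell
# Run ollama with the ROCm path bypassed by pinning a CPU runner.
# OLLAMA_LLM_LIBRARY and the cpu_avx2 runner name are taken from the logs
# earlier in the thread; whether pinning fully avoids the hang is untested.
if command -v podman >/dev/null 2>&1; then
  podman run \
    --name ollama-cpu \
    --volume 'ollama:/root/.ollama' \
    --publish '11434:11434' \
    --env 'OLLAMA_LLM_LIBRARY=cpu_avx2' \
    docker.io/ollama/ollama:0.4.0-rc5-rocm
else
  echo 'podman not available on this host'
fi
```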

@ehartford commented on GitHub (Nov 8, 2024):

Anyone had success with this?

@Talleyrand-34 commented on GitHub (Nov 15, 2024):

Anyone had success with this?

In my case, the rocm docker container worked with podman (you can use docker as well).

@sebastianlutter commented on GitHub (Nov 18, 2024):

I have had no success with the ollama docker image yet, and lack the time for further tests.

Reference: github-starred/ollama#64037