[GH-ISSUE #6144] Automated install script breaks ollama installation on AMD Vega 64 #29598

Closed
opened 2026-04-22 08:36:21 -05:00 by GiteaMirror · 3 comments
Owner

Originally created by @mathatan on GitHub (Aug 2, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/6144

Originally assigned to: @dhiltgen on GitHub.

What is the issue?

After debugging for a while (see https://github.com/ollama/ollama/issues/5143#issuecomment-2265824021 and https://github.com/ollama/ollama/issues/5143#issuecomment-2265892604 for details), I came to realize that Ollama's install.sh breaks an upgrade (on my setup, at least) unless I edit it to exit 0 right after the line trap install_success EXIT (currently line 81).
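For reference, that workaround can be sketched as a tiny transform over the script (hedged: the trap line and its position reflect the install.sh version current at the time, and patch_install_script is just an illustrative helper, not part of ollama):

```shell
# Hypothetical helper illustrating the workaround: emit a copy of install.sh
# with "exit 0" inserted right after the "trap install_success EXIT" line,
# so the script stops once the binary is in place and skips the later
# driver/GPU setup steps. (GNU sed's one-line "a text" form is assumed.)
patch_install_script() {
    sed '/trap install_success EXIT/a exit 0' "$1"
}

# usage: patch_install_script install.sh > install-binary-only.sh
```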

I'm not entirely sure what happens, but the gist of it is that manually copying the new Ollama binary to /usr/local/bin works just fine, while running the install script breaks everything permanently. The only way I have been able to recover my Ollama installation has been a hard rollback on ZFS.
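The manual path that kept working can be sketched roughly as follows (hedged: the download URL is ollama's generic linux-amd64 binary endpoint and may differ between releases, and the stop/start lines assume the systemd unit the installer normally creates; the sketch prints the steps rather than running them):

```shell
# Print (not run) the manual-upgrade steps that avoided the breakage:
# replace only the binary in /usr/local/bin, leaving the existing user,
# group and service configuration untouched.
manual_upgrade_cmds() {
    cat <<'EOF'
systemctl stop ollama
curl -fsSL https://ollama.com/download/ollama-linux-amd64 -o /tmp/ollama
install -m 0755 /tmp/ollama /usr/local/bin/ollama
systemctl start ollama
EOF
}

manual_upgrade_cmds
```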

Also, I have tried multiple times to install everything from scratch and have been completely unable to get Ollama working. While I don't know what is going on, here are the details of my system and the steps to reproduce the installation:

CPU: AMD Ryzen 9 3900X 12-Core Processor
GPU: Radeon RX Vega 64 (gfx900)
OS: Ubuntu 22.04.4 LTS (in a jail within TrueNAS-SCALE-24.04.2)

Rocm libs:

Package: rocm-libs
Version: 6.1.2.60102-119~22.04
Priority: optional
Section: devel
Maintainer: ROCm Dev Support <rocm-dev.support@amd.com>
Installed-Size: 13.3 kB
Depends: hipblas (= 2.1.0.60102-119~22.04),
 hipblaslt (= 0.7.0.60102-119~22.04),
 hipfft (= 1.0.14.60102-119~22.04),
 hipsolver (= 2.1.1.60102-119~22.04),
 hipsparse (= 3.0.1.60102-119~22.04),
 hiptensor (= 1.2.0.60102-119~22.04),
 miopen-hip (= 3.1.0.60102-119~22.04),
 half (= 1.12.0.60102-119~22.04),
 rccl (= 2.18.6.60102-119~22.04),
 rocalution (= 3.1.1.60102-119~22.04),
 rocblas (= 4.1.2.60102-119~22.04),
 rocfft (= 1.0.27.60102-119~22.04),
 rocrand (= 3.0.1.60102-119~22.04),
 hiprand (= 2.10.16.60102-119~22.04),
 rocsolver (= 3.25.0.60102-119~22.04),
 rocsparse (= 3.1.2.60102-119~22.04),
 rocm-core (= 6.1.2.60102-119~22.04),
 hipsparselt (= 0.2.0.60102-119~22.04),
 composablekernel-dev (= 1.1.0.60102-119~22.04),
 hipblas-dev (= 2.1.0.60102-119~22.04),
 hipblaslt-dev (= 0.7.0.60102-119~22.04),
 hipcub-dev (= 3.1.0.60102-119~22.04),
 hipfft-dev (= 1.0.14.60102-119~22.04),
 hipsolver-dev (= 2.1.1.60102-119~22.04),
 hipsparse-dev (= 3.0.1.60102-119~22.04),
 hiptensor-dev (= 1.2.0.60102-119~22.04),
 miopen-hip-dev (= 3.1.0.60102-119~22.04),
 rccl-dev (= 2.18.6.60102-119~22.04),
 rocalution-dev (= 3.1.1.60102-119~22.04),
 rocblas-dev (= 4.1.2.60102-119~22.04),
 rocfft-dev (= 1.0.27.60102-119~22.04),
 rocprim-dev (= 3.1.0.60102-119~22.04),
 rocrand-dev (= 3.0.1.60102-119~22.04),
 hiprand-dev (= 2.10.16.60102-119~22.04),
 rocsolver-dev (= 3.25.0.60102-119~22.04),
 rocsparse-dev (= 3.1.2.60102-119~22.04),
 rocthrust-dev (= 3.0.1.60102-119~22.04),
 rocwmma-dev (= 1.4.0.60102-119~22.04),
 hipsparselt-dev (= 0.2.0.60102-119~22.04)
Homepage: https://github.com/RadeonOpenCompute/ROCm
Download-Size: 1,068 B
APT-Sources: https://repo.radeon.com/rocm/apt/6.1.2 jammy/main amd64 Packages
Description: Radeon Open Compute (ROCm) Runtime software stack

Jail installation (requires the amazing https://github.com/Jip-Hop/jailmaker script)
Truenas:

$ jlmkr create --distro=ubuntu --release=jammy ollama-ubuntu-jammy --bind=/dev/dri --bind=/dev/kfd
$ jlmkr edit ollama-ubuntu-jammy

(add the lines marked with + to the existing systemd_nspawn_user_args entry)
systemd_nspawn_user_args=--bind=/dev/dri
            --bind=/dev/kfd
+           --bind='/mnt/NVME/jails/ollama:/usr/share/ollama/.ollama'
+           --property=DeviceAllow="/dev/kfd rw"

$ jlmkr start ollama-ubuntu-jammy
$ jlmkr shell ollama-ubuntu-jammy

Jail:

$ apt install curl
$ curl -O https://repo.radeon.com/amdgpu-install/6.0.2/ubuntu/jammy/amdgpu-install_6.0.60002-1_all.deb
$ apt install ./amdgpu-install_6.0.60002-1_all.deb
$ apt update
$ apt install amdgpu-dkms rocm radeontop
$ reboot

(reconnect)

$ curl -fsSL https://ollama.com/install.sh | sh
$ systemctl edit ollama.service

(add)

[Service]
Environment="OLLAMA_HOST=0.0.0.0"
Environment="OLLAMA_LLM_LIBRARY=rocm_v60002"
Environment="HSA_OVERRIDE_GFX_VERSION=9.0.0"
Environment="OLLAMA_NOHISTORY=1"
Environment="HSA_ENABLE_SDMA=0"
Environment="OLLAMA_DEBUG=1"

$ systemctl restart ollama.service
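To confirm the overrides actually landed in the drop-in before restarting, something like this can help (hedged: the default path is where systemctl edit writes overrides on stock systemd setups, and check_override is just an illustrative helper):

```shell
# Grep the systemd drop-in for the GPU-related overrides; exits non-zero
# if neither variable is present. An alternate file can be passed as $1.
check_override() {
    grep -E 'HSA_OVERRIDE_GFX_VERSION|OLLAMA_LLM_LIBRARY' \
        "${1:-/etc/systemd/system/ollama.service.d/override.conf}"
}
```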

It should be noted that there might be something I'm missing in these steps, such as creating the user and groups (the jail only has a root user by default); I'm not entirely sure how that part went. But since install.sh breaks a previously working setup, I doubt any missing step here is the actual issue. If logs are needed, you can find them in the previously linked comments.
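For completeness, the account setup the official installer normally performs looks roughly like this (hedged reconstruction from memory of install.sh; exact flags and group names may differ between versions, so verify against the script you actually have; the sketch only prints the steps):

```shell
# Print the account-setup steps the installer is believed to run:
# a system user with a home under /usr/share/ollama, plus membership in
# the GPU device groups (render/video) needed to open /dev/kfd and /dev/dri.
account_setup_cmds() {
    cat <<'EOF'
useradd -r -s /bin/false -U -m -d /usr/share/ollama ollama
usermod -a -G render,video ollama
EOF
}

account_setup_cmds
```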

These are the working logs:

Aug  2 22:01:33 ollama-ubuntu-jammy systemd[1]: Started Ollama Service.
Aug  2 22:01:33 ollama-ubuntu-jammy ollama[387]: 2024/08/02 22:01:33 routes.go:1109: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION:9.0.0 OLLAMA_DEBUG:true OLLAMA_FLASH_ATTENTION:false OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_LLM_LIBRARY:rocm_v60002 OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_NOHISTORY:true OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://*] OLLAMA_RUNNERS_DIR: OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES:]"
Aug  2 22:01:33 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:33.855+03:00 level=INFO source=images.go:781 msg="total blobs: 77"
Aug  2 22:01:33 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:33.857+03:00 level=INFO source=images.go:788 msg="total unused blobs removed: 0"
Aug  2 22:01:33 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:33.859+03:00 level=INFO source=routes.go:1156 msg="Listening on [::]:11434 (version 0.3.2)"
Aug  2 22:01:33 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:33.859+03:00 level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama3196240257/runners
Aug  2 22:01:33 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:33.859+03:00 level=DEBUG source=payload.go:182 msg=extracting variant=cpu file=build/linux/x86_64/cpu/bin/ollama_llama_server.gz
Aug  2 22:01:33 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:33.859+03:00 level=DEBUG source=payload.go:182 msg=extracting variant=cpu_avx file=build/linux/x86_64/cpu_avx/bin/ollama_llama_server.gz
Aug  2 22:01:33 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:33.859+03:00 level=DEBUG source=payload.go:182 msg=extracting variant=cpu_avx2 file=build/linux/x86_64/cpu_avx2/bin/ollama_llama_server.gz
Aug  2 22:01:33 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:33.859+03:00 level=DEBUG source=payload.go:182 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/libcublas.so.11.gz
Aug  2 22:01:33 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:33.859+03:00 level=DEBUG source=payload.go:182 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/libcublasLt.so.11.gz
Aug  2 22:01:33 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:33.859+03:00 level=DEBUG source=payload.go:182 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/libcudart.so.11.0.gz
Aug  2 22:01:33 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:33.859+03:00 level=DEBUG source=payload.go:182 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/ollama_llama_server.gz
Aug  2 22:01:33 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:33.859+03:00 level=DEBUG source=payload.go:182 msg=extracting variant=rocm_v60102 file=build/linux/x86_64/rocm_v60102/bin/deps.txt.gz
Aug  2 22:01:33 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:33.859+03:00 level=DEBUG source=payload.go:182 msg=extracting variant=rocm_v60102 file=build/linux/x86_64/rocm_v60102/bin/ollama_llama_server.gz
Aug  2 22:01:37 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:37.872+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3196240257/runners/cpu/ollama_llama_server
Aug  2 22:01:37 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:37.872+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3196240257/runners/cpu_avx/ollama_llama_server
Aug  2 22:01:37 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:37.872+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3196240257/runners/cpu_avx2/ollama_llama_server
Aug  2 22:01:37 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:37.872+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3196240257/runners/cuda_v11/ollama_llama_server
Aug  2 22:01:37 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:37.872+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3196240257/runners/rocm_v60102/ollama_llama_server
Aug  2 22:01:37 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:37.872+03:00 level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu cpu_avx cpu_avx2 cuda_v11 rocm_v60102]"
Aug  2 22:01:37 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:37.872+03:00 level=DEBUG source=payload.go:45 msg="Override detection logic by setting OLLAMA_LLM_LIBRARY"
Aug  2 22:01:37 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:37.872+03:00 level=DEBUG source=sched.go:105 msg="starting llm scheduler"
Aug  2 22:01:37 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:37.872+03:00 level=INFO source=gpu.go:205 msg="looking for compatible GPUs"
Aug  2 22:01:37 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:37.873+03:00 level=DEBUG source=gpu.go:91 msg="searching for GPU discovery libraries for NVIDIA"
Aug  2 22:01:37 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:37.873+03:00 level=DEBUG source=gpu.go:468 msg="Searching for GPU library" name=libcuda.so*
Aug  2 22:01:37 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:37.873+03:00 level=DEBUG source=gpu.go:487 msg="gpu library search" globs="[/libcuda.so** /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]"
Aug  2 22:01:37 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:37.874+03:00 level=DEBUG source=gpu.go:521 msg="discovered GPU libraries" paths=[]
Aug  2 22:01:37 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:37.874+03:00 level=DEBUG source=gpu.go:468 msg="Searching for GPU library" name=libcudart.so*
Aug  2 22:01:37 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:37.874+03:00 level=DEBUG source=gpu.go:487 msg="gpu library search" globs="[/libcudart.so** /tmp/ollama3196240257/runners/cuda*/libcudart.so* /usr/local/cuda/lib64/libcudart.so* /usr/lib/x86_64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/x86_64-linux-gnu/libcudart.so* /usr/lib/wsl/lib/libcudart.so* /usr/lib/wsl/drivers/*/libcudart.so* /opt/cuda/lib64/libcudart.so* /usr/local/cuda*/targets/aarch64-linux/lib/libcudart.so* /usr/lib/aarch64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/aarch64-linux-gnu/libcudart.so* /usr/local/cuda/lib*/libcudart.so* /usr/lib*/libcudart.so* /usr/local/lib*/libcudart.so*]"
Aug  2 22:01:37 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:37.874+03:00 level=DEBUG source=gpu.go:521 msg="discovered GPU libraries" paths=[/tmp/ollama3196240257/runners/cuda_v11/libcudart.so.11.0]
Aug  2 22:01:37 ollama-ubuntu-jammy ollama[387]: cudaSetDevice err: 35
Aug  2 22:01:37 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:37.875+03:00 level=DEBUG source=gpu.go:533 msg="Unable to load cudart" library=/tmp/ollama3196240257/runners/cuda_v11/libcudart.so.11.0 error="your nvidia driver is too old or missing.  If you have a CUDA GPU please upgrade to run ollama"
Aug  2 22:01:37 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:37.875+03:00 level=WARN source=amd_linux.go:59 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
Aug  2 22:01:37 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:37.875+03:00 level=DEBUG source=amd_linux.go:102 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/0/properties"
Aug  2 22:01:37 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:37.875+03:00 level=DEBUG source=amd_linux.go:127 msg="detected CPU /sys/class/kfd/kfd/topology/nodes/0/properties"
Aug  2 22:01:37 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:37.875+03:00 level=DEBUG source=amd_linux.go:102 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/1/properties"
Aug  2 22:01:37 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:37.875+03:00 level=DEBUG source=amd_linux.go:217 msg="mapping amdgpu to drm sysfs nodes" amdgpu=/sys/class/kfd/kfd/topology/nodes/1/properties vendor=4098 device=26751 unique_id=150031596308672932
Aug  2 22:01:37 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:37.875+03:00 level=DEBUG source=amd_linux.go:251 msg=matched amdgpu=/sys/class/kfd/kfd/topology/nodes/1/properties drm=/sys/class/drm/card0/device
Aug  2 22:01:37 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:37.875+03:00 level=DEBUG source=amd_linux.go:283 msg="amdgpu memory" gpu=0 total="8.0 GiB"
Aug  2 22:01:37 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:37.875+03:00 level=DEBUG source=amd_linux.go:284 msg="amdgpu memory" gpu=0 available="8.0 GiB"
Aug  2 22:01:37 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:37.875+03:00 level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /usr/local/bin/rocm"
Aug  2 22:01:37 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:37.875+03:00 level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /opt/rocm/lib"
Aug  2 22:01:37 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:37.876+03:00 level=INFO source=amd_linux.go:348 msg="skipping rocm gfx compatibility check" HSA_OVERRIDE_GFX_VERSION=9.0.0
Aug  2 22:01:37 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:37.876+03:00 level=INFO source=types.go:105 msg="inference compute" id=0 library=rocm compute=gfx900 driver=0.0 name=1002:687f total="8.0 GiB" available="8.0 GiB"
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: [GIN] 2024/08/02 - 22:01:39 | 200 |      24.509µs |       127.0.0.1 | HEAD     "/"
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: [GIN] 2024/08/02 - 22:01:39 | 200 |    4.530772ms |       127.0.0.1 | POST     "/api/show"
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:39.389+03:00 level=DEBUG source=gpu.go:358 msg="updating system memory data" before.total="62.7 GiB" before.free="50.4 GiB" before.free_swap="0 B" now.total="62.7 GiB" now.free="50.4 GiB" now.free_swap="0 B"
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:39.389+03:00 level=DEBUG source=amd_linux.go:440 msg="updating rocm free memory" gpu=0 name=1002:687f before="8.0 GiB" now="8.0 GiB"
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:39.389+03:00 level=DEBUG source=sched.go:181 msg="updating default concurrency" OLLAMA_MAX_LOADED_MODELS=0x816020 gpu_count=1
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:39.397+03:00 level=DEBUG source=sched.go:219 msg="loading first model" model=/root/.ollama/models/blobs/sha256-633fc5be925f9a484b61d6f9b9a78021eeb462100bd557309f01ba84cac26adf
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:39.397+03:00 level=DEBUG source=memory.go:101 msg=evaluating library=rocm gpu_count=1 available="[8.0 GiB]"
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:39.397+03:00 level=INFO source=sched.go:710 msg="new model will fit in available VRAM in single GPU, loading" model=/root/.ollama/models/blobs/sha256-633fc5be925f9a484b61d6f9b9a78021eeb462100bd557309f01ba84cac26adf gpu=0 parallel=4 available=8564838400 required="6.1 GiB"
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:39.397+03:00 level=DEBUG source=server.go:100 msg="system memory" total="62.7 GiB" free="50.4 GiB" free_swap="0 B"
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:39.397+03:00 level=DEBUG source=memory.go:101 msg=evaluating library=rocm gpu_count=1 available="[8.0 GiB]"
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:39.398+03:00 level=INFO source=memory.go:309 msg="offload to rocm" layers.requested=-1 layers.model=33 layers.offload=33 layers.split="" memory.available="[8.0 GiB]" memory.required.full="6.1 GiB" memory.required.partial="6.1 GiB" memory.required.kv="3.0 GiB" memory.required.allocations="[6.1 GiB]" memory.weights.total="4.9 GiB" memory.weights.repeating="4.8 GiB" memory.weights.nonrepeating="77.1 MiB" memory.graph.full="512.0 MiB" memory.graph.partial="512.0 MiB"
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:39.398+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3196240257/runners/cpu/ollama_llama_server
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:39.398+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3196240257/runners/cpu_avx/ollama_llama_server
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:39.398+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3196240257/runners/cpu_avx2/ollama_llama_server
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:39.398+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3196240257/runners/cuda_v11/ollama_llama_server
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:39.398+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3196240257/runners/rocm_v60102/ollama_llama_server
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:39.398+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3196240257/runners/cpu/ollama_llama_server
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:39.398+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3196240257/runners/cpu_avx/ollama_llama_server
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:39.398+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3196240257/runners/cpu_avx2/ollama_llama_server
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:39.398+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3196240257/runners/cuda_v11/ollama_llama_server
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:39.398+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3196240257/runners/rocm_v60102/ollama_llama_server
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:39.398+03:00 level=INFO source=server.go:170 msg="Invalid OLLAMA_LLM_LIBRARY rocm_v60002 - not found"
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:39.399+03:00 level=INFO source=server.go:384 msg="starting llama server" cmd="/tmp/ollama3196240257/runners/rocm_v60102/ollama_llama_server --model /root/.ollama/models/blobs/sha256-633fc5be925f9a484b61d6f9b9a78021eeb462100bd557309f01ba84cac26adf --ctx-size 8192 --batch-size 512 --embedding --log-disable --n-gpu-layers 33 --verbose --parallel 4 --port 46879"
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:39.399+03:00 level=DEBUG source=server.go:401 msg=subprocess environment="[PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin HSA_OVERRIDE_GFX_VERSION=9.0.0 HSA_ENABLE_SDMA=0 LD_LIBRARY_PATH=/opt/rocm/lib:/tmp/ollama3196240257/runners/rocm_v60102:/tmp/ollama3196240257/runners HIP_VISIBLE_DEVICES=0]"
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:39.399+03:00 level=INFO source=sched.go:445 msg="loaded runners" count=1
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:39.399+03:00 level=INFO source=server.go:584 msg="waiting for llama runner to start responding"
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:39.399+03:00 level=INFO source=server.go:618 msg="waiting for server to become available" status="llm server error"
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[423]: INFO [main] build info | build=1 commit="6eeaeba" tid="139675947496256" timestamp=1722625299
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[423]: INFO [main] system info | n_threads=12 n_threads_batch=-1 system_info="AVX = 1 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 0 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | " tid="139675947496256" timestamp=1722625299 total_threads=24
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[423]: INFO [main] HTTP server listening | hostname="127.0.0.1" n_threads_http="23" port="46879" tid="139675947496256" timestamp=1722625299
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: loaded meta data with 36 key-value pairs and 197 tensors from /root/.ollama/models/blobs/sha256-633fc5be925f9a484b61d6f9b9a78021eeb462100bd557309f01ba84cac26adf (version GGUF V3 (latest))
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - kv   0:                       general.architecture str              = phi3
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - kv   1:                               general.type str              = model
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - kv   2:                               general.name str              = Phi 3 Mini 128k Instruct
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - kv   3:                           general.finetune str              = 128k-instruct
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - kv   4:                           general.basename str              = Phi-3
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - kv   5:                         general.size_label str              = mini
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - kv   6:                            general.license str              = mit
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - kv   7:                       general.license.link str              = https://huggingface.co/microsoft/Phi-...
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - kv   8:                               general.tags arr[str,3]       = ["nlp", "code", "text-generation"]
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - kv   9:                          general.languages arr[str,1]       = ["en"]
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - kv  10:                        phi3.context_length u32              = 131072
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - kv  11:  phi3.rope.scaling.original_context_length u32              = 4096
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - kv  12:                      phi3.embedding_length u32              = 3072
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - kv  13:                   phi3.feed_forward_length u32              = 8192
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - kv  14:                           phi3.block_count u32              = 32
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - kv  15:                  phi3.attention.head_count u32              = 32
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - kv  16:               phi3.attention.head_count_kv u32              = 32
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - kv  17:      phi3.attention.layer_norm_rms_epsilon f32              = 0.000010
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - kv  18:                  phi3.rope.dimension_count u32              = 96
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - kv  19:                        phi3.rope.freq_base f32              = 10000.000000
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - kv  20:                          general.file_type u32              = 2
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - kv  21:              phi3.attention.sliding_window u32              = 262144
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - kv  22:              phi3.rope.scaling.attn_factor f32              = 1.190238
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - kv  23:                       tokenizer.ggml.model str              = llama
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - kv  24:                         tokenizer.ggml.pre str              = default
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - kv  25:                      tokenizer.ggml.tokens arr[str,32064]   = ["<unk>", "<s>", "</s>", "<0x00>", "<...
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - kv  26:                      tokenizer.ggml.scores arr[f32,32064]   = [-1000.000000, -1000.000000, -1000.00...
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - kv  27:                  tokenizer.ggml.token_type arr[i32,32064]   = [3, 3, 4, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - kv  28:                tokenizer.ggml.bos_token_id u32              = 1
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - kv  29:                tokenizer.ggml.eos_token_id u32              = 32000
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - kv  30:            tokenizer.ggml.unknown_token_id u32              = 0
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - kv  31:            tokenizer.ggml.padding_token_id u32              = 32000
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - kv  32:               tokenizer.ggml.add_bos_token bool             = false
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - kv  33:               tokenizer.ggml.add_eos_token bool             = false
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - kv  34:                    tokenizer.chat_template str              = {% for message in messages %}{% if me...
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - kv  35:               general.quantization_version u32              = 2
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - type  f32:   67 tensors
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - type q4_0:  129 tensors
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - type q6_K:    1 tensors
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_vocab: special tokens cache size = 14
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_vocab: token to piece cache size = 0.1685 MB
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: format           = GGUF V3 (latest)
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: arch             = phi3
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: vocab type       = SPM
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: n_vocab          = 32064
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: n_merges         = 0
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: vocab_only       = 0
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: n_ctx_train      = 131072
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: n_embd           = 3072
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: n_layer          = 32
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: n_head           = 32
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: n_head_kv        = 32
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: n_rot            = 96
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: n_swa            = 262144
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: n_embd_head_k    = 96
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: n_embd_head_v    = 96
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: n_gqa            = 1
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: n_embd_k_gqa     = 3072
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: n_embd_v_gqa     = 3072
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: f_norm_eps       = 0.0e+00
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: f_clamp_kqv      = 0.0e+00
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: f_max_alibi_bias = 0.0e+00
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: f_logit_scale    = 0.0e+00
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: n_ff             = 8192
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: n_expert         = 0
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: n_expert_used    = 0
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: causal attn      = 1
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: pooling type     = 0
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: rope type        = 2
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: rope scaling     = linear
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: freq_base_train  = 10000.0
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: freq_scale_train = 1
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: n_ctx_orig_yarn  = 4096
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: rope_finetuned   = unknown
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: ssm_d_conv       = 0
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: ssm_d_inner      = 0
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: ssm_d_state      = 0
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: ssm_dt_rank      = 0
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: model type       = 3B
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: model ftype      = Q4_0
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: model params     = 3.82 B
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: model size       = 2.03 GiB (4.55 BPW)
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: general.name     = Phi 3 Mini 128k Instruct
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: BOS token        = 1 '<s>'
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: EOS token        = 32000 '<|endoftext|>'
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: UNK token        = 0 '<unk>'
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: PAD token        = 32000 '<|endoftext|>'
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: LF token         = 13 '<0x0A>'
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: EOT token        = 32007 '<|end|>'
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: max token length = 48
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:39.651+03:00 level=INFO source=server.go:618 msg="waiting for server to become available" status="llm server loading model"
Aug  2 22:01:40 ollama-ubuntu-jammy ollama[387]: ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
Aug  2 22:01:40 ollama-ubuntu-jammy ollama[387]: ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
Aug  2 22:01:40 ollama-ubuntu-jammy ollama[387]: ggml_cuda_init: found 1 ROCm devices:
Aug  2 22:01:40 ollama-ubuntu-jammy ollama[387]:   Device 0: Radeon RX Vega, compute capability 9.0, VMM: no
Aug  2 22:01:40 ollama-ubuntu-jammy ollama[387]: llm_load_tensors: ggml ctx size =    0.21 MiB
Aug  2 22:01:40 ollama-ubuntu-jammy ollama[387]: llm_load_tensors: offloading 32 repeating layers to GPU
Aug  2 22:01:40 ollama-ubuntu-jammy ollama[387]: llm_load_tensors: offloading non-repeating layers to GPU
Aug  2 22:01:40 ollama-ubuntu-jammy ollama[387]: llm_load_tensors: offloaded 33/33 layers to GPU
Aug  2 22:01:40 ollama-ubuntu-jammy ollama[387]: llm_load_tensors:      ROCm0 buffer size =  2021.84 MiB
Aug  2 22:01:40 ollama-ubuntu-jammy ollama[387]: llm_load_tensors:        CPU buffer size =    52.84 MiB
Aug  2 22:01:40 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:40.403+03:00 level=DEBUG source=server.go:629 msg="model load progress 0.08"
Aug  2 22:01:40 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:40.654+03:00 level=DEBUG source=server.go:629 msg="model load progress 0.57"
Aug  2 22:01:40 ollama-ubuntu-jammy systemd[1]: Starting Daily apt upgrade and clean activities...
Aug  2 22:01:40 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:40.905+03:00 level=DEBUG source=server.go:629 msg="model load progress 0.97"
Aug  2 22:01:40 ollama-ubuntu-jammy ollama[387]: llama_new_context_with_model: n_ctx      = 8192
Aug  2 22:01:40 ollama-ubuntu-jammy ollama[387]: llama_new_context_with_model: n_batch    = 512
Aug  2 22:01:40 ollama-ubuntu-jammy ollama[387]: llama_new_context_with_model: n_ubatch   = 512
Aug  2 22:01:40 ollama-ubuntu-jammy ollama[387]: llama_new_context_with_model: flash_attn = 0
Aug  2 22:01:40 ollama-ubuntu-jammy ollama[387]: llama_new_context_with_model: freq_base  = 10000.0
Aug  2 22:01:40 ollama-ubuntu-jammy ollama[387]: llama_new_context_with_model: freq_scale = 1
Aug  2 22:01:40 ollama-ubuntu-jammy ollama[387]: llama_kv_cache_init:      ROCm0 KV buffer size =  3072.00 MiB
Aug  2 22:01:40 ollama-ubuntu-jammy ollama[387]: llama_new_context_with_model: KV self size  = 3072.00 MiB, K (f16): 1536.00 MiB, V (f16): 1536.00 MiB
Aug  2 22:01:40 ollama-ubuntu-jammy ollama[387]: llama_new_context_with_model:  ROCm_Host  output buffer size =     0.54 MiB
Aug  2 22:01:40 ollama-ubuntu-jammy ollama[387]: llama_new_context_with_model:      ROCm0 compute buffer size =   564.00 MiB
Aug  2 22:01:40 ollama-ubuntu-jammy ollama[387]: llama_new_context_with_model:  ROCm_Host compute buffer size =    22.01 MiB
Aug  2 22:01:40 ollama-ubuntu-jammy ollama[387]: llama_new_context_with_model: graph nodes  = 1286
Aug  2 22:01:40 ollama-ubuntu-jammy ollama[387]: llama_new_context_with_model: graph splits = 2
Aug  2 22:01:40 ollama-ubuntu-jammy ollama[423]: DEBUG [initialize] initializing slots | n_slots=4 tid="139675947496256" timestamp=1722625300
Aug  2 22:01:40 ollama-ubuntu-jammy ollama[423]: DEBUG [initialize] new slot | n_ctx_slot=2048 slot_id=0 tid="139675947496256" timestamp=1722625300
Aug  2 22:01:40 ollama-ubuntu-jammy ollama[423]: DEBUG [initialize] new slot | n_ctx_slot=2048 slot_id=1 tid="139675947496256" timestamp=1722625300
Aug  2 22:01:40 ollama-ubuntu-jammy ollama[423]: DEBUG [initialize] new slot | n_ctx_slot=2048 slot_id=2 tid="139675947496256" timestamp=1722625300
Aug  2 22:01:40 ollama-ubuntu-jammy ollama[423]: DEBUG [initialize] new slot | n_ctx_slot=2048 slot_id=3 tid="139675947496256" timestamp=1722625300
Aug  2 22:01:40 ollama-ubuntu-jammy ollama[423]: INFO [main] model loaded | tid="139675947496256" timestamp=1722625300
Aug  2 22:01:40 ollama-ubuntu-jammy ollama[423]: DEBUG [update_slots] all slots are idle and system prompt is empty, clear the KV cache | tid="139675947496256" timestamp=1722625300
Aug  2 22:01:41 ollama-ubuntu-jammy systemd[1]: apt-daily-upgrade.service: Deactivated successfully.
Aug  2 22:01:41 ollama-ubuntu-jammy systemd[1]: Finished Daily apt upgrade and clean activities.
Aug  2 22:01:41 ollama-ubuntu-jammy ollama[423]: DEBUG [process_single_task] slot data | n_idle_slots=4 n_processing_slots=0 task_id=0 tid="139675947496256" timestamp=1722625301
Aug  2 22:01:41 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:41.156+03:00 level=INFO source=server.go:623 msg="llama runner started in 1.76 seconds"
Aug  2 22:01:41 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:41.156+03:00 level=DEBUG source=sched.go:458 msg="finished setting up runner" model=/root/.ollama/models/blobs/sha256-633fc5be925f9a484b61d6f9b9a78021eeb462100bd557309f01ba84cac26adf
Aug  2 22:01:41 ollama-ubuntu-jammy ollama[387]: [GIN] 2024/08/02 - 22:01:41 | 200 |  1.775479674s |       127.0.0.1 | POST     "/api/chat"
Aug  2 22:01:41 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:41.156+03:00 level=DEBUG source=sched.go:462 msg="context for request finished"
Aug  2 22:01:41 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:41.156+03:00 level=DEBUG source=sched.go:334 msg="runner with non-zero duration has gone idle, adding timer" modelPath=/root/.ollama/models/blobs/sha256-633fc5be925f9a484b61d6f9b9a78021eeb462100bd557309f01ba84cac26adf duration=5m0s
Aug  2 22:01:41 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:41.156+03:00 level=DEBUG source=sched.go:352 msg="after processing request finished event" modelPath=/root/.ollama/models/blobs/sha256-633fc5be925f9a484b61d6f9b9a78021eeb462100bd557309f01ba84cac26adf refCount=0
Aug  2 22:01:45 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:45.117+03:00 level=DEBUG source=sched.go:571 msg="evaluating already loaded" model=/root/.ollama/models/blobs/sha256-633fc5be925f9a484b61d6f9b9a78021eeb462100bd557309f01ba84cac26adf
Aug  2 22:01:45 ollama-ubuntu-jammy ollama[423]: DEBUG [process_single_task] slot data | n_idle_slots=4 n_processing_slots=0 task_id=1 tid="139675947496256" timestamp=1722625305
Aug  2 22:01:45 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:45.118+03:00 level=DEBUG source=routes.go:1347 msg="chat request" images=0 prompt="<|user|>\nHey there!<|end|>\n<|assistant|>\n"
Aug  2 22:01:45 ollama-ubuntu-jammy ollama[423]: DEBUG [process_single_task] slot data | n_idle_slots=4 n_processing_slots=0 task_id=2 tid="139675947496256" timestamp=1722625305
Aug  2 22:01:45 ollama-ubuntu-jammy ollama[423]: DEBUG [launch_slot_with_data] slot is processing task | slot_id=0 task_id=3 tid="139675947496256" timestamp=1722625305
Aug  2 22:01:45 ollama-ubuntu-jammy ollama[423]: DEBUG [update_slots] slot progression | ga_i=0 n_past=0 n_past_se=0 n_prompt_tokens_processed=13 slot_id=0 task_id=3 tid="139675947496256" timestamp=1722625305
Aug  2 22:01:45 ollama-ubuntu-jammy ollama[423]: DEBUG [update_slots] kv cache rm [p0, end) | p0=0 slot_id=0 task_id=3 tid="139675947496256" timestamp=1722625305
Aug  2 22:01:45 ollama-ubuntu-jammy ollama[423]: DEBUG [print_timings] prompt eval time     =      98.30 ms /    13 tokens (    7.56 ms per token,   132.25 tokens per second) | n_prompt_tokens_processed=13 n_tokens_second=132.25091049665303 slot_id=0 t_prompt_processing=98.298 t_token=7.561384615384616 task_id=3 tid="139675947496256" timestamp=1722625305
Aug  2 22:01:45 ollama-ubuntu-jammy ollama[423]: DEBUG [print_timings] generation eval time =     171.00 ms /    10 runs   (   17.10 ms per token,    58.48 tokens per second) | n_decoded=10 n_tokens_second=58.48021614287887 slot_id=0 t_token=17.0998 t_token_generation=170.998 task_id=3 tid="139675947496256" timestamp=1722625305
Aug  2 22:01:45 ollama-ubuntu-jammy ollama[423]: DEBUG [print_timings]           total time =     269.30 ms | slot_id=0 t_prompt_processing=98.298 t_token_generation=170.998 t_total=269.296 task_id=3 tid="139675947496256" timestamp=1722625305
Aug  2 22:01:45 ollama-ubuntu-jammy ollama[423]: DEBUG [update_slots] slot released | n_cache_tokens=23 n_ctx=8192 n_past=22 n_system_tokens=0 slot_id=0 task_id=3 tid="139675947496256" timestamp=1722625305 truncated=false
Aug  2 22:01:45 ollama-ubuntu-jammy ollama[423]: DEBUG [log_server_request] request | method="POST" params={} path="/completion" remote_addr="127.0.0.1" remote_port=48122 status=200 tid="139675929642560" timestamp=1722625305
Aug  2 22:01:45 ollama-ubuntu-jammy ollama[387]: [GIN] 2024/08/02 - 22:01:45 | 200 |  319.564933ms |       127.0.0.1 | POST     "/api/chat"
Aug  2 22:01:45 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:45.429+03:00 level=DEBUG source=sched.go:403 msg="context for request finished"
Aug  2 22:01:45 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:45.429+03:00 level=DEBUG source=sched.go:334 msg="runner with non-zero duration has gone idle, adding timer" modelPath=/root/.ollama/models/blobs/sha256-633fc5be925f9a484b61d6f9b9a78021eeb462100bd557309f01ba84cac26adf duration=5m0s
Aug  2 22:01:45 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:45.429+03:00 level=DEBUG source=sched.go:352 msg="after processing request finished event" modelPath=/root/.ollama/models/blobs/sha256-633fc5be925f9a484b61d6f9b9a78021eeb462100bd557309f01ba84cac26adf refCount=0

Thanks for an amazing app; I hope someone can wrap their head around this issue!

OS

Linux

GPU

AMD

CPU

AMD

Ollama version

from 0.1.48 to 0.3.x

Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - kv 28: tokenizer.ggml.bos_token_id u32 = 1 Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - kv 29: tokenizer.ggml.eos_token_id u32 = 32000 Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - kv 30: tokenizer.ggml.unknown_token_id u32 = 0 Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - kv 31: tokenizer.ggml.padding_token_id u32 = 32000 Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - kv 32: tokenizer.ggml.add_bos_token bool = false Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - kv 33: tokenizer.ggml.add_eos_token bool = false Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - kv 34: tokenizer.chat_template str = {% for message in messages %}{% if me... Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - kv 35: general.quantization_version u32 = 2 Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - type f32: 67 tensors Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - type q4_0: 129 tensors Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - type q6_K: 1 tensors Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_vocab: special tokens cache size = 14 Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_vocab: token to piece cache size = 0.1685 MB Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: format = GGUF V3 (latest) Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: arch = phi3 Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: vocab type = SPM Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: n_vocab = 32064 Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: n_merges = 0 Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: vocab_only = 0 Aug 2 22:01:39 ollama-ubuntu-jammy 
ollama[387]: llm_load_print_meta: n_ctx_train = 131072 Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: n_embd = 3072 Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: n_layer = 32 Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: n_head = 32 Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: n_head_kv = 32 Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: n_rot = 96 Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: n_swa = 262144 Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: n_embd_head_k = 96 Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: n_embd_head_v = 96 Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: n_gqa = 1 Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: n_embd_k_gqa = 3072 Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: n_embd_v_gqa = 3072 Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: f_norm_eps = 0.0e+00 Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: f_norm_rms_eps = 1.0e-05 Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: f_clamp_kqv = 0.0e+00 Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: f_max_alibi_bias = 0.0e+00 Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: f_logit_scale = 0.0e+00 Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: n_ff = 8192 Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: n_expert = 0 Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: n_expert_used = 0 Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: causal attn = 1 Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: pooling type = 0 Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: rope type = 2 Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: 
llm_load_print_meta: rope scaling = linear Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: freq_base_train = 10000.0 Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: freq_scale_train = 1 Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: n_ctx_orig_yarn = 4096 Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: rope_finetuned = unknown Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: ssm_d_conv = 0 Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: ssm_d_inner = 0 Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: ssm_d_state = 0 Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: ssm_dt_rank = 0 Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: model type = 3B Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: model ftype = Q4_0 Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: model params = 3.82 B Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: model size = 2.03 GiB (4.55 BPW) Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: general.name = Phi 3 Mini 128k Instruct Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: BOS token = 1 '<s>' Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: EOS token = 32000 '<|endoftext|>' Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: UNK token = 0 '<unk>' Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: PAD token = 32000 '<|endoftext|>' Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: LF token = 13 '<0x0A>' Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: EOT token = 32007 '<|end|>' Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: max token length = 48 Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:39.651+03:00 level=INFO 
source=server.go:618 msg="waiting for server to become available" status="llm server loading model" Aug 2 22:01:40 ollama-ubuntu-jammy ollama[387]: ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no Aug 2 22:01:40 ollama-ubuntu-jammy ollama[387]: ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no Aug 2 22:01:40 ollama-ubuntu-jammy ollama[387]: ggml_cuda_init: found 1 ROCm devices: Aug 2 22:01:40 ollama-ubuntu-jammy ollama[387]: Device 0: Radeon RX Vega, compute capability 9.0, VMM: no Aug 2 22:01:40 ollama-ubuntu-jammy ollama[387]: llm_load_tensors: ggml ctx size = 0.21 MiB Aug 2 22:01:40 ollama-ubuntu-jammy ollama[387]: llm_load_tensors: offloading 32 repeating layers to GPU Aug 2 22:01:40 ollama-ubuntu-jammy ollama[387]: llm_load_tensors: offloading non-repeating layers to GPU Aug 2 22:01:40 ollama-ubuntu-jammy ollama[387]: llm_load_tensors: offloaded 33/33 layers to GPU Aug 2 22:01:40 ollama-ubuntu-jammy ollama[387]: llm_load_tensors: ROCm0 buffer size = 2021.84 MiB Aug 2 22:01:40 ollama-ubuntu-jammy ollama[387]: llm_load_tensors: CPU buffer size = 52.84 MiB Aug 2 22:01:40 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:40.403+03:00 level=DEBUG source=server.go:629 msg="model load progress 0.08" Aug 2 22:01:40 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:40.654+03:00 level=DEBUG source=server.go:629 msg="model load progress 0.57" Aug 2 22:01:40 ollama-ubuntu-jammy systemd[1]: Starting Daily apt upgrade and clean activities... 
Aug 2 22:01:40 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:40.905+03:00 level=DEBUG source=server.go:629 msg="model load progress 0.97" Aug 2 22:01:40 ollama-ubuntu-jammy ollama[387]: llama_new_context_with_model: n_ctx = 8192 Aug 2 22:01:40 ollama-ubuntu-jammy ollama[387]: llama_new_context_with_model: n_batch = 512 Aug 2 22:01:40 ollama-ubuntu-jammy ollama[387]: llama_new_context_with_model: n_ubatch = 512 Aug 2 22:01:40 ollama-ubuntu-jammy ollama[387]: llama_new_context_with_model: flash_attn = 0 Aug 2 22:01:40 ollama-ubuntu-jammy ollama[387]: llama_new_context_with_model: freq_base = 10000.0 Aug 2 22:01:40 ollama-ubuntu-jammy ollama[387]: llama_new_context_with_model: freq_scale = 1 Aug 2 22:01:40 ollama-ubuntu-jammy ollama[387]: llama_kv_cache_init: ROCm0 KV buffer size = 3072.00 MiB Aug 2 22:01:40 ollama-ubuntu-jammy ollama[387]: llama_new_context_with_model: KV self size = 3072.00 MiB, K (f16): 1536.00 MiB, V (f16): 1536.00 MiB Aug 2 22:01:40 ollama-ubuntu-jammy ollama[387]: llama_new_context_with_model: ROCm_Host output buffer size = 0.54 MiB Aug 2 22:01:40 ollama-ubuntu-jammy ollama[387]: llama_new_context_with_model: ROCm0 compute buffer size = 564.00 MiB Aug 2 22:01:40 ollama-ubuntu-jammy ollama[387]: llama_new_context_with_model: ROCm_Host compute buffer size = 22.01 MiB Aug 2 22:01:40 ollama-ubuntu-jammy ollama[387]: llama_new_context_with_model: graph nodes = 1286 Aug 2 22:01:40 ollama-ubuntu-jammy ollama[387]: llama_new_context_with_model: graph splits = 2 Aug 2 22:01:40 ollama-ubuntu-jammy ollama[423]: DEBUG [initialize] initializing slots | n_slots=4 tid="139675947496256" timestamp=1722625300 Aug 2 22:01:40 ollama-ubuntu-jammy ollama[423]: DEBUG [initialize] new slot | n_ctx_slot=2048 slot_id=0 tid="139675947496256" timestamp=1722625300 Aug 2 22:01:40 ollama-ubuntu-jammy ollama[423]: DEBUG [initialize] new slot | n_ctx_slot=2048 slot_id=1 tid="139675947496256" timestamp=1722625300 Aug 2 22:01:40 ollama-ubuntu-jammy ollama[423]: DEBUG 
[initialize] new slot | n_ctx_slot=2048 slot_id=2 tid="139675947496256" timestamp=1722625300 Aug 2 22:01:40 ollama-ubuntu-jammy ollama[423]: DEBUG [initialize] new slot | n_ctx_slot=2048 slot_id=3 tid="139675947496256" timestamp=1722625300 Aug 2 22:01:40 ollama-ubuntu-jammy ollama[423]: INFO [main] model loaded | tid="139675947496256" timestamp=1722625300 Aug 2 22:01:40 ollama-ubuntu-jammy ollama[423]: DEBUG [update_slots] all slots are idle and system prompt is empty, clear the KV cache | tid="139675947496256" timestamp=1722625300 Aug 2 22:01:41 ollama-ubuntu-jammy systemd[1]: apt-daily-upgrade.service: Deactivated successfully. Aug 2 22:01:41 ollama-ubuntu-jammy systemd[1]: Finished Daily apt upgrade and clean activities. Aug 2 22:01:41 ollama-ubuntu-jammy ollama[423]: DEBUG [process_single_task] slot data | n_idle_slots=4 n_processing_slots=0 task_id=0 tid="139675947496256" timestamp=1722625301 Aug 2 22:01:41 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:41.156+03:00 level=INFO source=server.go:623 msg="llama runner started in 1.76 seconds" Aug 2 22:01:41 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:41.156+03:00 level=DEBUG source=sched.go:458 msg="finished setting up runner" model=/root/.ollama/models/blobs/sha256-633fc5be925f9a484b61d6f9b9a78021eeb462100bd557309f01ba84cac26adf Aug 2 22:01:41 ollama-ubuntu-jammy ollama[387]: [GIN] 2024/08/02 - 22:01:41 | 200 | 1.775479674s | 127.0.0.1 | POST "/api/chat" Aug 2 22:01:41 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:41.156+03:00 level=DEBUG source=sched.go:462 msg="context for request finished" Aug 2 22:01:41 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:41.156+03:00 level=DEBUG source=sched.go:334 msg="runner with non-zero duration has gone idle, adding timer" modelPath=/root/.ollama/models/blobs/sha256-633fc5be925f9a484b61d6f9b9a78021eeb462100bd557309f01ba84cac26adf duration=5m0s Aug 2 22:01:41 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:41.156+03:00 level=DEBUG 
source=sched.go:352 msg="after processing request finished event" modelPath=/root/.ollama/models/blobs/sha256-633fc5be925f9a484b61d6f9b9a78021eeb462100bd557309f01ba84cac26adf refCount=0 Aug 2 22:01:45 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:45.117+03:00 level=DEBUG source=sched.go:571 msg="evaluating already loaded" model=/root/.ollama/models/blobs/sha256-633fc5be925f9a484b61d6f9b9a78021eeb462100bd557309f01ba84cac26adf Aug 2 22:01:45 ollama-ubuntu-jammy ollama[423]: DEBUG [process_single_task] slot data | n_idle_slots=4 n_processing_slots=0 task_id=1 tid="139675947496256" timestamp=1722625305 Aug 2 22:01:45 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:45.118+03:00 level=DEBUG source=routes.go:1347 msg="chat request" images=0 prompt="<|user|>\nHey there!<|end|>\n<|assistant|>\n" Aug 2 22:01:45 ollama-ubuntu-jammy ollama[423]: DEBUG [process_single_task] slot data | n_idle_slots=4 n_processing_slots=0 task_id=2 tid="139675947496256" timestamp=1722625305 Aug 2 22:01:45 ollama-ubuntu-jammy ollama[423]: DEBUG [launch_slot_with_data] slot is processing task | slot_id=0 task_id=3 tid="139675947496256" timestamp=1722625305 Aug 2 22:01:45 ollama-ubuntu-jammy ollama[423]: DEBUG [update_slots] slot progression | ga_i=0 n_past=0 n_past_se=0 n_prompt_tokens_processed=13 slot_id=0 task_id=3 tid="139675947496256" timestamp=1722625305 Aug 2 22:01:45 ollama-ubuntu-jammy ollama[423]: DEBUG [update_slots] kv cache rm [p0, end) | p0=0 slot_id=0 task_id=3 tid="139675947496256" timestamp=1722625305 Aug 2 22:01:45 ollama-ubuntu-jammy ollama[423]: DEBUG [print_timings] prompt eval time = 98.30 ms / 13 tokens ( 7.56 ms per token, 132.25 tokens per second) | n_prompt_tokens_processed=13 n_tokens_second=132.25091049665303 slot_id=0 t_prompt_processing=98.298 t_token=7.561384615384616 task_id=3 tid="139675947496256" timestamp=1722625305 Aug 2 22:01:45 ollama-ubuntu-jammy ollama[423]: DEBUG [print_timings] generation eval time = 171.00 ms / 10 runs ( 17.10 ms per 
token, 58.48 tokens per second) | n_decoded=10 n_tokens_second=58.48021614287887 slot_id=0 t_token=17.0998 t_token_generation=170.998 task_id=3 tid="139675947496256" timestamp=1722625305
Aug 2 22:01:45 ollama-ubuntu-jammy ollama[423]: DEBUG [print_timings] total time = 269.30 ms | slot_id=0 t_prompt_processing=98.298 t_token_generation=170.998 t_total=269.296 task_id=3 tid="139675947496256" timestamp=1722625305
Aug 2 22:01:45 ollama-ubuntu-jammy ollama[423]: DEBUG [update_slots] slot released | n_cache_tokens=23 n_ctx=8192 n_past=22 n_system_tokens=0 slot_id=0 task_id=3 tid="139675947496256" timestamp=1722625305 truncated=false
Aug 2 22:01:45 ollama-ubuntu-jammy ollama[423]: DEBUG [log_server_request] request | method="POST" params={} path="/completion" remote_addr="127.0.0.1" remote_port=48122 status=200 tid="139675929642560" timestamp=1722625305
Aug 2 22:01:45 ollama-ubuntu-jammy ollama[387]: [GIN] 2024/08/02 - 22:01:45 | 200 | 319.564933ms | 127.0.0.1 | POST "/api/chat"
Aug 2 22:01:45 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:45.429+03:00 level=DEBUG source=sched.go:403 msg="context for request finished"
Aug 2 22:01:45 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:45.429+03:00 level=DEBUG source=sched.go:334 msg="runner with non-zero duration has gone idle, adding timer" modelPath=/root/.ollama/models/blobs/sha256-633fc5be925f9a484b61d6f9b9a78021eeb462100bd557309f01ba84cac26adf duration=5m0s
Aug 2 22:01:45 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:45.429+03:00 level=DEBUG source=sched.go:352 msg="after processing request finished event" modelPath=/root/.ollama/models/blobs/sha256-633fc5be925f9a484b61d6f9b9a78021eeb462100bd557309f01ba84cac26adf refCount=0
```

Thanks for an amazing software/app, I hope someone can wrap their head around this issue!

### OS

Linux

### GPU

AMD

### CPU

AMD

### Ollama version

from 0.1.48 to 0.3.x
GiteaMirror added the install, linux, amd, bug labels 2026-04-22 08:36:21 -05:00

@dhiltgen commented on GitHub (Aug 2, 2024):

Can you clarify "Ollama install.sh breaks an upgrade"? Did the install fail during the install itself, is Ollama crashing after the upgrade, is it failing to find your GPU, or is it something else? Are there server logs for the failure after the upgrade?

Between 0.1.48 and 0.3.x we've bumped ROCm versions from 6.1.1 to 6.1.2 and we've been shifting to a model of favoring our bundled ROCm version.


@mathatan commented on GitHub (Aug 3, 2024):

Oh, right. It was starting to get late and I didn't even remember to report the actual issue.

After running the install script and trying to load a model (via ollama run) I get:
rocBLAS error: Could not initialize Tensile host: No devices found

The device is available, and it's working perfectly fine otherwise. Also, it's not possible to revert the installation, i.e. installing an older version of Ollama will not fix things. If I just replace the executable manually (or by editing the install script), everything works fine.
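For reference, the manual workaround described above amounts to stopping the service and copying the new standalone binary over the old one instead of running install.sh. A minimal sketch, assuming the standalone-binary download URL and the /usr/local/bin install path that the install script used at the time (both may differ on other setups); it defaults to a dry run that only prints the commands:

```shell
# Sketch of the manual-upgrade workaround: swap the binary without install.sh.
# DRY_RUN=1 (the default) prints each command instead of executing it, so the
# sequence can be reviewed without root or network access.
set -eu

DRY_RUN="${DRY_RUN:-1}"
run() {
    # Print the command in dry-run mode; otherwise execute it as given.
    if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else "$@"; fi
}

run sudo systemctl stop ollama
run curl -fsSL -o /tmp/ollama https://ollama.com/download/ollama-linux-amd64
run sudo install -m 0755 /tmp/ollama /usr/local/bin/ollama
run sudo systemctl start ollama
```

Run with `DRY_RUN=0` to actually perform the upgrade.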

Here are the details and logs (copy-pasted from the original link):

Environment:

[Service]
Environment="OLLAMA_HOST=0.0.0.0"
Environment="OLLAMA_LLM_LIBRARY=rocm_v60002"
Environment="HSA_OVERRIDE_GFX_VERSION=9.0.0"
Environment="OLLAMA_NOHISTORY=1"
Environment="HSA_ENABLE_SDMA=0"
Environment="OLLAMA_DEBUG=1"
Environment="AMD_SERIALIZE_KERNEL=3"

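A [Service] block like the one above is typically applied as a systemd drop-in for the ollama unit. A minimal sketch of generating such a drop-in file; it writes to a temp directory so it runs without root, whereas on a real system the target would be a directory like /etc/systemd/system/ollama.service.d/ followed by `systemctl daemon-reload` and a service restart:

```shell
# Sketch: write a systemd drop-in carrying the environment overrides.
# A temp directory stands in for /etc/systemd/system/ollama.service.d/
# so the sketch is safe to run anywhere.
set -eu

dropin_dir="$(mktemp -d)"
cat > "$dropin_dir/override.conf" <<'EOF'
[Service]
Environment="OLLAMA_HOST=0.0.0.0"
Environment="HSA_OVERRIDE_GFX_VERSION=9.0.0"
Environment="HSA_ENABLE_SDMA=0"
Environment="OLLAMA_DEBUG=1"
EOF
echo "wrote $dropin_dir/override.conf"
```

With the file in place under the real unit's `.d` directory, `sudo systemctl daemon-reload && sudo systemctl restart ollama` makes the overrides take effect.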
Logs:

Aug  2 20:05:43 ollama-ubuntu-jammy-2 systemd[1]: Started Ollama Service.
Aug  2 20:05:43 ollama-ubuntu-jammy-2 ollama[447]: 2024/08/02 20:05:43 routes.go:1109: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION:9.0.0 OLLAMA_DEBUG:true OLLAMA_FLASH_ATTENTION:false OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_LLM_LIBRARY:rocm_v60002 OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/usr/share/ollama/.ollama/models OLLAMA_NOHISTORY:true OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://*] OLLAMA_RUNNERS_DIR: OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES:]"
Aug  2 20:05:43 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:43.405+03:00 level=INFO source=images.go:781 msg="total blobs: 77"
Aug  2 20:05:43 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:43.407+03:00 level=INFO source=images.go:788 msg="total unused blobs removed: 0"
Aug  2 20:05:43 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:43.409+03:00 level=INFO source=routes.go:1156 msg="Listening on [::]:11434 (version 0.3.2)"
Aug  2 20:05:43 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:43.409+03:00 level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama802441264/runners
Aug  2 20:05:43 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:43.409+03:00 level=DEBUG source=payload.go:182 msg=extracting variant=cpu file=build/linux/x86_64/cpu/bin/ollama_llama_server.gz
Aug  2 20:05:43 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:43.409+03:00 level=DEBUG source=payload.go:182 msg=extracting variant=cpu_avx file=build/linux/x86_64/cpu_avx/bin/ollama_llama_server.gz
Aug  2 20:05:43 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:43.409+03:00 level=DEBUG source=payload.go:182 msg=extracting variant=cpu_avx2 file=build/linux/x86_64/cpu_avx2/bin/ollama_llama_server.gz
Aug  2 20:05:43 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:43.409+03:00 level=DEBUG source=payload.go:182 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/libcublas.so.11.gz
Aug  2 20:05:43 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:43.409+03:00 level=DEBUG source=payload.go:182 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/libcublasLt.so.11.gz
Aug  2 20:05:43 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:43.409+03:00 level=DEBUG source=payload.go:182 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/libcudart.so.11.0.gz
Aug  2 20:05:43 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:43.409+03:00 level=DEBUG source=payload.go:182 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/ollama_llama_server.gz
Aug  2 20:05:43 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:43.409+03:00 level=DEBUG source=payload.go:182 msg=extracting variant=rocm_v60102 file=build/linux/x86_64/rocm_v60102/bin/deps.txt.gz
Aug  2 20:05:43 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:43.409+03:00 level=DEBUG source=payload.go:182 msg=extracting variant=rocm_v60102 file=build/linux/x86_64/rocm_v60102/bin/ollama_llama_server.gz
Aug  2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.276+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama802441264/runners/cpu/ollama_llama_server
Aug  2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.276+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama802441264/runners/cpu_avx/ollama_llama_server
Aug  2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.276+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama802441264/runners/cpu_avx2/ollama_llama_server
Aug  2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.276+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama802441264/runners/cuda_v11/ollama_llama_server
Aug  2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.276+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama802441264/runners/rocm_v60102/ollama_llama_server
Aug  2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.276+03:00 level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu_avx cpu_avx2 cuda_v11 rocm_v60102 cpu]"
Aug  2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.276+03:00 level=DEBUG source=payload.go:45 msg="Override detection logic by setting OLLAMA_LLM_LIBRARY"
Aug  2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.276+03:00 level=DEBUG source=sched.go:105 msg="starting llm scheduler"
Aug  2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.276+03:00 level=INFO source=gpu.go:205 msg="looking for compatible GPUs"
Aug  2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.276+03:00 level=DEBUG source=gpu.go:91 msg="searching for GPU discovery libraries for NVIDIA"
Aug  2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.276+03:00 level=DEBUG source=gpu.go:468 msg="Searching for GPU library" name=libcuda.so*
Aug  2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.276+03:00 level=DEBUG source=gpu.go:487 msg="gpu library search" globs="[/libcuda.so** /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]"
Aug  2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.278+03:00 level=DEBUG source=gpu.go:521 msg="discovered GPU libraries" paths=[]
Aug  2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.278+03:00 level=DEBUG source=gpu.go:468 msg="Searching for GPU library" name=libcudart.so*
Aug  2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.278+03:00 level=DEBUG source=gpu.go:487 msg="gpu library search" globs="[/libcudart.so** /tmp/ollama802441264/runners/cuda*/libcudart.so* /usr/local/cuda/lib64/libcudart.so* /usr/lib/x86_64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/x86_64-linux-gnu/libcudart.so* /usr/lib/wsl/lib/libcudart.so* /usr/lib/wsl/drivers/*/libcudart.so* /opt/cuda/lib64/libcudart.so* /usr/local/cuda*/targets/aarch64-linux/lib/libcudart.so* /usr/lib/aarch64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/aarch64-linux-gnu/libcudart.so* /usr/local/cuda/lib*/libcudart.so* /usr/lib*/libcudart.so* /usr/local/lib*/libcudart.so*]"
Aug  2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.278+03:00 level=DEBUG source=gpu.go:521 msg="discovered GPU libraries" paths=[/tmp/ollama802441264/runners/cuda_v11/libcudart.so.11.0]
Aug  2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: cudaSetDevice err: 35
Aug  2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.279+03:00 level=DEBUG source=gpu.go:533 msg="Unable to load cudart" library=/tmp/ollama802441264/runners/cuda_v11/libcudart.so.11.0 error="your nvidia driver is too old or missing.  If you have a CUDA GPU please upgrade to run ollama"
Aug  2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.279+03:00 level=WARN source=amd_linux.go:59 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
Aug  2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.279+03:00 level=DEBUG source=amd_linux.go:102 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/0/properties"
Aug  2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.279+03:00 level=DEBUG source=amd_linux.go:127 msg="detected CPU /sys/class/kfd/kfd/topology/nodes/0/properties"
Aug  2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.279+03:00 level=DEBUG source=amd_linux.go:102 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/1/properties"
Aug  2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.279+03:00 level=DEBUG source=amd_linux.go:217 msg="mapping amdgpu to drm sysfs nodes" amdgpu=/sys/class/kfd/kfd/topology/nodes/1/properties vendor=4098 device=26751 unique_id=150031596308672932
Aug  2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.279+03:00 level=DEBUG source=amd_linux.go:251 msg=matched amdgpu=/sys/class/kfd/kfd/topology/nodes/1/properties drm=/sys/class/drm/card0/device
Aug  2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.279+03:00 level=DEBUG source=amd_linux.go:283 msg="amdgpu memory" gpu=0 total="8.0 GiB"
Aug  2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.279+03:00 level=DEBUG source=amd_linux.go:284 msg="amdgpu memory" gpu=0 available="8.0 GiB"
Aug  2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.279+03:00 level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /usr/local/bin/rocm"
Aug  2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.279+03:00 level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /opt/rocm/lib"
Aug  2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.280+03:00 level=INFO source=amd_linux.go:348 msg="skipping rocm gfx compatibility check" HSA_OVERRIDE_GFX_VERSION=9.0.0
Aug  2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.280+03:00 level=INFO source=types.go:105 msg="inference compute" id=0 library=rocm compute=gfx900 driver=0.0 name=1002:687f total="8.0 GiB" available="8.0 GiB"
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: [GIN] 2024/08/02 - 20:05:50 | 200 |      42.189µs |       127.0.0.1 | HEAD     "/"
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: [GIN] 2024/08/02 - 20:05:50 | 200 |    4.888907ms |       127.0.0.1 | POST     "/api/show"
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.541+03:00 level=DEBUG source=gpu.go:358 msg="updating system memory data" before.total="62.7 GiB" before.free="44.4 GiB" before.free_swap="0 B" now.total="62.7 GiB" now.free="44.3 GiB" now.free_swap="0 B"
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.541+03:00 level=DEBUG source=amd_linux.go:440 msg="updating rocm free memory" gpu=0 name=1002:687f before="8.0 GiB" now="8.0 GiB"
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.541+03:00 level=DEBUG source=sched.go:181 msg="updating default concurrency" OLLAMA_MAX_LOADED_MODELS=0x816020 gpu_count=1
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.545+03:00 level=DEBUG source=sched.go:219 msg="loading first model" model=/usr/share/ollama/.ollama/models/blobs/sha256-633fc5be925f9a484b61d6f9b9a78021eeb462100bd557309f01ba84cac26adf
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.545+03:00 level=DEBUG source=memory.go:101 msg=evaluating library=rocm gpu_count=1 available="[8.0 GiB]"
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.545+03:00 level=INFO source=sched.go:710 msg="new model will fit in available VRAM in single GPU, loading" model=/usr/share/ollama/.ollama/models/blobs/sha256-633fc5be925f9a484b61d6f9b9a78021eeb462100bd557309f01ba84cac26adf gpu=0 parallel=4 available=8564838400 required="6.1 GiB"
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.545+03:00 level=DEBUG source=server.go:100 msg="system memory" total="62.7 GiB" free="44.3 GiB" free_swap="0 B"
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.545+03:00 level=DEBUG source=memory.go:101 msg=evaluating library=rocm gpu_count=1 available="[8.0 GiB]"
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.545+03:00 level=INFO source=memory.go:309 msg="offload to rocm" layers.requested=-1 layers.model=33 layers.offload=33 layers.split="" memory.available="[8.0 GiB]" memory.required.full="6.1 GiB" memory.required.partial="6.1 GiB" memory.required.kv="3.0 GiB" memory.required.allocations="[6.1 GiB]" memory.weights.total="4.9 GiB" memory.weights.repeating="4.8 GiB" memory.weights.nonrepeating="77.1 MiB" memory.graph.full="512.0 MiB" memory.graph.partial="512.0 MiB"
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.545+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama802441264/runners/cpu/ollama_llama_server
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.545+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama802441264/runners/cpu_avx/ollama_llama_server
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.545+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama802441264/runners/cpu_avx2/ollama_llama_server
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.545+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama802441264/runners/cuda_v11/ollama_llama_server
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.545+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama802441264/runners/rocm_v60102/ollama_llama_server
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.545+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama802441264/runners/cpu/ollama_llama_server
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.545+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama802441264/runners/cpu_avx/ollama_llama_server
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.545+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama802441264/runners/cpu_avx2/ollama_llama_server
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.545+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama802441264/runners/cuda_v11/ollama_llama_server
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.545+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama802441264/runners/rocm_v60102/ollama_llama_server
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.545+03:00 level=INFO source=server.go:170 msg="Invalid OLLAMA_LLM_LIBRARY rocm_v60002 - not found"
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.546+03:00 level=INFO source=server.go:384 msg="starting llama server" cmd="/tmp/ollama802441264/runners/rocm_v60102/ollama_llama_server --model /usr/share/ollama/.ollama/models/blobs/sha256-633fc5be925f9a484b61d6f9b9a78021eeb462100bd557309f01ba84cac26adf --ctx-size 8192 --batch-size 512 --embedding --log-disable --n-gpu-layers 33 --verbose --parallel 4 --port 46445"
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.546+03:00 level=DEBUG source=server.go:401 msg=subprocess environment="[PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin HSA_OVERRIDE_GFX_VERSION=9.0.0 HSA_ENABLE_SDMA=0 LD_LIBRARY_PATH=/opt/rocm/lib:/tmp/ollama802441264/runners/rocm_v60102:/tmp/ollama802441264/runners HIP_VISIBLE_DEVICES=0]"
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.546+03:00 level=INFO source=sched.go:445 msg="loaded runners" count=1
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.546+03:00 level=INFO source=server.go:584 msg="waiting for llama runner to start responding"
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.546+03:00 level=INFO source=server.go:618 msg="waiting for server to become available" status="llm server error"
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[482]: INFO [main] build info | build=1 commit="6eeaeba" tid="139832002468928" timestamp=1722618350
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[482]: INFO [main] system info | n_threads=12 n_threads_batch=-1 system_info="AVX = 1 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 0 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | " tid="139832002468928" timestamp=1722618350 total_threads=24
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[482]: INFO [main] HTTP server listening | hostname="127.0.0.1" n_threads_http="23" port="46445" tid="139832002468928" timestamp=1722618350
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: loaded meta data with 36 key-value pairs and 197 tensors from /usr/share/ollama/.ollama/models/blobs/sha256-633fc5be925f9a484b61d6f9b9a78021eeb462100bd557309f01ba84cac26adf (version GGUF V3 (latest))
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv   0:                       general.architecture str              = phi3
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv   1:                               general.type str              = model
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv   2:                               general.name str              = Phi 3 Mini 128k Instruct
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv   3:                           general.finetune str              = 128k-instruct
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv   4:                           general.basename str              = Phi-3
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv   5:                         general.size_label str              = mini
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv   6:                            general.license str              = mit
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv   7:                       general.license.link str              = https://huggingface.co/microsoft/Phi-...
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv   8:                               general.tags arr[str,3]       = ["nlp", "code", "text-generation"]
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv   9:                          general.languages arr[str,1]       = ["en"]
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv  10:                        phi3.context_length u32              = 131072
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv  11:  phi3.rope.scaling.original_context_length u32              = 4096
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv  12:                      phi3.embedding_length u32              = 3072
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv  13:                   phi3.feed_forward_length u32              = 8192
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv  14:                           phi3.block_count u32              = 32
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv  15:                  phi3.attention.head_count u32              = 32
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv  16:               phi3.attention.head_count_kv u32              = 32
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv  17:      phi3.attention.layer_norm_rms_epsilon f32              = 0.000010
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv  18:                  phi3.rope.dimension_count u32              = 96
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv  19:                        phi3.rope.freq_base f32              = 10000.000000
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv  20:                          general.file_type u32              = 2
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv  21:              phi3.attention.sliding_window u32              = 262144
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv  22:              phi3.rope.scaling.attn_factor f32              = 1.190238
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv  23:                       tokenizer.ggml.model str              = llama
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv  24:                         tokenizer.ggml.pre str              = default
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv  25:                      tokenizer.ggml.tokens arr[str,32064]   = ["<unk>", "<s>", "</s>", "<0x00>", "<...
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv  26:                      tokenizer.ggml.scores arr[f32,32064]   = [-1000.000000, -1000.000000, -1000.00...
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv  27:                  tokenizer.ggml.token_type arr[i32,32064]   = [3, 3, 4, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv  28:                tokenizer.ggml.bos_token_id u32              = 1
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv  29:                tokenizer.ggml.eos_token_id u32              = 32000
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv  30:            tokenizer.ggml.unknown_token_id u32              = 0
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv  31:            tokenizer.ggml.padding_token_id u32              = 32000
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv  32:               tokenizer.ggml.add_bos_token bool             = false
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv  33:               tokenizer.ggml.add_eos_token bool             = false
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv  34:                    tokenizer.chat_template str              = {% for message in messages %}{% if me...
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv  35:               general.quantization_version u32              = 2
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - type  f32:   67 tensors
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - type q4_0:  129 tensors
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - type q6_K:    1 tensors
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_vocab: special tokens cache size = 14
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_vocab: token to piece cache size = 0.1685 MB
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: format           = GGUF V3 (latest)
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: arch             = phi3
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: vocab type       = SPM
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: n_vocab          = 32064
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: n_merges         = 0
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: vocab_only       = 0
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: n_ctx_train      = 131072
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: n_embd           = 3072
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: n_layer          = 32
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: n_head           = 32
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: n_head_kv        = 32
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: n_rot            = 96
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: n_swa            = 262144
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: n_embd_head_k    = 96
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: n_embd_head_v    = 96
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: n_gqa            = 1
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: n_embd_k_gqa     = 3072
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: n_embd_v_gqa     = 3072
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: f_norm_eps       = 0.0e+00
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: f_clamp_kqv      = 0.0e+00
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: f_max_alibi_bias = 0.0e+00
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: f_logit_scale    = 0.0e+00
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: n_ff             = 8192
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: n_expert         = 0
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: n_expert_used    = 0
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: causal attn      = 1
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: pooling type     = 0
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: rope type        = 2
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: rope scaling     = linear
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: freq_base_train  = 10000.0
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: freq_scale_train = 1
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: n_ctx_orig_yarn  = 4096
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: rope_finetuned   = unknown
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: ssm_d_conv       = 0
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: ssm_d_inner      = 0
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: ssm_d_state      = 0
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: ssm_dt_rank      = 0
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: model type       = 3B
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: model ftype      = Q4_0
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: model params     = 3.82 B
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: model size       = 2.03 GiB (4.55 BPW)
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: general.name     = Phi 3 Mini 128k Instruct
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: BOS token        = 1 '<s>'
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: EOS token        = 32000 '<|endoftext|>'
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: UNK token        = 0 '<unk>'
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: PAD token        = 32000 '<|endoftext|>'
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: LF token         = 13 '<0x0A>'
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: EOT token        = 32007 '<|end|>'
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: max token length = 48
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: rocBLAS error: Could not initialize Tensile host: No devices found
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.997+03:00 level=INFO source=server.go:618 msg="waiting for server to become available" status="llm server not responding"
Aug  2 20:05:52 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:52.086+03:00 level=DEBUG source=server.go:424 msg="llama runner terminated" error="signal: aborted (core dumped)"
Aug  2 20:05:52 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:52.150+03:00 level=ERROR source=sched.go:451 msg="error loading llama server" error="llama runner process has terminated: error:Could not initialize Tensile host: No devices found"
Aug  2 20:05:52 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:52.150+03:00 level=DEBUG source=sched.go:454 msg="triggering expiration for failed load" model=/usr/share/ollama/.ollama/models/blobs/sha256-633fc5be925f9a484b61d6f9b9a78021eeb462100bd557309f01ba84cac26adf
Aug  2 20:05:52 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:52.150+03:00 level=DEBUG source=sched.go:355 msg="runner expired event received" modelPath=/usr/share/ollama/.ollama/models/blobs/sha256-633fc5be925f9a484b61d6f9b9a78021eeb462100bd557309f01ba84cac26adf
Aug  2 20:05:52 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:52.150+03:00 level=DEBUG source=sched.go:371 msg="got lock to unload" modelPath=/usr/share/ollama/.ollama/models/blobs/sha256-633fc5be925f9a484b61d6f9b9a78021eeb462100bd557309f01ba84cac26adf
Aug  2 20:05:52 ollama-ubuntu-jammy-2 ollama[447]: [GIN] 2024/08/02 - 20:05:52 | 500 |  1.613482132s |       127.0.0.1 | POST     "/api/chat"
Aug  2 20:05:52 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:52.150+03:00 level=DEBUG source=gpu.go:358 msg="updating system memory data" before.total="62.7 GiB" before.free="44.3 GiB" before.free_swap="0 B" now.total="62.7 GiB" now.free="44.3 GiB" now.free_swap="0 B"
Aug  2 20:05:52 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:52.150+03:00 level=DEBUG source=amd_linux.go:440 msg="updating rocm free memory" gpu=0 name=1002:687f before="8.0 GiB" now="8.0 GiB"
Aug  2 20:05:52 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:52.150+03:00 level=DEBUG source=server.go:1042 msg="stopping llama server"
Aug  2 20:05:52 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:52.150+03:00 level=DEBUG source=sched.go:376 msg="runner released" modelPath=/usr/share/ollama/.ollama/models/blobs/sha256-633fc5be925f9a484b61d6f9b9a78021eeb462100bd557309f01ba84cac26adf
Aug  2 20:05:52 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:52.401+03:00 level=DEBUG source=gpu.go:358 msg="updating system memory data" before.total="62.7 GiB" before.free="44.3 GiB" before.free_swap="0 B" now.total="62.7 GiB" now.free="44.3 GiB" now.free_swap="0 B"
Aug  2 20:05:52 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:52.401+03:00 level=DEBUG source=amd_linux.go:440 msg="updating rocm free memory" gpu=0 name=1002:687f before="8.0 GiB" now="8.0 GiB"

As can be seen, Ollama detects the GPU and tries to use it, but rocBLAS then fails to initialize with "Could not initialize Tensile host: No devices found".
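One thing worth ruling out when rocBLAS reports "No devices found" even though GPU discovery succeeds is whether the service user can open the ROCm device nodes. The sketch below is a hedged diagnostic, not part of the original report; the `ollama` user name and the `render`/`video` group names are typical Ubuntu defaults and may differ on other setups.

```shell
# Sketch: rocBLAS runs inside the runner subprocess, which must be able to
# open /dev/kfd and /dev/dri/renderD*. If the service user lacks membership
# in the groups owning those nodes, the HSA runtime sees no devices.
check_rocm_access() {
    if [ -e /dev/kfd ]; then
        ls -l /dev/kfd /dev/dri/renderD* 2>/dev/null   # note the owning groups
        id ollama 2>/dev/null || echo "no ollama user on this machine"
    else
        echo "no /dev/kfd here: ROCm kernel driver not visible"
    fi
}
check_rocm_access
```

If the groups turn out to be missing, `sudo usermod -aG render,video ollama` followed by a service restart is the usual fix.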

ROCM-SMI:

======================================= ROCm System Management Interface =======================================
================================================= Concise Info =================================================
Device  [Model : Revision]    Temp    Power     Partitions      SCLK    MCLK    Fan  Perf  PwrCap  VRAM%  GPU%
        Name (20 chars)       (Edge)  (Socket)  (Mem, Compute)
================================================================================================================
0       [0x2308 : 0xc1]       45.0°C  4.0W      N/A, N/A        852Mhz  167Mhz  0%   auto  247.0W    0%   0%
        Vega 10 XL/XT [Radeo
================================================================================================================
============================================= End of ROCm SMI Log ==============================================

rocminfo:

ROCk module is loaded
=====================
HSA System Attributes
=====================
Runtime Version:         1.1
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE
System Endianness:       LITTLE
Mwaitx:                  DISABLED
DMAbuf Support:          YES

==========
HSA Agents
==========
*******
Agent 1
*******
  Name:                    AMD Ryzen 9 3900X 12-Core Processor
  Uuid:                    CPU-XX
  Marketing Name:          AMD Ryzen 9 3900X 12-Core Processor
  Vendor Name:             CPU
  Feature:                 None specified
  Profile:                 FULL_PROFILE
  Float Round Mode:        NEAR
  Max Queue Number:        0(0x0)
  Queue Min Size:          0(0x0)
  Queue Max Size:          0(0x0)
  Queue Type:              MULTI
  Node:                    0
  Device Type:             CPU
  Cache Info:
    L1:                      32768(0x8000) KB
  Chip ID:                 0(0x0)
  ASIC Revision:           0(0x0)
  Cacheline Size:          64(0x40)
  Max Clock Freq. (MHz):   3800
  BDFID:                   0
  Internal Node ID:        0
  Compute Unit:            24
  SIMDs per CU:            0
  Shader Engines:          0
  Shader Arrs. per Eng.:   0
  WatchPts on Addr. Ranges:1
  Features:                None
  Pool Info:
    Pool 1
      Segment:                 GLOBAL; FLAGS: FINE GRAINED
      Size:                    65771684(0x3eb98a4) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Alignment:         4KB
      Accessible by all:       TRUE
    Pool 2
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    65771684(0x3eb98a4) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Alignment:         4KB
      Accessible by all:       TRUE
    Pool 3
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED
      Size:                    65771684(0x3eb98a4) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Alignment:         4KB
      Accessible by all:       TRUE
  ISA Info:
*******
Agent 2
*******
  Name:                    gfx900
  Uuid:                    GPU-021504f1231031a4
  Marketing Name:          Radeon RX Vega
  Vendor Name:             AMD
  Feature:                 KERNEL_DISPATCH
  Profile:                 BASE_PROFILE
  Float Round Mode:        NEAR
  Max Queue Number:        128(0x80)
  Queue Min Size:          64(0x40)
  Queue Max Size:          131072(0x20000)
  Queue Type:              MULTI
  Node:                    1
  Device Type:             GPU
  Cache Info:
    L1:                      16(0x10) KB
    L2:                      4096(0x1000) KB
  Chip ID:                 26751(0x687f)
  ASIC Revision:           1(0x1)
  Cacheline Size:          64(0x40)
  Max Clock Freq. (MHz):   1630
  BDFID:                   2560
  Internal Node ID:        1
  Compute Unit:            64
  SIMDs per CU:            4
  Shader Engines:          4
  Shader Arrs. per Eng.:   1
  WatchPts on Addr. Ranges:4
  Coherent Host Access:    FALSE
  Features:                KERNEL_DISPATCH
  Fast F16 Operation:      TRUE
  Wavefront Size:          64(0x40)
  Workgroup Max Size:      1024(0x400)
  Workgroup Max Size per Dimension:
    x                        1024(0x400)
    y                        1024(0x400)
    z                        1024(0x400)
  Max Waves Per CU:        40(0x28)
  Max Work-item Per CU:    2560(0xa00)
  Grid Max Size:           4294967295(0xffffffff)
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)
    y                        4294967295(0xffffffff)
    z                        4294967295(0xffffffff)
  Max fbarriers/Workgrp:   32
  Packet Processor uCode:: 468
  SDMA engine uCode::      434
  IOMMU Support::          None
  Pool Info:
    Pool 1
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED
      Size:                    8372224(0x7fc000) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Alignment:         4KB
      Accessible by all:       FALSE
    Pool 2
      Segment:                 GLOBAL; FLAGS: EXTENDED FINE GRAINED
      Size:                    8372224(0x7fc000) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Alignment:         4KB
      Accessible by all:       FALSE
    Pool 3
      Segment:                 GROUP
      Size:                    64(0x40) KB
      Allocatable:             FALSE
      Alloc Granule:           0KB
      Alloc Alignment:         0KB
      Accessible by all:       FALSE
  ISA Info:
    ISA 1
      Name:                    amdgcn-amd-amdhsa--gfx900:xnack-
      Machine Models:          HSA_MACHINE_MODEL_LARGE
      Profiles:                HSA_PROFILE_BASE
      Default Rounding Mode:   NEAR
      Default Rounding Mode:   NEAR
      Fast f16:                TRUE
      Workgroup Max Size:      1024(0x400)
      Workgroup Max Size per Dimension:
        x                        1024(0x400)
        y                        1024(0x400)
        z                        1024(0x400)
      Grid Max Size:           4294967295(0xffffffff)
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)
        y                        4294967295(0xffffffff)
        z                        4294967295(0xffffffff)
      FBarrier Max Size:       32
*** Done ***
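Since rocminfo reports the ISA as `amdgcn-amd-amdhsa--gfx900:xnack-` while rocBLAS still fails, one way to cross-check a given rocBLAS build is to see whether it actually ships Tensile code objects for gfx900. This is a sketch under assumptions: the paths below are the default ROCm layout plus the temp runner directory pattern visible in the logs, and may need adjusting for other installs.

```shell
# Sketch: Tensile kernels ship as per-ISA files (e.g. *gfx900*) under a
# rocblas/library directory. If no gfx900 files exist, rocBLAS cannot serve
# the GPU even though the HSA runtime enumerates it.
count_gfx900() {
    n=0
    for d in /opt/rocm/lib/rocblas/library /tmp/ollama*/runners/rocm*/rocblas/library; do
        [ -d "$d" ] && n=$((n + $(ls "$d" 2>/dev/null | grep -c gfx900)))
    done
    echo "gfx900 Tensile files found: $n"
}
count_gfx900
```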

apt show rocm-libs -a (version)

Package: rocm-libs
Version: 6.0.2.60002-115~22.04
Priority: optional
Section: devel
Maintainer: ROCm Dev Support <rocm-dev.support@amd.com>
Installed-Size: 13.3 kB
Depends: hipblas (= 2.0.0.60002-115~22.04), hipblaslt (= 0.6.0.60002-115~22.04), hipfft (= 1.0.13.60002-115~22.04), hipsolver (= 2.0.0.60002-115~22.04), hipsparse (= 3.0.0.60002-115~22.04), hiptensor (= 1.1.0.60002-115~22.04), miopen-hip (= 3.00.0.60002-115~22.04), half (= 1.12.0.60002-115~22.04), rccl (= 2.18.3.60002-115~22.04), rocalution (= 3.0.3.60002-115~22.04), rocblas (= 4.0.0.60002-115~22.04), rocfft (= 1.0.25.60002-115~22.04), rocrand (= 3.0.0.60002-115~22.04), hiprand (= 2.10.16.60002-115~22.04), rocsolver (= 3.24.0.60002-115~22.04), rocsparse (= 3.0.2.60002-115~22.04), rocm-core (= 6.0.2.60002-115~22.04), composablekernel-dev (= 1.1.0.60002-115~22.04), hipblas-dev (= 2.0.0.60002-115~22.04), hipblaslt-dev (= 0.6.0.60002-115~22.04), hipcub-dev (= 3.0.0.60002-115~22.04), hipfft-dev (= 1.0.13.60002-115~22.04), hipsolver-dev (= 2.0.0.60002-115~22.04), hipsparse-dev (= 3.0.0.60002-115~22.04), hiptensor-dev (= 1.1.0.60002-115~22.04), miopen-hip-dev (= 3.00.0.60002-115~22.04), rccl-dev (= 2.18.3.60002-115~22.04), rocalution-dev (= 3.0.3.60002-115~22.04), rocblas-dev (= 4.0.0.60002-115~22.04), rocfft-dev (= 1.0.25.60002-115~22.04), rocprim-dev (= 3.0.0.60002-115~22.04), rocrand-dev (= 3.0.0.60002-115~22.04), hiprand-dev (= 2.10.16.60002-115~22.04), rocsolver-dev (= 3.24.0.60002-115~22.04), rocsparse-dev (= 3.0.2.60002-115~22.04), rocthrust-dev (= 3.0.0.60002-115~22.04), rocwmma-dev (= 1.3.0.60002-115~22.04)
Homepage: https://github.com/RadeonOpenCompute/ROCm
Download-Size: 1,050 B
APT-Sources: https://repo.radeon.com/rocm/apt/6.0.2 jammy/main amd64 Packages
Description: Radeon Open Compute (ROCm) Runtime software stack
<!-- gh-comment-id:2266398776 --> @mathatan commented on GitHub (Aug 3, 2024):

Oh, right. It was starting to get late and I didn't even remember to report the actual issue.

After running the install script and trying to load a model (via `ollama run`) I get:

`rocBLAS error: Could not initialize Tensile host: No devices found`

The device is available, as it's working perfectly fine otherwise. Also, it's not possible to revert the installation, i.e. installing an older version of Ollama will not fix things. If I just replace the executable manually (or by editing the install script), everything works fine.

Here are the detailed info and logs (copy-pasted from the original link):

Environment:

```
[Service]
Environment="OLLAMA_HOST=0.0.0.0"
Environment="OLLAMA_LLM_LIBRARY=rocm_v60002"
Environment="HSA_OVERRIDE_GFX_VERSION=9.0.0"
Environment="OLLAMA_NOHISTORY=1"
Environment="HSA_ENABLE_SDMA=0"
Environment="OLLAMA_DEBUG=1"
Environment="AMD_SERIALIZE_KERNEL=3"
```

Logs:

```
Aug 2 20:05:43 ollama-ubuntu-jammy-2 systemd[1]: Started Ollama Service.
Aug 2 20:05:43 ollama-ubuntu-jammy-2 ollama[447]: 2024/08/02 20:05:43 routes.go:1109: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION:9.0.0 OLLAMA_DEBUG:true OLLAMA_FLASH_ATTENTION:false OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_LLM_LIBRARY:rocm_v60002 OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/usr/share/ollama/.ollama/models OLLAMA_NOHISTORY:true OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://*] OLLAMA_RUNNERS_DIR: OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES:]"
Aug 2 20:05:43 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:43.405+03:00 level=INFO source=images.go:781 msg="total blobs: 77"
Aug 2 20:05:43 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:43.407+03:00 level=INFO source=images.go:788 msg="total unused blobs removed: 0"
Aug 2 20:05:43 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:43.409+03:00 level=INFO source=routes.go:1156 msg="Listening on [::]:11434 (version 0.3.2)"
Aug 2 20:05:43 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:43.409+03:00 level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama802441264/runners
Aug 2 20:05:43 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:43.409+03:00 level=DEBUG source=payload.go:182 msg=extracting variant=cpu file=build/linux/x86_64/cpu/bin/ollama_llama_server.gz
Aug 2 20:05:43 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:43.409+03:00 level=DEBUG source=payload.go:182 msg=extracting variant=cpu_avx file=build/linux/x86_64/cpu_avx/bin/ollama_llama_server.gz
Aug 2 20:05:43 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:43.409+03:00 level=DEBUG source=payload.go:182 msg=extracting variant=cpu_avx2 file=build/linux/x86_64/cpu_avx2/bin/ollama_llama_server.gz
Aug 2 20:05:43 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:43.409+03:00 level=DEBUG source=payload.go:182 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/libcublas.so.11.gz
Aug 2 20:05:43 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:43.409+03:00 level=DEBUG source=payload.go:182 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/libcublasLt.so.11.gz
Aug 2 20:05:43 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:43.409+03:00 level=DEBUG source=payload.go:182 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/libcudart.so.11.0.gz
Aug 2 20:05:43 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:43.409+03:00 level=DEBUG source=payload.go:182 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/ollama_llama_server.gz
Aug 2 20:05:43 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:43.409+03:00 level=DEBUG source=payload.go:182 msg=extracting variant=rocm_v60102 file=build/linux/x86_64/rocm_v60102/bin/deps.txt.gz
Aug 2 20:05:43 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:43.409+03:00 level=DEBUG source=payload.go:182 msg=extracting variant=rocm_v60102 file=build/linux/x86_64/rocm_v60102/bin/ollama_llama_server.gz
Aug 2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.276+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama802441264/runners/cpu/ollama_llama_server
Aug 2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.276+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama802441264/runners/cpu_avx/ollama_llama_server
Aug 2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.276+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama802441264/runners/cpu_avx2/ollama_llama_server
Aug 2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.276+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama802441264/runners/cuda_v11/ollama_llama_server
Aug 2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.276+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama802441264/runners/rocm_v60102/ollama_llama_server
Aug 2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.276+03:00 level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu_avx cpu_avx2 cuda_v11 rocm_v60102 cpu]"
Aug 2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.276+03:00 level=DEBUG source=payload.go:45 msg="Override detection logic by setting OLLAMA_LLM_LIBRARY"
Aug 2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.276+03:00 level=DEBUG source=sched.go:105 msg="starting llm scheduler"
Aug 2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.276+03:00 level=INFO source=gpu.go:205 msg="looking for compatible GPUs"
Aug 2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.276+03:00 level=DEBUG source=gpu.go:91 msg="searching for GPU discovery libraries for NVIDIA"
Aug 2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.276+03:00 level=DEBUG source=gpu.go:468 msg="Searching for GPU library" name=libcuda.so*
Aug 2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.276+03:00 level=DEBUG source=gpu.go:487 msg="gpu library search" globs="[/libcuda.so** /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]"
Aug 2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.278+03:00 level=DEBUG source=gpu.go:521 msg="discovered GPU libraries" paths=[]
Aug 2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.278+03:00 level=DEBUG source=gpu.go:468 msg="Searching for GPU library" name=libcudart.so*
Aug 2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.278+03:00 level=DEBUG source=gpu.go:487 msg="gpu library search" globs="[/libcudart.so** /tmp/ollama802441264/runners/cuda*/libcudart.so* /usr/local/cuda/lib64/libcudart.so* /usr/lib/x86_64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/x86_64-linux-gnu/libcudart.so* /usr/lib/wsl/lib/libcudart.so* /usr/lib/wsl/drivers/*/libcudart.so* /opt/cuda/lib64/libcudart.so* /usr/local/cuda*/targets/aarch64-linux/lib/libcudart.so* /usr/lib/aarch64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/aarch64-linux-gnu/libcudart.so* /usr/local/cuda/lib*/libcudart.so* /usr/lib*/libcudart.so* /usr/local/lib*/libcudart.so*]"
Aug 2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.278+03:00 level=DEBUG source=gpu.go:521 msg="discovered GPU libraries" paths=[/tmp/ollama802441264/runners/cuda_v11/libcudart.so.11.0]
Aug 2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: cudaSetDevice err: 35
Aug 2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.279+03:00 level=DEBUG source=gpu.go:533 msg="Unable to load cudart" library=/tmp/ollama802441264/runners/cuda_v11/libcudart.so.11.0 error="your nvidia driver is too old or missing.
If you have a CUDA GPU please upgrade to run ollama" Aug 2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.279+03:00 level=WARN source=amd_linux.go:59 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory" Aug 2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.279+03:00 level=DEBUG source=amd_linux.go:102 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/0/properties" Aug 2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.279+03:00 level=DEBUG source=amd_linux.go:127 msg="detected CPU /sys/class/kfd/kfd/topology/nodes/0/properties" Aug 2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.279+03:00 level=DEBUG source=amd_linux.go:102 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/1/properties" Aug 2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.279+03:00 level=DEBUG source=amd_linux.go:217 msg="mapping amdgpu to drm sysfs nodes" amdgpu=/sys/class/kfd/kfd/topology/nodes/1/properties vendor=4098 device=26751 unique_id=150031596308672932 Aug 2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.279+03:00 level=DEBUG source=amd_linux.go:251 msg=matched amdgpu=/sys/class/kfd/kfd/topology/nodes/1/properties drm=/sys/class/drm/card0/device Aug 2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.279+03:00 level=DEBUG source=amd_linux.go:283 msg="amdgpu memory" gpu=0 total="8.0 GiB" Aug 2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.279+03:00 level=DEBUG source=amd_linux.go:284 msg="amdgpu memory" gpu=0 available="8.0 GiB" Aug 2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.279+03:00 level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /usr/local/bin/rocm" Aug 2 20:05:47 ollama-ubuntu-jammy-2 
ollama[447]: time=2024-08-02T20:05:47.279+03:00 level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /opt/rocm/lib" Aug 2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.280+03:00 level=INFO source=amd_linux.go:348 msg="skipping rocm gfx compatibility check" HSA_OVERRIDE_GFX_VERSION=9.0.0 Aug 2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.280+03:00 level=INFO source=types.go:105 msg="inference compute" id=0 library=rocm compute=gfx900 driver=0.0 name=1002:687f total="8.0 GiB" available="8.0 GiB" Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: [GIN] 2024/08/02 - 20:05:50 | 200 | 42.189µs | 127.0.0.1 | HEAD "/" Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: [GIN] 2024/08/02 - 20:05:50 | 200 | 4.888907ms | 127.0.0.1 | POST "/api/show" Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.541+03:00 level=DEBUG source=gpu.go:358 msg="updating system memory data" before.total="62.7 GiB" before.free="44.4 GiB" before.free_swap="0 B" now.total="62.7 GiB" now.free="44.3 GiB" now.free_swap="0 B" Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.541+03:00 level=DEBUG source=amd_linux.go:440 msg="updating rocm free memory" gpu=0 name=1002:687f before="8.0 GiB" now="8.0 GiB" Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.541+03:00 level=DEBUG source=sched.go:181 msg="updating default concurrency" OLLAMA_MAX_LOADED_MODELS=0x816020 gpu_count=1 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.545+03:00 level=DEBUG source=sched.go:219 msg="loading first model" model=/usr/share/ollama/.ollama/models/blobs/sha256-633fc5be925f9a484b61d6f9b9a78021eeb462100bd557309f01ba84cac26adf Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.545+03:00 level=DEBUG source=memory.go:101 msg=evaluating library=rocm gpu_count=1 available="[8.0 GiB]" Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: 
time=2024-08-02T20:05:50.545+03:00 level=INFO source=sched.go:710 msg="new model will fit in available VRAM in single GPU, loading" model=/usr/share/ollama/.ollama/models/blobs/sha256-633fc5be925f9a484b61d6f9b9a78021eeb462100bd557309f01ba84cac26adf gpu=0 parallel=4 available=8564838400 required="6.1 GiB" Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.545+03:00 level=DEBUG source=server.go:100 msg="system memory" total="62.7 GiB" free="44.3 GiB" free_swap="0 B" Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.545+03:00 level=DEBUG source=memory.go:101 msg=evaluating library=rocm gpu_count=1 available="[8.0 GiB]" Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.545+03:00 level=INFO source=memory.go:309 msg="offload to rocm" layers.requested=-1 layers.model=33 layers.offload=33 layers.split="" memory.available="[8.0 GiB]" memory.required.full="6.1 GiB" memory.required.partial="6.1 GiB" memory.required.kv="3.0 GiB" memory.required.allocations="[6.1 GiB]" memory.weights.total="4.9 GiB" memory.weights.repeating="4.8 GiB" memory.weights.nonrepeating="77.1 MiB" memory.graph.full="512.0 MiB" memory.graph.partial="512.0 MiB" Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.545+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama802441264/runners/cpu/ollama_llama_server Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.545+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama802441264/runners/cpu_avx/ollama_llama_server Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.545+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama802441264/runners/cpu_avx2/ollama_llama_server Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.545+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" 
file=/tmp/ollama802441264/runners/cuda_v11/ollama_llama_server Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.545+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama802441264/runners/rocm_v60102/ollama_llama_server Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.545+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama802441264/runners/cpu/ollama_llama_server Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.545+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama802441264/runners/cpu_avx/ollama_llama_server Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.545+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama802441264/runners/cpu_avx2/ollama_llama_server Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.545+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama802441264/runners/cuda_v11/ollama_llama_server Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.545+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama802441264/runners/rocm_v60102/ollama_llama_server Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.545+03:00 level=INFO source=server.go:170 msg="Invalid OLLAMA_LLM_LIBRARY rocm_v60002 - not found" Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.546+03:00 level=INFO source=server.go:384 msg="starting llama server" cmd="/tmp/ollama802441264/runners/rocm_v60102/ollama_llama_server --model /usr/share/ollama/.ollama/models/blobs/sha256-633fc5be925f9a484b61d6f9b9a78021eeb462100bd557309f01ba84cac26adf --ctx-size 8192 --batch-size 512 --embedding --log-disable --n-gpu-layers 33 --verbose --parallel 4 --port 46445" Aug 2 20:05:50 ollama-ubuntu-jammy-2 
ollama[447]: time=2024-08-02T20:05:50.546+03:00 level=DEBUG source=server.go:401 msg=subprocess environment="[PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin HSA_OVERRIDE_GFX_VERSION=9.0.0 HSA_ENABLE_SDMA=0 LD_LIBRARY_PATH=/opt/rocm/lib:/tmp/ollama802441264/runners/rocm_v60102:/tmp/ollama802441264/runners HIP_VISIBLE_DEVICES=0]" Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.546+03:00 level=INFO source=sched.go:445 msg="loaded runners" count=1 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.546+03:00 level=INFO source=server.go:584 msg="waiting for llama runner to start responding" Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.546+03:00 level=INFO source=server.go:618 msg="waiting for server to become available" status="llm server error" Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[482]: INFO [main] build info | build=1 commit="6eeaeba" tid="139832002468928" timestamp=1722618350 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[482]: INFO [main] system info | n_threads=12 n_threads_batch=-1 system_info="AVX = 1 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 0 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | " tid="139832002468928" timestamp=1722618350 total_threads=24 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[482]: INFO [main] HTTP server listening | hostname="127.0.0.1" n_threads_http="23" port="46445" tid="139832002468928" timestamp=1722618350 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: loaded meta data with 36 key-value pairs and 197 tensors from /usr/share/ollama/.ollama/models/blobs/sha256-633fc5be925f9a484b61d6f9b9a78021eeb462100bd557309f01ba84cac26adf (version GGUF V3 (latest)) Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: 
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output. Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv 0: general.architecture str = phi3 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv 1: general.type str = model Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv 2: general.name str = Phi 3 Mini 128k Instruct Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv 3: general.finetune str = 128k-instruct Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv 4: general.basename str = Phi-3 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv 5: general.size_label str = mini Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv 6: general.license str = mit Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv 7: general.license.link str = https://huggingface.co/microsoft/Phi-... 
Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv 8: general.tags arr[str,3] = ["nlp", "code", "text-generation"] Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv 9: general.languages arr[str,1] = ["en"] Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv 10: phi3.context_length u32 = 131072 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv 11: phi3.rope.scaling.original_context_length u32 = 4096 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv 12: phi3.embedding_length u32 = 3072 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv 13: phi3.feed_forward_length u32 = 8192 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv 14: phi3.block_count u32 = 32 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv 15: phi3.attention.head_count u32 = 32 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv 16: phi3.attention.head_count_kv u32 = 32 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv 17: phi3.attention.layer_norm_rms_epsilon f32 = 0.000010 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv 18: phi3.rope.dimension_count u32 = 96 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv 19: phi3.rope.freq_base f32 = 10000.000000 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv 20: general.file_type u32 = 2 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv 21: phi3.attention.sliding_window u32 = 262144 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv 22: phi3.rope.scaling.attn_factor f32 = 1.190238 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv 23: tokenizer.ggml.model str = llama Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv 24: tokenizer.ggml.pre str = 
default Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv 25: tokenizer.ggml.tokens arr[str,32064] = ["<unk>", "<s>", "</s>", "<0x00>", "<... Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv 26: tokenizer.ggml.scores arr[f32,32064] = [-1000.000000, -1000.000000, -1000.00... Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv 27: tokenizer.ggml.token_type arr[i32,32064] = [3, 3, 4, 6, 6, 6, 6, 6, 6, 6, 6, 6, ... Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv 28: tokenizer.ggml.bos_token_id u32 = 1 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv 29: tokenizer.ggml.eos_token_id u32 = 32000 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv 30: tokenizer.ggml.unknown_token_id u32 = 0 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv 31: tokenizer.ggml.padding_token_id u32 = 32000 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv 32: tokenizer.ggml.add_bos_token bool = false Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv 33: tokenizer.ggml.add_eos_token bool = false Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv 34: tokenizer.chat_template str = {% for message in messages %}{% if me... 
Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv 35: general.quantization_version u32 = 2 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - type f32: 67 tensors Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - type q4_0: 129 tensors Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - type q6_K: 1 tensors Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_vocab: special tokens cache size = 14 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_vocab: token to piece cache size = 0.1685 MB Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: format = GGUF V3 (latest) Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: arch = phi3 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: vocab type = SPM Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: n_vocab = 32064 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: n_merges = 0 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: vocab_only = 0 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: n_ctx_train = 131072 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: n_embd = 3072 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: n_layer = 32 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: n_head = 32 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: n_head_kv = 32 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: n_rot = 96 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: n_swa = 262144 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: n_embd_head_k = 96 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: n_embd_head_v = 96 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: n_gqa = 1 Aug 2 20:05:50 
ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: n_embd_k_gqa = 3072 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: n_embd_v_gqa = 3072 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: f_norm_eps = 0.0e+00 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: f_norm_rms_eps = 1.0e-05 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: f_clamp_kqv = 0.0e+00 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: f_max_alibi_bias = 0.0e+00 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: f_logit_scale = 0.0e+00 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: n_ff = 8192 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: n_expert = 0 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: n_expert_used = 0 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: causal attn = 1 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: pooling type = 0 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: rope type = 2 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: rope scaling = linear Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: freq_base_train = 10000.0 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: freq_scale_train = 1 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: n_ctx_orig_yarn = 4096 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: rope_finetuned = unknown Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: ssm_d_conv = 0 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: ssm_d_inner = 0 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: ssm_d_state = 0 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: ssm_dt_rank = 0 Aug 2 20:05:50 
ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: model type = 3B Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: model ftype = Q4_0 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: model params = 3.82 B Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: model size = 2.03 GiB (4.55 BPW) Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: general.name = Phi 3 Mini 128k Instruct Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: BOS token = 1 '<s>' Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: EOS token = 32000 '<|endoftext|>' Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: UNK token = 0 '<unk>' Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: PAD token = 32000 '<|endoftext|>' Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: LF token = 13 '<0x0A>' Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: EOT token = 32007 '<|end|>' Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: max token length = 48 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: rocBLAS error: Could not initialize Tensile host: No devices found Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.997+03:00 level=INFO source=server.go:618 msg="waiting for server to become available" status="llm server not responding" Aug 2 20:05:52 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:52.086+03:00 level=DEBUG source=server.go:424 msg="llama runner terminated" error="signal: aborted (core dumped)" Aug 2 20:05:52 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:52.150+03:00 level=ERROR source=sched.go:451 msg="error loading llama server" error="llama runner process has terminated: error:Could not initialize Tensile host: No devices found" Aug 2 20:05:52 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:52.150+03:00 level=DEBUG 
source=sched.go:454 msg="triggering expiration for failed load" model=/usr/share/ollama/.ollama/models/blobs/sha256-633fc5be925f9a484b61d6f9b9a78021eeb462100bd557309f01ba84cac26adf Aug 2 20:05:52 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:52.150+03:00 level=DEBUG source=sched.go:355 msg="runner expired event received" modelPath=/usr/share/ollama/.ollama/models/blobs/sha256-633fc5be925f9a484b61d6f9b9a78021eeb462100bd557309f01ba84cac26adf Aug 2 20:05:52 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:52.150+03:00 level=DEBUG source=sched.go:371 msg="got lock to unload" modelPath=/usr/share/ollama/.ollama/models/blobs/sha256-633fc5be925f9a484b61d6f9b9a78021eeb462100bd557309f01ba84cac26adf Aug 2 20:05:52 ollama-ubuntu-jammy-2 ollama[447]: [GIN] 2024/08/02 - 20:05:52 | 500 | 1.613482132s | 127.0.0.1 | POST "/api/chat" Aug 2 20:05:52 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:52.150+03:00 level=DEBUG source=gpu.go:358 msg="updating system memory data" before.total="62.7 GiB" before.free="44.3 GiB" before.free_swap="0 B" now.total="62.7 GiB" now.free="44.3 GiB" now.free_swap="0 B" Aug 2 20:05:52 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:52.150+03:00 level=DEBUG source=amd_linux.go:440 msg="updating rocm free memory" gpu=0 name=1002:687f before="8.0 GiB" now="8.0 GiB" Aug 2 20:05:52 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:52.150+03:00 level=DEBUG source=server.go:1042 msg="stopping llama server" Aug 2 20:05:52 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:52.150+03:00 level=DEBUG source=sched.go:376 msg="runner released" modelPath=/usr/share/ollama/.ollama/models/blobs/sha256-633fc5be925f9a484b61d6f9b9a78021eeb462100bd557309f01ba84cac26adf Aug 2 20:05:52 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:52.401+03:00 level=DEBUG source=gpu.go:358 msg="updating system memory data" before.total="62.7 GiB" before.free="44.3 GiB" before.free_swap="0 B" now.total="62.7 GiB" now.free="44.3 GiB" 
now.free_swap="0 B" Aug 2 20:05:52 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:52.401+03:00 level=DEBUG source=amd_linux.go:440 msg="updating rocm free memory" gpu=0 name=1002:687f before="8.0 GiB" now="8.0 GiB"
```

As can be seen, it detects the GPU perfectly fine and tries to use it, but for some reason rocBLAS fails.

ROCM-SMI:

```
======================================= ROCm System Management Interface =======================================
================================================= Concise Info =================================================
Device  [Model : Revision]  Temp    Power     Partitions      SCLK    MCLK    Fan  Perf  PwrCap  VRAM%  GPU%
        Name (20 chars)     (Edge)  (Socket)  (Mem, Compute)
================================================================================================================
0       [0x2308 : 0xc1]     45.0°C  4.0W      N/A, N/A        852Mhz  167Mhz  0%   auto  247.0W  0%     0%
        Vega 10 XL/XT [Radeo
================================================================================================================
============================================= End of ROCm SMI Log ==============================================
```

rocminfo:

```
ROCk module is loaded
=====================
HSA System Attributes
=====================
Runtime Version:         1.1
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE
System Endianness:       LITTLE
Mwaitx:                  DISABLED
DMAbuf Support:          YES

==========
HSA Agents
==========
*******
Agent 1
*******
  Name:                    AMD Ryzen 9 3900X 12-Core Processor
  Uuid:                    CPU-XX
  Marketing Name:          AMD Ryzen 9 3900X 12-Core Processor
  Vendor Name:             CPU
  Feature:                 None specified
  Profile:                 FULL_PROFILE
  Float Round Mode:        NEAR
  Max Queue Number:        0(0x0)
  Queue Min Size:          0(0x0)
  Queue Max Size:          0(0x0)
  Queue Type:              MULTI
  Node:                    0
  Device Type:             CPU
  Cache Info:
    L1:                      32768(0x8000) KB
  Chip ID:                 0(0x0)
  ASIC Revision:           0(0x0)
  Cacheline Size:          64(0x40)
  Max Clock Freq. (MHz):   3800
  BDFID:                   0
  Internal Node ID:        0
  Compute Unit:            24
  SIMDs per CU:            0
  Shader Engines:          0
  Shader Arrs. per Eng.:   0
  WatchPts on Addr. Ranges:1
  Features:                None
  Pool Info:
    Pool 1
      Segment:                 GLOBAL; FLAGS: FINE GRAINED
      Size:                    65771684(0x3eb98a4) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Alignment:         4KB
      Accessible by all:       TRUE
    Pool 2
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    65771684(0x3eb98a4) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Alignment:         4KB
      Accessible by all:       TRUE
    Pool 3
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED
      Size:                    65771684(0x3eb98a4) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Alignment:         4KB
      Accessible by all:       TRUE
  ISA Info:
*******
Agent 2
*******
  Name:                    gfx900
  Uuid:                    GPU-021504f1231031a4
  Marketing Name:          Radeon RX Vega
  Vendor Name:             AMD
  Feature:                 KERNEL_DISPATCH
  Profile:                 BASE_PROFILE
  Float Round Mode:        NEAR
  Max Queue Number:        128(0x80)
  Queue Min Size:          64(0x40)
  Queue Max Size:          131072(0x20000)
  Queue Type:              MULTI
  Node:                    1
  Device Type:             GPU
  Cache Info:
    L1:                      16(0x10) KB
    L2:                      4096(0x1000) KB
  Chip ID:                 26751(0x687f)
  ASIC Revision:           1(0x1)
  Cacheline Size:          64(0x40)
  Max Clock Freq. (MHz):   1630
  BDFID:                   2560
  Internal Node ID:        1
  Compute Unit:            64
  SIMDs per CU:            4
  Shader Engines:          4
  Shader Arrs. per Eng.:   1
  WatchPts on Addr. Ranges:4
  Coherent Host Access:    FALSE
  Features:                KERNEL_DISPATCH
  Fast F16 Operation:      TRUE
  Wavefront Size:          64(0x40)
  Workgroup Max Size:      1024(0x400)
  Workgroup Max Size per Dimension:
    x                        1024(0x400)
    y                        1024(0x400)
    z                        1024(0x400)
  Max Waves Per CU:        40(0x28)
  Max Work-item Per CU:    2560(0xa00)
  Grid Max Size:           4294967295(0xffffffff)
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)
    y                        4294967295(0xffffffff)
    z                        4294967295(0xffffffff)
  Max fbarriers/Workgrp:   32
  Packet Processor uCode:: 468
  SDMA engine uCode::      434
  IOMMU Support::          None
  Pool Info:
    Pool 1
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED
      Size:                    8372224(0x7fc000) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Alignment:         4KB
      Accessible by all:       FALSE
    Pool 2
      Segment:                 GLOBAL; FLAGS: EXTENDED FINE GRAINED
      Size:                    8372224(0x7fc000) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Alignment:         4KB
      Accessible by all:       FALSE
    Pool 3
      Segment:                 GROUP
      Size:                    64(0x40) KB
      Allocatable:             FALSE
      Alloc Granule:           0KB
      Alloc Alignment:         0KB
      Accessible by all:       FALSE
  ISA Info:
    ISA 1
      Name:                    amdgcn-amd-amdhsa--gfx900:xnack-
      Machine Models:          HSA_MACHINE_MODEL_LARGE
      Profiles:                HSA_PROFILE_BASE
      Default Rounding Mode:   NEAR
      Default Rounding Mode:   NEAR
      Fast f16:                TRUE
      Workgroup Max Size:      1024(0x400)
      Workgroup Max Size per Dimension:
        x                        1024(0x400)
        y                        1024(0x400)
        z                        1024(0x400)
      Grid Max Size:           4294967295(0xffffffff)
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)
        y                        4294967295(0xffffffff)
        z                        4294967295(0xffffffff)
      FBarrier Max Size:       32
*** Done ***
```

`apt show rocm-libs -a` (version):

```
Package: rocm-libs
Version: 6.0.2.60002-115~22.04
Priority: optional
Section: devel
Maintainer: ROCm Dev Support <rocm-dev.support@amd.com>
Installed-Size: 13.3 kB
Depends: hipblas (= 2.0.0.60002-115~22.04),
 hipblaslt (= 0.6.0.60002-115~22.04),
 hipfft (= 1.0.13.60002-115~22.04),
 hipsolver (= 2.0.0.60002-115~22.04),
 hipsparse (= 3.0.0.60002-115~22.04),
 hiptensor (= 1.1.0.60002-115~22.04),
 miopen-hip (= 3.00.0.60002-115~22.04),
 half (= 1.12.0.60002-115~22.04),
 rccl (= 2.18.3.60002-115~22.04),
 rocalution (= 3.0.3.60002-115~22.04),
 rocblas (= 4.0.0.60002-115~22.04),
 rocfft (= 1.0.25.60002-115~22.04),
 rocrand (= 3.0.0.60002-115~22.04),
 hiprand (= 2.10.16.60002-115~22.04),
 rocsolver (= 3.24.0.60002-115~22.04),
 rocsparse (= 3.0.2.60002-115~22.04),
 rocm-core (= 6.0.2.60002-115~22.04),
 composablekernel-dev (= 1.1.0.60002-115~22.04),
 hipblas-dev (= 2.0.0.60002-115~22.04),
 hipblaslt-dev (= 0.6.0.60002-115~22.04),
 hipcub-dev (= 3.0.0.60002-115~22.04),
 hipfft-dev (= 1.0.13.60002-115~22.04),
 hipsolver-dev (= 2.0.0.60002-115~22.04),
 hipsparse-dev (= 3.0.0.60002-115~22.04),
 hiptensor-dev (= 1.1.0.60002-115~22.04),
 miopen-hip-dev (= 3.00.0.60002-115~22.04),
 rccl-dev (= 2.18.3.60002-115~22.04),
 rocalution-dev (= 3.0.3.60002-115~22.04),
 rocblas-dev (= 4.0.0.60002-115~22.04),
 rocfft-dev (= 1.0.25.60002-115~22.04),
 rocprim-dev (= 3.0.0.60002-115~22.04),
 rocrand-dev (= 3.0.0.60002-115~22.04),
 hiprand-dev (= 2.10.16.60002-115~22.04),
 rocsolver-dev (= 3.24.0.60002-115~22.04),
 rocsparse-dev (= 3.0.2.60002-115~22.04),
 rocthrust-dev (= 3.0.0.60002-115~22.04),
 rocwmma-dev (= 1.3.0.60002-115~22.04)
Homepage: https://github.com/RadeonOpenCompute/ROCm
Download-Size: 1,050 B
APT-Sources: https://repo.radeon.com/rocm/apt/6.0.2 jammy/main amd64 Packages
Description: Radeon Open Compute (ROCm) Runtime software stack
```
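The `rocBLAS error: Could not initialize Tensile host: No devices found` in the log above usually means rocBLAS cannot find Tensile data files matching the GPU's ISA. A quick, hedged way to check the host ROCm install for gfx900 files (the `/opt/rocm/lib/rocblas/library` path is the default layout for ROCm 6.x packages and is an assumption here):

```shell
#!/bin/sh
# Look for gfx900 Tensile/rocBLAS data files in the host ROCm install.
# Path below is the default for ROCm 6.x Debian packages (assumption).
ROCBLAS_LIB_DIR="/opt/rocm/lib/rocblas/library"
if [ -d "$ROCBLAS_LIB_DIR" ]; then
    # Zero matches would explain the "Could not initialize Tensile host" abort.
    matches=$(ls "$ROCBLAS_LIB_DIR" | grep -ci 'gfx900' || true)
    echo "gfx900 Tensile files found: $matches"
else
    echo "no rocBLAS library dir at $ROCBLAS_LIB_DIR"
fi
```

If the count is zero (or the directory is missing), rocBLAS has no kernels for the Vega 64 even though the scheduler happily picks the GPU.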

@dhiltgen commented on GitHub (Aug 8, 2024):

It looks like it's using `/opt/rocm` on your host, which appears to be missing the gfx900 data files. The install script detects this existing ROCm install and skips installing our bundled version. To work around this, you could run something like: `curl --fail --show-error --location --progress-bar "https://ollama.com/download/ollama-linux-amd64-rocm.tgz" | sudo tar zx --owner ollama --group ollama -C /usr/share/ollama/lib/rocm .`

Once #5631 merges, we'll shift to a model of always carrying the rocm dependency and use it even if ROCm is installed on the target system.
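Spelled out, the suggested workaround amounts to the sketch below. The tarball URL and target directory come from the comment above; the `ollama` user/group and the `ollama` systemd unit are the defaults created by install.sh and are assumptions for other setups. The download is gated behind an environment flag so the script is safe to read or source:

```shell
#!/bin/sh
# Sketch of the workaround: extract Ollama's bundled ROCm libraries
# over the host-detected ones, then restart the service.
# Assumptions: default install.sh layout, "ollama" user/group and unit.
ROCM_TGZ_URL="https://ollama.com/download/ollama-linux-amd64-rocm.tgz"
DEST="/usr/share/ollama/lib/rocm"

install_bundled_rocm() {
    sudo mkdir -p "$DEST"
    curl --fail --show-error --location --progress-bar "$ROCM_TGZ_URL" \
        | sudo tar zx --owner ollama --group ollama -C "$DEST"
    sudo systemctl restart ollama
}

# Only download/extract when explicitly requested.
if [ "${RUN_ROCM_WORKAROUND:-0}" = "1" ]; then
    install_bundled_rocm
else
    echo "dry run: would extract $ROCM_TGZ_URL into $DEST"
fi
```

Run with `RUN_ROCM_WORKAROUND=1` to actually perform the extraction; without it the script only prints what it would do.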

Reference: github-starred/ollama#29598