[GH-ISSUE #6144] Automated install script breaks ollama installation on AMD Vega 64 #29598

Closed
opened 2026-04-22 08:36:21 -05:00 by GiteaMirror · 3 comments
Owner

Originally created by @mathatan on GitHub (Aug 2, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/6144

Originally assigned to: @dhiltgen on GitHub.

What is the issue?

After debugging for a while (see https://github.com/ollama/ollama/issues/5143#issuecomment-2265824021 and https://github.com/ollama/ollama/issues/5143#issuecomment-2265892604 for details), I came to realize that Ollama's install.sh breaks an upgrade (on my setup, at least) unless I edit it to exit 0 right after the line trap install_success EXIT (currently line 81).
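For reference, that workaround can be sketched as a tiny transform over the script (hedged: the trap line and its position reflect the install.sh version current at the time, and patch_install_script is just an illustrative helper, not part of ollama):

```shell
# Hypothetical helper illustrating the workaround: emit a copy of install.sh
# with "exit 0" inserted right after the "trap install_success EXIT" line,
# so the script stops once the binary is in place and skips the later
# driver/GPU setup steps. (GNU sed's one-line "a text" form is assumed.)
patch_install_script() {
    sed '/trap install_success EXIT/a exit 0' "$1"
}

# usage: patch_install_script install.sh > install-binary-only.sh
```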

I'm not entirely sure what happens, but the gist of it is that manually copying the new Ollama binary to /usr/local/bin works just fine, while running the install script breaks everything permanently. The only way I have been able to recover my Ollama installation has been a hard rollback on ZFS.
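The manual path that kept working can be sketched roughly as follows (hedged: the download URL is ollama's generic linux-amd64 binary endpoint and may differ between releases, and the stop/start lines assume the systemd unit the installer normally creates; the sketch prints the steps rather than running them):

```shell
# Print (not run) the manual-upgrade steps that avoided the breakage:
# replace only the binary in /usr/local/bin, leaving the existing user,
# group and service configuration untouched.
manual_upgrade_cmds() {
    cat <<'EOF'
systemctl stop ollama
curl -fsSL https://ollama.com/download/ollama-linux-amd64 -o /tmp/ollama
install -m 0755 /tmp/ollama /usr/local/bin/ollama
systemctl start ollama
EOF
}

manual_upgrade_cmds
```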

Also, I have tried multiple times to install everything from scratch and have been completely unable to get Ollama working. While I don't know what is going on, here are the details of my system and the steps to reproduce the installation:

CPU: AMD Ryzen 9 3900X 12-Core Processor
GPU: Radeon RX Vega 64 (gfx900)
OS: Ubuntu 22.04.4 LTS (in a jail within TrueNAS-SCALE-24.04.2)

Rocm libs:

Package: rocm-libs
Version: 6.1.2.60102-119~22.04
Priority: optional
Section: devel
Maintainer: ROCm Dev Support <rocm-dev.support@amd.com>
Installed-Size: 13.3 kB
Depends: hipblas (= 2.1.0.60102-119~22.04),
 hipblaslt (= 0.7.0.60102-119~22.04),
 hipfft (= 1.0.14.60102-119~22.04),
 hipsolver (= 2.1.1.60102-119~22.04),
 hipsparse (= 3.0.1.60102-119~22.04),
 hiptensor (= 1.2.0.60102-119~22.04),
 miopen-hip (= 3.1.0.60102-119~22.04),
 half (= 1.12.0.60102-119~22.04),
 rccl (= 2.18.6.60102-119~22.04),
 rocalution (= 3.1.1.60102-119~22.04),
 rocblas (= 4.1.2.60102-119~22.04),
 rocfft (= 1.0.27.60102-119~22.04),
 rocrand (= 3.0.1.60102-119~22.04),
 hiprand (= 2.10.16.60102-119~22.04),
 rocsolver (= 3.25.0.60102-119~22.04),
 rocsparse (= 3.1.2.60102-119~22.04),
 rocm-core (= 6.1.2.60102-119~22.04),
 hipsparselt (= 0.2.0.60102-119~22.04),
 composablekernel-dev (= 1.1.0.60102-119~22.04),
 hipblas-dev (= 2.1.0.60102-119~22.04),
 hipblaslt-dev (= 0.7.0.60102-119~22.04),
 hipcub-dev (= 3.1.0.60102-119~22.04),
 hipfft-dev (= 1.0.14.60102-119~22.04),
 hipsolver-dev (= 2.1.1.60102-119~22.04),
 hipsparse-dev (= 3.0.1.60102-119~22.04),
 hiptensor-dev (= 1.2.0.60102-119~22.04),
 miopen-hip-dev (= 3.1.0.60102-119~22.04),
 rccl-dev (= 2.18.6.60102-119~22.04),
 rocalution-dev (= 3.1.1.60102-119~22.04),
 rocblas-dev (= 4.1.2.60102-119~22.04),
 rocfft-dev (= 1.0.27.60102-119~22.04),
 rocprim-dev (= 3.1.0.60102-119~22.04),
 rocrand-dev (= 3.0.1.60102-119~22.04),
 hiprand-dev (= 2.10.16.60102-119~22.04),
 rocsolver-dev (= 3.25.0.60102-119~22.04),
 rocsparse-dev (= 3.1.2.60102-119~22.04),
 rocthrust-dev (= 3.0.1.60102-119~22.04),
 rocwmma-dev (= 1.4.0.60102-119~22.04),
 hipsparselt-dev (= 0.2.0.60102-119~22.04)
Homepage: https://github.com/RadeonOpenCompute/ROCm
Download-Size: 1,068 B
APT-Sources: https://repo.radeon.com/rocm/apt/6.1.2 jammy/main amd64 Packages
Description: Radeon Open Compute (ROCm) Runtime software stack

Jail installation (requires the amazing https://github.com/Jip-Hop/jailmaker script)
Truenas:

$ jlmkr create --distro=ubuntu --release=jammy ollama-ubuntu-jammy --bind=/dev/dri --bind=/dev/kfd
$ jlmkr edit ollama-ubuntu-jammy

(add the lines marked with + to the existing systemd_nspawn_user_args entry)
systemd_nspawn_user_args=--bind=/dev/dri
            --bind=/dev/kfd
+           --bind='/mnt/NVME/jails/ollama:/usr/share/ollama/.ollama'
+           --property=DeviceAllow="/dev/kfd rw"

$ jlmkr start ollama-ubuntu-jammy
$ jlmkr shell ollama-ubuntu-jammy

Jail:

$ apt install curl
$ curl -O https://repo.radeon.com/amdgpu-install/6.0.2/ubuntu/jammy/amdgpu-install_6.0.60002-1_all.deb
$ apt install ./amdgpu-install_6.0.60002-1_all.deb
$ apt update
$ apt install amdgpu-dkms rocm radeontop
$ reboot

(reconnect)

$ curl -fsSL https://ollama.com/install.sh | sh
$ systemctl edit ollama.service

(add)

[Service]
Environment="OLLAMA_HOST=0.0.0.0"
Environment="OLLAMA_LLM_LIBRARY=rocm_v60002"
Environment="HSA_OVERRIDE_GFX_VERSION=9.0.0"
Environment="OLLAMA_NOHISTORY=1"
Environment="HSA_ENABLE_SDMA=0"
Environment="OLLAMA_DEBUG=1"

$ systemctl restart ollama.service
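To confirm the overrides actually landed in the drop-in before restarting, something like this can help (hedged: the default path is where systemctl edit writes overrides on stock systemd setups, and check_override is just an illustrative helper):

```shell
# Grep the systemd drop-in for the GPU-related overrides; exits non-zero
# if neither variable is present. An alternate file can be passed as $1.
check_override() {
    grep -E 'HSA_OVERRIDE_GFX_VERSION|OLLAMA_LLM_LIBRARY' \
        "${1:-/etc/systemd/system/ollama.service.d/override.conf}"
}
```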

It should be noted that there might be something I'm missing in these steps, such as creating the user and groups (the jail only has a root user by default); I'm not entirely sure how that part went. But since install.sh breaks a previously working setup, I doubt any missing step here is the actual issue. If logs are needed, you can find them in the previously linked comments.
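For completeness, the account setup the official installer normally performs looks roughly like this (hedged reconstruction from memory of install.sh; exact flags and group names may differ between versions, so verify against the script you actually have; the sketch only prints the steps):

```shell
# Print the account-setup steps the installer is believed to run:
# a system user with a home under /usr/share/ollama, plus membership in
# the GPU device groups (render/video) needed to open /dev/kfd and /dev/dri.
account_setup_cmds() {
    cat <<'EOF'
useradd -r -s /bin/false -U -m -d /usr/share/ollama ollama
usermod -a -G render,video ollama
EOF
}

account_setup_cmds
```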

These are the working logs:

Aug  2 22:01:33 ollama-ubuntu-jammy systemd[1]: Started Ollama Service.
Aug  2 22:01:33 ollama-ubuntu-jammy ollama[387]: 2024/08/02 22:01:33 routes.go:1109: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION:9.0.0 OLLAMA_DEBUG:true OLLAMA_FLASH_ATTENTION:false OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_LLM_LIBRARY:rocm_v60002 OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_NOHISTORY:true OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://*] OLLAMA_RUNNERS_DIR: OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES:]"
Aug  2 22:01:33 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:33.855+03:00 level=INFO source=images.go:781 msg="total blobs: 77"
Aug  2 22:01:33 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:33.857+03:00 level=INFO source=images.go:788 msg="total unused blobs removed: 0"
Aug  2 22:01:33 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:33.859+03:00 level=INFO source=routes.go:1156 msg="Listening on [::]:11434 (version 0.3.2)"
Aug  2 22:01:33 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:33.859+03:00 level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama3196240257/runners
Aug  2 22:01:33 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:33.859+03:00 level=DEBUG source=payload.go:182 msg=extracting variant=cpu file=build/linux/x86_64/cpu/bin/ollama_llama_server.gz
Aug  2 22:01:33 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:33.859+03:00 level=DEBUG source=payload.go:182 msg=extracting variant=cpu_avx file=build/linux/x86_64/cpu_avx/bin/ollama_llama_server.gz
Aug  2 22:01:33 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:33.859+03:00 level=DEBUG source=payload.go:182 msg=extracting variant=cpu_avx2 file=build/linux/x86_64/cpu_avx2/bin/ollama_llama_server.gz
Aug  2 22:01:33 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:33.859+03:00 level=DEBUG source=payload.go:182 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/libcublas.so.11.gz
Aug  2 22:01:33 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:33.859+03:00 level=DEBUG source=payload.go:182 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/libcublasLt.so.11.gz
Aug  2 22:01:33 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:33.859+03:00 level=DEBUG source=payload.go:182 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/libcudart.so.11.0.gz
Aug  2 22:01:33 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:33.859+03:00 level=DEBUG source=payload.go:182 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/ollama_llama_server.gz
Aug  2 22:01:33 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:33.859+03:00 level=DEBUG source=payload.go:182 msg=extracting variant=rocm_v60102 file=build/linux/x86_64/rocm_v60102/bin/deps.txt.gz
Aug  2 22:01:33 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:33.859+03:00 level=DEBUG source=payload.go:182 msg=extracting variant=rocm_v60102 file=build/linux/x86_64/rocm_v60102/bin/ollama_llama_server.gz
Aug  2 22:01:37 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:37.872+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3196240257/runners/cpu/ollama_llama_server
Aug  2 22:01:37 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:37.872+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3196240257/runners/cpu_avx/ollama_llama_server
Aug  2 22:01:37 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:37.872+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3196240257/runners/cpu_avx2/ollama_llama_server
Aug  2 22:01:37 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:37.872+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3196240257/runners/cuda_v11/ollama_llama_server
Aug  2 22:01:37 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:37.872+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3196240257/runners/rocm_v60102/ollama_llama_server
Aug  2 22:01:37 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:37.872+03:00 level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu cpu_avx cpu_avx2 cuda_v11 rocm_v60102]"
Aug  2 22:01:37 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:37.872+03:00 level=DEBUG source=payload.go:45 msg="Override detection logic by setting OLLAMA_LLM_LIBRARY"
Aug  2 22:01:37 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:37.872+03:00 level=DEBUG source=sched.go:105 msg="starting llm scheduler"
Aug  2 22:01:37 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:37.872+03:00 level=INFO source=gpu.go:205 msg="looking for compatible GPUs"
Aug  2 22:01:37 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:37.873+03:00 level=DEBUG source=gpu.go:91 msg="searching for GPU discovery libraries for NVIDIA"
Aug  2 22:01:37 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:37.873+03:00 level=DEBUG source=gpu.go:468 msg="Searching for GPU library" name=libcuda.so*
Aug  2 22:01:37 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:37.873+03:00 level=DEBUG source=gpu.go:487 msg="gpu library search" globs="[/libcuda.so** /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]"
Aug  2 22:01:37 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:37.874+03:00 level=DEBUG source=gpu.go:521 msg="discovered GPU libraries" paths=[]
Aug  2 22:01:37 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:37.874+03:00 level=DEBUG source=gpu.go:468 msg="Searching for GPU library" name=libcudart.so*
Aug  2 22:01:37 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:37.874+03:00 level=DEBUG source=gpu.go:487 msg="gpu library search" globs="[/libcudart.so** /tmp/ollama3196240257/runners/cuda*/libcudart.so* /usr/local/cuda/lib64/libcudart.so* /usr/lib/x86_64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/x86_64-linux-gnu/libcudart.so* /usr/lib/wsl/lib/libcudart.so* /usr/lib/wsl/drivers/*/libcudart.so* /opt/cuda/lib64/libcudart.so* /usr/local/cuda*/targets/aarch64-linux/lib/libcudart.so* /usr/lib/aarch64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/aarch64-linux-gnu/libcudart.so* /usr/local/cuda/lib*/libcudart.so* /usr/lib*/libcudart.so* /usr/local/lib*/libcudart.so*]"
Aug  2 22:01:37 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:37.874+03:00 level=DEBUG source=gpu.go:521 msg="discovered GPU libraries" paths=[/tmp/ollama3196240257/runners/cuda_v11/libcudart.so.11.0]
Aug  2 22:01:37 ollama-ubuntu-jammy ollama[387]: cudaSetDevice err: 35
Aug  2 22:01:37 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:37.875+03:00 level=DEBUG source=gpu.go:533 msg="Unable to load cudart" library=/tmp/ollama3196240257/runners/cuda_v11/libcudart.so.11.0 error="your nvidia driver is too old or missing.  If you have a CUDA GPU please upgrade to run ollama"
Aug  2 22:01:37 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:37.875+03:00 level=WARN source=amd_linux.go:59 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
Aug  2 22:01:37 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:37.875+03:00 level=DEBUG source=amd_linux.go:102 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/0/properties"
Aug  2 22:01:37 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:37.875+03:00 level=DEBUG source=amd_linux.go:127 msg="detected CPU /sys/class/kfd/kfd/topology/nodes/0/properties"
Aug  2 22:01:37 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:37.875+03:00 level=DEBUG source=amd_linux.go:102 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/1/properties"
Aug  2 22:01:37 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:37.875+03:00 level=DEBUG source=amd_linux.go:217 msg="mapping amdgpu to drm sysfs nodes" amdgpu=/sys/class/kfd/kfd/topology/nodes/1/properties vendor=4098 device=26751 unique_id=150031596308672932
Aug  2 22:01:37 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:37.875+03:00 level=DEBUG source=amd_linux.go:251 msg=matched amdgpu=/sys/class/kfd/kfd/topology/nodes/1/properties drm=/sys/class/drm/card0/device
Aug  2 22:01:37 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:37.875+03:00 level=DEBUG source=amd_linux.go:283 msg="amdgpu memory" gpu=0 total="8.0 GiB"
Aug  2 22:01:37 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:37.875+03:00 level=DEBUG source=amd_linux.go:284 msg="amdgpu memory" gpu=0 available="8.0 GiB"
Aug  2 22:01:37 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:37.875+03:00 level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /usr/local/bin/rocm"
Aug  2 22:01:37 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:37.875+03:00 level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /opt/rocm/lib"
Aug  2 22:01:37 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:37.876+03:00 level=INFO source=amd_linux.go:348 msg="skipping rocm gfx compatibility check" HSA_OVERRIDE_GFX_VERSION=9.0.0
Aug  2 22:01:37 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:37.876+03:00 level=INFO source=types.go:105 msg="inference compute" id=0 library=rocm compute=gfx900 driver=0.0 name=1002:687f total="8.0 GiB" available="8.0 GiB"
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: [GIN] 2024/08/02 - 22:01:39 | 200 |      24.509µs |       127.0.0.1 | HEAD     "/"
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: [GIN] 2024/08/02 - 22:01:39 | 200 |    4.530772ms |       127.0.0.1 | POST     "/api/show"
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:39.389+03:00 level=DEBUG source=gpu.go:358 msg="updating system memory data" before.total="62.7 GiB" before.free="50.4 GiB" before.free_swap="0 B" now.total="62.7 GiB" now.free="50.4 GiB" now.free_swap="0 B"
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:39.389+03:00 level=DEBUG source=amd_linux.go:440 msg="updating rocm free memory" gpu=0 name=1002:687f before="8.0 GiB" now="8.0 GiB"
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:39.389+03:00 level=DEBUG source=sched.go:181 msg="updating default concurrency" OLLAMA_MAX_LOADED_MODELS=0x816020 gpu_count=1
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:39.397+03:00 level=DEBUG source=sched.go:219 msg="loading first model" model=/root/.ollama/models/blobs/sha256-633fc5be925f9a484b61d6f9b9a78021eeb462100bd557309f01ba84cac26adf
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:39.397+03:00 level=DEBUG source=memory.go:101 msg=evaluating library=rocm gpu_count=1 available="[8.0 GiB]"
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:39.397+03:00 level=INFO source=sched.go:710 msg="new model will fit in available VRAM in single GPU, loading" model=/root/.ollama/models/blobs/sha256-633fc5be925f9a484b61d6f9b9a78021eeb462100bd557309f01ba84cac26adf gpu=0 parallel=4 available=8564838400 required="6.1 GiB"
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:39.397+03:00 level=DEBUG source=server.go:100 msg="system memory" total="62.7 GiB" free="50.4 GiB" free_swap="0 B"
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:39.397+03:00 level=DEBUG source=memory.go:101 msg=evaluating library=rocm gpu_count=1 available="[8.0 GiB]"
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:39.398+03:00 level=INFO source=memory.go:309 msg="offload to rocm" layers.requested=-1 layers.model=33 layers.offload=33 layers.split="" memory.available="[8.0 GiB]" memory.required.full="6.1 GiB" memory.required.partial="6.1 GiB" memory.required.kv="3.0 GiB" memory.required.allocations="[6.1 GiB]" memory.weights.total="4.9 GiB" memory.weights.repeating="4.8 GiB" memory.weights.nonrepeating="77.1 MiB" memory.graph.full="512.0 MiB" memory.graph.partial="512.0 MiB"
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:39.398+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3196240257/runners/cpu/ollama_llama_server
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:39.398+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3196240257/runners/cpu_avx/ollama_llama_server
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:39.398+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3196240257/runners/cpu_avx2/ollama_llama_server
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:39.398+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3196240257/runners/cuda_v11/ollama_llama_server
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:39.398+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3196240257/runners/rocm_v60102/ollama_llama_server
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:39.398+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3196240257/runners/cpu/ollama_llama_server
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:39.398+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3196240257/runners/cpu_avx/ollama_llama_server
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:39.398+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3196240257/runners/cpu_avx2/ollama_llama_server
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:39.398+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3196240257/runners/cuda_v11/ollama_llama_server
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:39.398+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3196240257/runners/rocm_v60102/ollama_llama_server
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:39.398+03:00 level=INFO source=server.go:170 msg="Invalid OLLAMA_LLM_LIBRARY rocm_v60002 - not found"
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:39.399+03:00 level=INFO source=server.go:384 msg="starting llama server" cmd="/tmp/ollama3196240257/runners/rocm_v60102/ollama_llama_server --model /root/.ollama/models/blobs/sha256-633fc5be925f9a484b61d6f9b9a78021eeb462100bd557309f01ba84cac26adf --ctx-size 8192 --batch-size 512 --embedding --log-disable --n-gpu-layers 33 --verbose --parallel 4 --port 46879"
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:39.399+03:00 level=DEBUG source=server.go:401 msg=subprocess environment="[PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin HSA_OVERRIDE_GFX_VERSION=9.0.0 HSA_ENABLE_SDMA=0 LD_LIBRARY_PATH=/opt/rocm/lib:/tmp/ollama3196240257/runners/rocm_v60102:/tmp/ollama3196240257/runners HIP_VISIBLE_DEVICES=0]"
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:39.399+03:00 level=INFO source=sched.go:445 msg="loaded runners" count=1
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:39.399+03:00 level=INFO source=server.go:584 msg="waiting for llama runner to start responding"
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:39.399+03:00 level=INFO source=server.go:618 msg="waiting for server to become available" status="llm server error"
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[423]: INFO [main] build info | build=1 commit="6eeaeba" tid="139675947496256" timestamp=1722625299
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[423]: INFO [main] system info | n_threads=12 n_threads_batch=-1 system_info="AVX = 1 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 0 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | " tid="139675947496256" timestamp=1722625299 total_threads=24
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[423]: INFO [main] HTTP server listening | hostname="127.0.0.1" n_threads_http="23" port="46879" tid="139675947496256" timestamp=1722625299
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: loaded meta data with 36 key-value pairs and 197 tensors from /root/.ollama/models/blobs/sha256-633fc5be925f9a484b61d6f9b9a78021eeb462100bd557309f01ba84cac26adf (version GGUF V3 (latest))
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - kv   0:                       general.architecture str              = phi3
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - kv   1:                               general.type str              = model
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - kv   2:                               general.name str              = Phi 3 Mini 128k Instruct
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - kv   3:                           general.finetune str              = 128k-instruct
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - kv   4:                           general.basename str              = Phi-3
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - kv   5:                         general.size_label str              = mini
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - kv   6:                            general.license str              = mit
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - kv   7:                       general.license.link str              = https://huggingface.co/microsoft/Phi-...
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - kv   8:                               general.tags arr[str,3]       = ["nlp", "code", "text-generation"]
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - kv   9:                          general.languages arr[str,1]       = ["en"]
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - kv  10:                        phi3.context_length u32              = 131072
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - kv  11:  phi3.rope.scaling.original_context_length u32              = 4096
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - kv  12:                      phi3.embedding_length u32              = 3072
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - kv  13:                   phi3.feed_forward_length u32              = 8192
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - kv  14:                           phi3.block_count u32              = 32
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - kv  15:                  phi3.attention.head_count u32              = 32
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - kv  16:               phi3.attention.head_count_kv u32              = 32
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - kv  17:      phi3.attention.layer_norm_rms_epsilon f32              = 0.000010
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - kv  18:                  phi3.rope.dimension_count u32              = 96
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - kv  19:                        phi3.rope.freq_base f32              = 10000.000000
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - kv  20:                          general.file_type u32              = 2
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - kv  21:              phi3.attention.sliding_window u32              = 262144
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - kv  22:              phi3.rope.scaling.attn_factor f32              = 1.190238
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - kv  23:                       tokenizer.ggml.model str              = llama
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - kv  24:                         tokenizer.ggml.pre str              = default
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - kv  25:                      tokenizer.ggml.tokens arr[str,32064]   = ["<unk>", "<s>", "</s>", "<0x00>", "<...
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - kv  26:                      tokenizer.ggml.scores arr[f32,32064]   = [-1000.000000, -1000.000000, -1000.00...
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - kv  27:                  tokenizer.ggml.token_type arr[i32,32064]   = [3, 3, 4, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - kv  28:                tokenizer.ggml.bos_token_id u32              = 1
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - kv  29:                tokenizer.ggml.eos_token_id u32              = 32000
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - kv  30:            tokenizer.ggml.unknown_token_id u32              = 0
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - kv  31:            tokenizer.ggml.padding_token_id u32              = 32000
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - kv  32:               tokenizer.ggml.add_bos_token bool             = false
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - kv  33:               tokenizer.ggml.add_eos_token bool             = false
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - kv  34:                    tokenizer.chat_template str              = {% for message in messages %}{% if me...
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - kv  35:               general.quantization_version u32              = 2
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - type  f32:   67 tensors
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - type q4_0:  129 tensors
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - type q6_K:    1 tensors
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_vocab: special tokens cache size = 14
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_vocab: token to piece cache size = 0.1685 MB
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: format           = GGUF V3 (latest)
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: arch             = phi3
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: vocab type       = SPM
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: n_vocab          = 32064
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: n_merges         = 0
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: vocab_only       = 0
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: n_ctx_train      = 131072
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: n_embd           = 3072
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: n_layer          = 32
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: n_head           = 32
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: n_head_kv        = 32
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: n_rot            = 96
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: n_swa            = 262144
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: n_embd_head_k    = 96
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: n_embd_head_v    = 96
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: n_gqa            = 1
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: n_embd_k_gqa     = 3072
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: n_embd_v_gqa     = 3072
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: f_norm_eps       = 0.0e+00
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: f_clamp_kqv      = 0.0e+00
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: f_max_alibi_bias = 0.0e+00
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: f_logit_scale    = 0.0e+00
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: n_ff             = 8192
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: n_expert         = 0
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: n_expert_used    = 0
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: causal attn      = 1
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: pooling type     = 0
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: rope type        = 2
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: rope scaling     = linear
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: freq_base_train  = 10000.0
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: freq_scale_train = 1
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: n_ctx_orig_yarn  = 4096
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: rope_finetuned   = unknown
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: ssm_d_conv       = 0
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: ssm_d_inner      = 0
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: ssm_d_state      = 0
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: ssm_dt_rank      = 0
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: model type       = 3B
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: model ftype      = Q4_0
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: model params     = 3.82 B
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: model size       = 2.03 GiB (4.55 BPW)
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: general.name     = Phi 3 Mini 128k Instruct
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: BOS token        = 1 '<s>'
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: EOS token        = 32000 '<|endoftext|>'
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: UNK token        = 0 '<unk>'
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: PAD token        = 32000 '<|endoftext|>'
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: LF token         = 13 '<0x0A>'
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: EOT token        = 32007 '<|end|>'
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: max token length = 48
Aug  2 22:01:39 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:39.651+03:00 level=INFO source=server.go:618 msg="waiting for server to become available" status="llm server loading model"
Aug  2 22:01:40 ollama-ubuntu-jammy ollama[387]: ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
Aug  2 22:01:40 ollama-ubuntu-jammy ollama[387]: ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
Aug  2 22:01:40 ollama-ubuntu-jammy ollama[387]: ggml_cuda_init: found 1 ROCm devices:
Aug  2 22:01:40 ollama-ubuntu-jammy ollama[387]:   Device 0: Radeon RX Vega, compute capability 9.0, VMM: no
Aug  2 22:01:40 ollama-ubuntu-jammy ollama[387]: llm_load_tensors: ggml ctx size =    0.21 MiB
Aug  2 22:01:40 ollama-ubuntu-jammy ollama[387]: llm_load_tensors: offloading 32 repeating layers to GPU
Aug  2 22:01:40 ollama-ubuntu-jammy ollama[387]: llm_load_tensors: offloading non-repeating layers to GPU
Aug  2 22:01:40 ollama-ubuntu-jammy ollama[387]: llm_load_tensors: offloaded 33/33 layers to GPU
Aug  2 22:01:40 ollama-ubuntu-jammy ollama[387]: llm_load_tensors:      ROCm0 buffer size =  2021.84 MiB
Aug  2 22:01:40 ollama-ubuntu-jammy ollama[387]: llm_load_tensors:        CPU buffer size =    52.84 MiB
Aug  2 22:01:40 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:40.403+03:00 level=DEBUG source=server.go:629 msg="model load progress 0.08"
Aug  2 22:01:40 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:40.654+03:00 level=DEBUG source=server.go:629 msg="model load progress 0.57"
Aug  2 22:01:40 ollama-ubuntu-jammy systemd[1]: Starting Daily apt upgrade and clean activities...
Aug  2 22:01:40 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:40.905+03:00 level=DEBUG source=server.go:629 msg="model load progress 0.97"
Aug  2 22:01:40 ollama-ubuntu-jammy ollama[387]: llama_new_context_with_model: n_ctx      = 8192
Aug  2 22:01:40 ollama-ubuntu-jammy ollama[387]: llama_new_context_with_model: n_batch    = 512
Aug  2 22:01:40 ollama-ubuntu-jammy ollama[387]: llama_new_context_with_model: n_ubatch   = 512
Aug  2 22:01:40 ollama-ubuntu-jammy ollama[387]: llama_new_context_with_model: flash_attn = 0
Aug  2 22:01:40 ollama-ubuntu-jammy ollama[387]: llama_new_context_with_model: freq_base  = 10000.0
Aug  2 22:01:40 ollama-ubuntu-jammy ollama[387]: llama_new_context_with_model: freq_scale = 1
Aug  2 22:01:40 ollama-ubuntu-jammy ollama[387]: llama_kv_cache_init:      ROCm0 KV buffer size =  3072.00 MiB
Aug  2 22:01:40 ollama-ubuntu-jammy ollama[387]: llama_new_context_with_model: KV self size  = 3072.00 MiB, K (f16): 1536.00 MiB, V (f16): 1536.00 MiB
Aug  2 22:01:40 ollama-ubuntu-jammy ollama[387]: llama_new_context_with_model:  ROCm_Host  output buffer size =     0.54 MiB
Aug  2 22:01:40 ollama-ubuntu-jammy ollama[387]: llama_new_context_with_model:      ROCm0 compute buffer size =   564.00 MiB
Aug  2 22:01:40 ollama-ubuntu-jammy ollama[387]: llama_new_context_with_model:  ROCm_Host compute buffer size =    22.01 MiB
Aug  2 22:01:40 ollama-ubuntu-jammy ollama[387]: llama_new_context_with_model: graph nodes  = 1286
Aug  2 22:01:40 ollama-ubuntu-jammy ollama[387]: llama_new_context_with_model: graph splits = 2
Aug  2 22:01:40 ollama-ubuntu-jammy ollama[423]: DEBUG [initialize] initializing slots | n_slots=4 tid="139675947496256" timestamp=1722625300
Aug  2 22:01:40 ollama-ubuntu-jammy ollama[423]: DEBUG [initialize] new slot | n_ctx_slot=2048 slot_id=0 tid="139675947496256" timestamp=1722625300
Aug  2 22:01:40 ollama-ubuntu-jammy ollama[423]: DEBUG [initialize] new slot | n_ctx_slot=2048 slot_id=1 tid="139675947496256" timestamp=1722625300
Aug  2 22:01:40 ollama-ubuntu-jammy ollama[423]: DEBUG [initialize] new slot | n_ctx_slot=2048 slot_id=2 tid="139675947496256" timestamp=1722625300
Aug  2 22:01:40 ollama-ubuntu-jammy ollama[423]: DEBUG [initialize] new slot | n_ctx_slot=2048 slot_id=3 tid="139675947496256" timestamp=1722625300
Aug  2 22:01:40 ollama-ubuntu-jammy ollama[423]: INFO [main] model loaded | tid="139675947496256" timestamp=1722625300
Aug  2 22:01:40 ollama-ubuntu-jammy ollama[423]: DEBUG [update_slots] all slots are idle and system prompt is empty, clear the KV cache | tid="139675947496256" timestamp=1722625300
Aug  2 22:01:41 ollama-ubuntu-jammy systemd[1]: apt-daily-upgrade.service: Deactivated successfully.
Aug  2 22:01:41 ollama-ubuntu-jammy systemd[1]: Finished Daily apt upgrade and clean activities.
Aug  2 22:01:41 ollama-ubuntu-jammy ollama[423]: DEBUG [process_single_task] slot data | n_idle_slots=4 n_processing_slots=0 task_id=0 tid="139675947496256" timestamp=1722625301
Aug  2 22:01:41 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:41.156+03:00 level=INFO source=server.go:623 msg="llama runner started in 1.76 seconds"
Aug  2 22:01:41 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:41.156+03:00 level=DEBUG source=sched.go:458 msg="finished setting up runner" model=/root/.ollama/models/blobs/sha256-633fc5be925f9a484b61d6f9b9a78021eeb462100bd557309f01ba84cac26adf
Aug  2 22:01:41 ollama-ubuntu-jammy ollama[387]: [GIN] 2024/08/02 - 22:01:41 | 200 |  1.775479674s |       127.0.0.1 | POST     "/api/chat"
Aug  2 22:01:41 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:41.156+03:00 level=DEBUG source=sched.go:462 msg="context for request finished"
Aug  2 22:01:41 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:41.156+03:00 level=DEBUG source=sched.go:334 msg="runner with non-zero duration has gone idle, adding timer" modelPath=/root/.ollama/models/blobs/sha256-633fc5be925f9a484b61d6f9b9a78021eeb462100bd557309f01ba84cac26adf duration=5m0s
Aug  2 22:01:41 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:41.156+03:00 level=DEBUG source=sched.go:352 msg="after processing request finished event" modelPath=/root/.ollama/models/blobs/sha256-633fc5be925f9a484b61d6f9b9a78021eeb462100bd557309f01ba84cac26adf refCount=0
Aug  2 22:01:45 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:45.117+03:00 level=DEBUG source=sched.go:571 msg="evaluating already loaded" model=/root/.ollama/models/blobs/sha256-633fc5be925f9a484b61d6f9b9a78021eeb462100bd557309f01ba84cac26adf
Aug  2 22:01:45 ollama-ubuntu-jammy ollama[423]: DEBUG [process_single_task] slot data | n_idle_slots=4 n_processing_slots=0 task_id=1 tid="139675947496256" timestamp=1722625305
Aug  2 22:01:45 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:45.118+03:00 level=DEBUG source=routes.go:1347 msg="chat request" images=0 prompt="<|user|>\nHey there!<|end|>\n<|assistant|>\n"
Aug  2 22:01:45 ollama-ubuntu-jammy ollama[423]: DEBUG [process_single_task] slot data | n_idle_slots=4 n_processing_slots=0 task_id=2 tid="139675947496256" timestamp=1722625305
Aug  2 22:01:45 ollama-ubuntu-jammy ollama[423]: DEBUG [launch_slot_with_data] slot is processing task | slot_id=0 task_id=3 tid="139675947496256" timestamp=1722625305
Aug  2 22:01:45 ollama-ubuntu-jammy ollama[423]: DEBUG [update_slots] slot progression | ga_i=0 n_past=0 n_past_se=0 n_prompt_tokens_processed=13 slot_id=0 task_id=3 tid="139675947496256" timestamp=1722625305
Aug  2 22:01:45 ollama-ubuntu-jammy ollama[423]: DEBUG [update_slots] kv cache rm [p0, end) | p0=0 slot_id=0 task_id=3 tid="139675947496256" timestamp=1722625305
Aug  2 22:01:45 ollama-ubuntu-jammy ollama[423]: DEBUG [print_timings] prompt eval time     =      98.30 ms /    13 tokens (    7.56 ms per token,   132.25 tokens per second) | n_prompt_tokens_processed=13 n_tokens_second=132.25091049665303 slot_id=0 t_prompt_processing=98.298 t_token=7.561384615384616 task_id=3 tid="139675947496256" timestamp=1722625305
Aug  2 22:01:45 ollama-ubuntu-jammy ollama[423]: DEBUG [print_timings] generation eval time =     171.00 ms /    10 runs   (   17.10 ms per token,    58.48 tokens per second) | n_decoded=10 n_tokens_second=58.48021614287887 slot_id=0 t_token=17.0998 t_token_generation=170.998 task_id=3 tid="139675947496256" timestamp=1722625305
Aug  2 22:01:45 ollama-ubuntu-jammy ollama[423]: DEBUG [print_timings]           total time =     269.30 ms | slot_id=0 t_prompt_processing=98.298 t_token_generation=170.998 t_total=269.296 task_id=3 tid="139675947496256" timestamp=1722625305
Aug  2 22:01:45 ollama-ubuntu-jammy ollama[423]: DEBUG [update_slots] slot released | n_cache_tokens=23 n_ctx=8192 n_past=22 n_system_tokens=0 slot_id=0 task_id=3 tid="139675947496256" timestamp=1722625305 truncated=false
Aug  2 22:01:45 ollama-ubuntu-jammy ollama[423]: DEBUG [log_server_request] request | method="POST" params={} path="/completion" remote_addr="127.0.0.1" remote_port=48122 status=200 tid="139675929642560" timestamp=1722625305
Aug  2 22:01:45 ollama-ubuntu-jammy ollama[387]: [GIN] 2024/08/02 - 22:01:45 | 200 |  319.564933ms |       127.0.0.1 | POST     "/api/chat"
Aug  2 22:01:45 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:45.429+03:00 level=DEBUG source=sched.go:403 msg="context for request finished"
Aug  2 22:01:45 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:45.429+03:00 level=DEBUG source=sched.go:334 msg="runner with non-zero duration has gone idle, adding timer" modelPath=/root/.ollama/models/blobs/sha256-633fc5be925f9a484b61d6f9b9a78021eeb462100bd557309f01ba84cac26adf duration=5m0s
Aug  2 22:01:45 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:45.429+03:00 level=DEBUG source=sched.go:352 msg="after processing request finished event" modelPath=/root/.ollama/models/blobs/sha256-633fc5be925f9a484b61d6f9b9a78021eeb462100bd557309f01ba84cac26adf refCount=0

Thanks for an amazing app; I hope someone can wrap their head around this issue!

OS

Linux

GPU

AMD

CPU

AMD

Ollama version

from 0.1.48 to 0.3.x

Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - kv 28: tokenizer.ggml.bos_token_id u32 = 1 Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - kv 29: tokenizer.ggml.eos_token_id u32 = 32000 Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - kv 30: tokenizer.ggml.unknown_token_id u32 = 0 Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - kv 31: tokenizer.ggml.padding_token_id u32 = 32000 Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - kv 32: tokenizer.ggml.add_bos_token bool = false Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - kv 33: tokenizer.ggml.add_eos_token bool = false Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - kv 34: tokenizer.chat_template str = {% for message in messages %}{% if me... Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - kv 35: general.quantization_version u32 = 2 Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - type f32: 67 tensors Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - type q4_0: 129 tensors Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llama_model_loader: - type q6_K: 1 tensors Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_vocab: special tokens cache size = 14 Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_vocab: token to piece cache size = 0.1685 MB Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: format = GGUF V3 (latest) Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: arch = phi3 Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: vocab type = SPM Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: n_vocab = 32064 Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: n_merges = 0 Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: vocab_only = 0 Aug 2 22:01:39 ollama-ubuntu-jammy 
ollama[387]: llm_load_print_meta: n_ctx_train = 131072 Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: n_embd = 3072 Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: n_layer = 32 Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: n_head = 32 Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: n_head_kv = 32 Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: n_rot = 96 Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: n_swa = 262144 Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: n_embd_head_k = 96 Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: n_embd_head_v = 96 Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: n_gqa = 1 Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: n_embd_k_gqa = 3072 Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: n_embd_v_gqa = 3072 Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: f_norm_eps = 0.0e+00 Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: f_norm_rms_eps = 1.0e-05 Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: f_clamp_kqv = 0.0e+00 Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: f_max_alibi_bias = 0.0e+00 Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: f_logit_scale = 0.0e+00 Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: n_ff = 8192 Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: n_expert = 0 Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: n_expert_used = 0 Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: causal attn = 1 Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: pooling type = 0 Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: rope type = 2 Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: 
llm_load_print_meta: rope scaling = linear Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: freq_base_train = 10000.0 Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: freq_scale_train = 1 Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: n_ctx_orig_yarn = 4096 Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: rope_finetuned = unknown Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: ssm_d_conv = 0 Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: ssm_d_inner = 0 Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: ssm_d_state = 0 Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: ssm_dt_rank = 0 Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: model type = 3B Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: model ftype = Q4_0 Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: model params = 3.82 B Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: model size = 2.03 GiB (4.55 BPW) Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: general.name = Phi 3 Mini 128k Instruct Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: BOS token = 1 '<s>' Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: EOS token = 32000 '<|endoftext|>' Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: UNK token = 0 '<unk>' Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: PAD token = 32000 '<|endoftext|>' Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: LF token = 13 '<0x0A>' Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: EOT token = 32007 '<|end|>' Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: llm_load_print_meta: max token length = 48 Aug 2 22:01:39 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:39.651+03:00 level=INFO 
source=server.go:618 msg="waiting for server to become available" status="llm server loading model" Aug 2 22:01:40 ollama-ubuntu-jammy ollama[387]: ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no Aug 2 22:01:40 ollama-ubuntu-jammy ollama[387]: ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no Aug 2 22:01:40 ollama-ubuntu-jammy ollama[387]: ggml_cuda_init: found 1 ROCm devices: Aug 2 22:01:40 ollama-ubuntu-jammy ollama[387]: Device 0: Radeon RX Vega, compute capability 9.0, VMM: no Aug 2 22:01:40 ollama-ubuntu-jammy ollama[387]: llm_load_tensors: ggml ctx size = 0.21 MiB Aug 2 22:01:40 ollama-ubuntu-jammy ollama[387]: llm_load_tensors: offloading 32 repeating layers to GPU Aug 2 22:01:40 ollama-ubuntu-jammy ollama[387]: llm_load_tensors: offloading non-repeating layers to GPU Aug 2 22:01:40 ollama-ubuntu-jammy ollama[387]: llm_load_tensors: offloaded 33/33 layers to GPU Aug 2 22:01:40 ollama-ubuntu-jammy ollama[387]: llm_load_tensors: ROCm0 buffer size = 2021.84 MiB Aug 2 22:01:40 ollama-ubuntu-jammy ollama[387]: llm_load_tensors: CPU buffer size = 52.84 MiB Aug 2 22:01:40 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:40.403+03:00 level=DEBUG source=server.go:629 msg="model load progress 0.08" Aug 2 22:01:40 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:40.654+03:00 level=DEBUG source=server.go:629 msg="model load progress 0.57" Aug 2 22:01:40 ollama-ubuntu-jammy systemd[1]: Starting Daily apt upgrade and clean activities... 
Aug 2 22:01:40 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:40.905+03:00 level=DEBUG source=server.go:629 msg="model load progress 0.97" Aug 2 22:01:40 ollama-ubuntu-jammy ollama[387]: llama_new_context_with_model: n_ctx = 8192 Aug 2 22:01:40 ollama-ubuntu-jammy ollama[387]: llama_new_context_with_model: n_batch = 512 Aug 2 22:01:40 ollama-ubuntu-jammy ollama[387]: llama_new_context_with_model: n_ubatch = 512 Aug 2 22:01:40 ollama-ubuntu-jammy ollama[387]: llama_new_context_with_model: flash_attn = 0 Aug 2 22:01:40 ollama-ubuntu-jammy ollama[387]: llama_new_context_with_model: freq_base = 10000.0 Aug 2 22:01:40 ollama-ubuntu-jammy ollama[387]: llama_new_context_with_model: freq_scale = 1 Aug 2 22:01:40 ollama-ubuntu-jammy ollama[387]: llama_kv_cache_init: ROCm0 KV buffer size = 3072.00 MiB Aug 2 22:01:40 ollama-ubuntu-jammy ollama[387]: llama_new_context_with_model: KV self size = 3072.00 MiB, K (f16): 1536.00 MiB, V (f16): 1536.00 MiB Aug 2 22:01:40 ollama-ubuntu-jammy ollama[387]: llama_new_context_with_model: ROCm_Host output buffer size = 0.54 MiB Aug 2 22:01:40 ollama-ubuntu-jammy ollama[387]: llama_new_context_with_model: ROCm0 compute buffer size = 564.00 MiB Aug 2 22:01:40 ollama-ubuntu-jammy ollama[387]: llama_new_context_with_model: ROCm_Host compute buffer size = 22.01 MiB Aug 2 22:01:40 ollama-ubuntu-jammy ollama[387]: llama_new_context_with_model: graph nodes = 1286 Aug 2 22:01:40 ollama-ubuntu-jammy ollama[387]: llama_new_context_with_model: graph splits = 2 Aug 2 22:01:40 ollama-ubuntu-jammy ollama[423]: DEBUG [initialize] initializing slots | n_slots=4 tid="139675947496256" timestamp=1722625300 Aug 2 22:01:40 ollama-ubuntu-jammy ollama[423]: DEBUG [initialize] new slot | n_ctx_slot=2048 slot_id=0 tid="139675947496256" timestamp=1722625300 Aug 2 22:01:40 ollama-ubuntu-jammy ollama[423]: DEBUG [initialize] new slot | n_ctx_slot=2048 slot_id=1 tid="139675947496256" timestamp=1722625300 Aug 2 22:01:40 ollama-ubuntu-jammy ollama[423]: DEBUG 
[initialize] new slot | n_ctx_slot=2048 slot_id=2 tid="139675947496256" timestamp=1722625300 Aug 2 22:01:40 ollama-ubuntu-jammy ollama[423]: DEBUG [initialize] new slot | n_ctx_slot=2048 slot_id=3 tid="139675947496256" timestamp=1722625300 Aug 2 22:01:40 ollama-ubuntu-jammy ollama[423]: INFO [main] model loaded | tid="139675947496256" timestamp=1722625300 Aug 2 22:01:40 ollama-ubuntu-jammy ollama[423]: DEBUG [update_slots] all slots are idle and system prompt is empty, clear the KV cache | tid="139675947496256" timestamp=1722625300 Aug 2 22:01:41 ollama-ubuntu-jammy systemd[1]: apt-daily-upgrade.service: Deactivated successfully. Aug 2 22:01:41 ollama-ubuntu-jammy systemd[1]: Finished Daily apt upgrade and clean activities. Aug 2 22:01:41 ollama-ubuntu-jammy ollama[423]: DEBUG [process_single_task] slot data | n_idle_slots=4 n_processing_slots=0 task_id=0 tid="139675947496256" timestamp=1722625301 Aug 2 22:01:41 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:41.156+03:00 level=INFO source=server.go:623 msg="llama runner started in 1.76 seconds" Aug 2 22:01:41 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:41.156+03:00 level=DEBUG source=sched.go:458 msg="finished setting up runner" model=/root/.ollama/models/blobs/sha256-633fc5be925f9a484b61d6f9b9a78021eeb462100bd557309f01ba84cac26adf Aug 2 22:01:41 ollama-ubuntu-jammy ollama[387]: [GIN] 2024/08/02 - 22:01:41 | 200 | 1.775479674s | 127.0.0.1 | POST "/api/chat" Aug 2 22:01:41 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:41.156+03:00 level=DEBUG source=sched.go:462 msg="context for request finished" Aug 2 22:01:41 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:41.156+03:00 level=DEBUG source=sched.go:334 msg="runner with non-zero duration has gone idle, adding timer" modelPath=/root/.ollama/models/blobs/sha256-633fc5be925f9a484b61d6f9b9a78021eeb462100bd557309f01ba84cac26adf duration=5m0s Aug 2 22:01:41 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:41.156+03:00 level=DEBUG 
source=sched.go:352 msg="after processing request finished event" modelPath=/root/.ollama/models/blobs/sha256-633fc5be925f9a484b61d6f9b9a78021eeb462100bd557309f01ba84cac26adf refCount=0 Aug 2 22:01:45 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:45.117+03:00 level=DEBUG source=sched.go:571 msg="evaluating already loaded" model=/root/.ollama/models/blobs/sha256-633fc5be925f9a484b61d6f9b9a78021eeb462100bd557309f01ba84cac26adf Aug 2 22:01:45 ollama-ubuntu-jammy ollama[423]: DEBUG [process_single_task] slot data | n_idle_slots=4 n_processing_slots=0 task_id=1 tid="139675947496256" timestamp=1722625305 Aug 2 22:01:45 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:45.118+03:00 level=DEBUG source=routes.go:1347 msg="chat request" images=0 prompt="<|user|>\nHey there!<|end|>\n<|assistant|>\n" Aug 2 22:01:45 ollama-ubuntu-jammy ollama[423]: DEBUG [process_single_task] slot data | n_idle_slots=4 n_processing_slots=0 task_id=2 tid="139675947496256" timestamp=1722625305 Aug 2 22:01:45 ollama-ubuntu-jammy ollama[423]: DEBUG [launch_slot_with_data] slot is processing task | slot_id=0 task_id=3 tid="139675947496256" timestamp=1722625305 Aug 2 22:01:45 ollama-ubuntu-jammy ollama[423]: DEBUG [update_slots] slot progression | ga_i=0 n_past=0 n_past_se=0 n_prompt_tokens_processed=13 slot_id=0 task_id=3 tid="139675947496256" timestamp=1722625305 Aug 2 22:01:45 ollama-ubuntu-jammy ollama[423]: DEBUG [update_slots] kv cache rm [p0, end) | p0=0 slot_id=0 task_id=3 tid="139675947496256" timestamp=1722625305 Aug 2 22:01:45 ollama-ubuntu-jammy ollama[423]: DEBUG [print_timings] prompt eval time = 98.30 ms / 13 tokens ( 7.56 ms per token, 132.25 tokens per second) | n_prompt_tokens_processed=13 n_tokens_second=132.25091049665303 slot_id=0 t_prompt_processing=98.298 t_token=7.561384615384616 task_id=3 tid="139675947496256" timestamp=1722625305 Aug 2 22:01:45 ollama-ubuntu-jammy ollama[423]: DEBUG [print_timings] generation eval time = 171.00 ms / 10 runs ( 17.10 ms per 
token, 58.48 tokens per second) | n_decoded=10 n_tokens_second=58.48021614287887 slot_id=0 t_token=17.0998 t_token_generation=170.998 task_id=3 tid="139675947496256" timestamp=1722625305
Aug 2 22:01:45 ollama-ubuntu-jammy ollama[423]: DEBUG [print_timings] total time = 269.30 ms | slot_id=0 t_prompt_processing=98.298 t_token_generation=170.998 t_total=269.296 task_id=3 tid="139675947496256" timestamp=1722625305
Aug 2 22:01:45 ollama-ubuntu-jammy ollama[423]: DEBUG [update_slots] slot released | n_cache_tokens=23 n_ctx=8192 n_past=22 n_system_tokens=0 slot_id=0 task_id=3 tid="139675947496256" timestamp=1722625305 truncated=false
Aug 2 22:01:45 ollama-ubuntu-jammy ollama[423]: DEBUG [log_server_request] request | method="POST" params={} path="/completion" remote_addr="127.0.0.1" remote_port=48122 status=200 tid="139675929642560" timestamp=1722625305
Aug 2 22:01:45 ollama-ubuntu-jammy ollama[387]: [GIN] 2024/08/02 - 22:01:45 | 200 | 319.564933ms | 127.0.0.1 | POST "/api/chat"
Aug 2 22:01:45 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:45.429+03:00 level=DEBUG source=sched.go:403 msg="context for request finished"
Aug 2 22:01:45 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:45.429+03:00 level=DEBUG source=sched.go:334 msg="runner with non-zero duration has gone idle, adding timer" modelPath=/root/.ollama/models/blobs/sha256-633fc5be925f9a484b61d6f9b9a78021eeb462100bd557309f01ba84cac26adf duration=5m0s
Aug 2 22:01:45 ollama-ubuntu-jammy ollama[387]: time=2024-08-02T22:01:45.429+03:00 level=DEBUG source=sched.go:352 msg="after processing request finished event" modelPath=/root/.ollama/models/blobs/sha256-633fc5be925f9a484b61d6f9b9a78021eeb462100bd557309f01ba84cac26adf refCount=0
```

Thanks for an amazing software/app, I hope someone can wrap their head around this issue!

### OS

Linux

### GPU

AMD

### CPU

AMD

### Ollama version

from 0.1.48 to 0.3.x
GiteaMirror added the install, linux, amd, bug labels 2026-04-22 08:36:21 -05:00

@dhiltgen commented on GitHub (Aug 2, 2024):

Can you clarify "Ollama install.sh breaks an upgrade"? Did the install fail during the install itself, is Ollama crashing after the upgrade, is it failing to find your GPU, or is it something else? Are there server logs for the failure after the upgrade?

Between 0.1.48 and 0.3.x we've bumped ROCm versions from 6.1.1 to 6.1.2 and we've been shifting to a model of favoring our bundled ROCm version.


@mathatan commented on GitHub (Aug 3, 2024):

Oh, right. It was starting to get late and I didn't even remember to report the actual issue.

After running the install script and trying to load a model (via ollama run) I get:
rocBLAS error: Could not initialize Tensile host: No devices found

The device is available, and it's working perfectly fine otherwise. Also, it's not possible to revert the installation, i.e. installing an older version of Ollama will not fix things. If I just replace the executable manually (or by editing the install script), everything works fine.
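For reference, the manual workaround described above amounts to stopping the service and copying the new standalone binary over the old one instead of running install.sh. A minimal sketch, assuming the standalone-binary download URL and the /usr/local/bin install path that the install script used at the time (both may differ on other setups); it defaults to a dry run that only prints the commands:

```shell
# Sketch of the manual-upgrade workaround: swap the binary without install.sh.
# DRY_RUN=1 (the default) prints each command instead of executing it, so the
# sequence can be reviewed without root or network access.
set -eu

DRY_RUN="${DRY_RUN:-1}"
run() {
    # Print the command in dry-run mode; otherwise execute it as given.
    if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else "$@"; fi
}

run sudo systemctl stop ollama
run curl -fsSL -o /tmp/ollama https://ollama.com/download/ollama-linux-amd64
run sudo install -m 0755 /tmp/ollama /usr/local/bin/ollama
run sudo systemctl start ollama
```

Run with `DRY_RUN=0` to actually perform the upgrade.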

Here are the details and logs (copy-pasted from the original link):

Environment:

[Service]
Environment="OLLAMA_HOST=0.0.0.0"
Environment="OLLAMA_LLM_LIBRARY=rocm_v60002"
Environment="HSA_OVERRIDE_GFX_VERSION=9.0.0"
Environment="OLLAMA_NOHISTORY=1"
Environment="HSA_ENABLE_SDMA=0"
Environment="OLLAMA_DEBUG=1"
Environment="AMD_SERIALIZE_KERNEL=3"

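A [Service] block like the one above is typically applied as a systemd drop-in for the ollama unit. A minimal sketch of generating such a drop-in file; it writes to a temp directory so it runs without root, whereas on a real system the target would be a directory like /etc/systemd/system/ollama.service.d/ followed by `systemctl daemon-reload` and a service restart:

```shell
# Sketch: write a systemd drop-in carrying the environment overrides.
# A temp directory stands in for /etc/systemd/system/ollama.service.d/
# so the sketch is safe to run anywhere.
set -eu

dropin_dir="$(mktemp -d)"
cat > "$dropin_dir/override.conf" <<'EOF'
[Service]
Environment="OLLAMA_HOST=0.0.0.0"
Environment="HSA_OVERRIDE_GFX_VERSION=9.0.0"
Environment="HSA_ENABLE_SDMA=0"
Environment="OLLAMA_DEBUG=1"
EOF
echo "wrote $dropin_dir/override.conf"
```

With the file in place under the real unit's `.d` directory, `sudo systemctl daemon-reload && sudo systemctl restart ollama` makes the overrides take effect.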
Logs:

Aug  2 20:05:43 ollama-ubuntu-jammy-2 systemd[1]: Started Ollama Service.
Aug  2 20:05:43 ollama-ubuntu-jammy-2 ollama[447]: 2024/08/02 20:05:43 routes.go:1109: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION:9.0.0 OLLAMA_DEBUG:true OLLAMA_FLASH_ATTENTION:false OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_LLM_LIBRARY:rocm_v60002 OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/usr/share/ollama/.ollama/models OLLAMA_NOHISTORY:true OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://*] OLLAMA_RUNNERS_DIR: OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES:]"
Aug  2 20:05:43 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:43.405+03:00 level=INFO source=images.go:781 msg="total blobs: 77"
Aug  2 20:05:43 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:43.407+03:00 level=INFO source=images.go:788 msg="total unused blobs removed: 0"
Aug  2 20:05:43 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:43.409+03:00 level=INFO source=routes.go:1156 msg="Listening on [::]:11434 (version 0.3.2)"
Aug  2 20:05:43 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:43.409+03:00 level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama802441264/runners
Aug  2 20:05:43 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:43.409+03:00 level=DEBUG source=payload.go:182 msg=extracting variant=cpu file=build/linux/x86_64/cpu/bin/ollama_llama_server.gz
Aug  2 20:05:43 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:43.409+03:00 level=DEBUG source=payload.go:182 msg=extracting variant=cpu_avx file=build/linux/x86_64/cpu_avx/bin/ollama_llama_server.gz
Aug  2 20:05:43 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:43.409+03:00 level=DEBUG source=payload.go:182 msg=extracting variant=cpu_avx2 file=build/linux/x86_64/cpu_avx2/bin/ollama_llama_server.gz
Aug  2 20:05:43 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:43.409+03:00 level=DEBUG source=payload.go:182 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/libcublas.so.11.gz
Aug  2 20:05:43 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:43.409+03:00 level=DEBUG source=payload.go:182 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/libcublasLt.so.11.gz
Aug  2 20:05:43 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:43.409+03:00 level=DEBUG source=payload.go:182 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/libcudart.so.11.0.gz
Aug  2 20:05:43 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:43.409+03:00 level=DEBUG source=payload.go:182 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/ollama_llama_server.gz
Aug  2 20:05:43 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:43.409+03:00 level=DEBUG source=payload.go:182 msg=extracting variant=rocm_v60102 file=build/linux/x86_64/rocm_v60102/bin/deps.txt.gz
Aug  2 20:05:43 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:43.409+03:00 level=DEBUG source=payload.go:182 msg=extracting variant=rocm_v60102 file=build/linux/x86_64/rocm_v60102/bin/ollama_llama_server.gz
Aug  2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.276+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama802441264/runners/cpu/ollama_llama_server
Aug  2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.276+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama802441264/runners/cpu_avx/ollama_llama_server
Aug  2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.276+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama802441264/runners/cpu_avx2/ollama_llama_server
Aug  2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.276+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama802441264/runners/cuda_v11/ollama_llama_server
Aug  2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.276+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama802441264/runners/rocm_v60102/ollama_llama_server
Aug  2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.276+03:00 level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu_avx cpu_avx2 cuda_v11 rocm_v60102 cpu]"
Aug  2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.276+03:00 level=DEBUG source=payload.go:45 msg="Override detection logic by setting OLLAMA_LLM_LIBRARY"
Aug  2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.276+03:00 level=DEBUG source=sched.go:105 msg="starting llm scheduler"
Aug  2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.276+03:00 level=INFO source=gpu.go:205 msg="looking for compatible GPUs"
Aug  2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.276+03:00 level=DEBUG source=gpu.go:91 msg="searching for GPU discovery libraries for NVIDIA"
Aug  2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.276+03:00 level=DEBUG source=gpu.go:468 msg="Searching for GPU library" name=libcuda.so*
Aug  2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.276+03:00 level=DEBUG source=gpu.go:487 msg="gpu library search" globs="[/libcuda.so** /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]"
Aug  2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.278+03:00 level=DEBUG source=gpu.go:521 msg="discovered GPU libraries" paths=[]
Aug  2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.278+03:00 level=DEBUG source=gpu.go:468 msg="Searching for GPU library" name=libcudart.so*
Aug  2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.278+03:00 level=DEBUG source=gpu.go:487 msg="gpu library search" globs="[/libcudart.so** /tmp/ollama802441264/runners/cuda*/libcudart.so* /usr/local/cuda/lib64/libcudart.so* /usr/lib/x86_64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/x86_64-linux-gnu/libcudart.so* /usr/lib/wsl/lib/libcudart.so* /usr/lib/wsl/drivers/*/libcudart.so* /opt/cuda/lib64/libcudart.so* /usr/local/cuda*/targets/aarch64-linux/lib/libcudart.so* /usr/lib/aarch64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/aarch64-linux-gnu/libcudart.so* /usr/local/cuda/lib*/libcudart.so* /usr/lib*/libcudart.so* /usr/local/lib*/libcudart.so*]"
Aug  2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.278+03:00 level=DEBUG source=gpu.go:521 msg="discovered GPU libraries" paths=[/tmp/ollama802441264/runners/cuda_v11/libcudart.so.11.0]
Aug  2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: cudaSetDevice err: 35
Aug  2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.279+03:00 level=DEBUG source=gpu.go:533 msg="Unable to load cudart" library=/tmp/ollama802441264/runners/cuda_v11/libcudart.so.11.0 error="your nvidia driver is too old or missing.  If you have a CUDA GPU please upgrade to run ollama"
Aug  2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.279+03:00 level=WARN source=amd_linux.go:59 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
Aug  2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.279+03:00 level=DEBUG source=amd_linux.go:102 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/0/properties"
Aug  2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.279+03:00 level=DEBUG source=amd_linux.go:127 msg="detected CPU /sys/class/kfd/kfd/topology/nodes/0/properties"
Aug  2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.279+03:00 level=DEBUG source=amd_linux.go:102 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/1/properties"
Aug  2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.279+03:00 level=DEBUG source=amd_linux.go:217 msg="mapping amdgpu to drm sysfs nodes" amdgpu=/sys/class/kfd/kfd/topology/nodes/1/properties vendor=4098 device=26751 unique_id=150031596308672932
Aug  2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.279+03:00 level=DEBUG source=amd_linux.go:251 msg=matched amdgpu=/sys/class/kfd/kfd/topology/nodes/1/properties drm=/sys/class/drm/card0/device
Aug  2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.279+03:00 level=DEBUG source=amd_linux.go:283 msg="amdgpu memory" gpu=0 total="8.0 GiB"
Aug  2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.279+03:00 level=DEBUG source=amd_linux.go:284 msg="amdgpu memory" gpu=0 available="8.0 GiB"
Aug  2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.279+03:00 level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /usr/local/bin/rocm"
Aug  2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.279+03:00 level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /opt/rocm/lib"
Aug  2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.280+03:00 level=INFO source=amd_linux.go:348 msg="skipping rocm gfx compatibility check" HSA_OVERRIDE_GFX_VERSION=9.0.0
Aug  2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.280+03:00 level=INFO source=types.go:105 msg="inference compute" id=0 library=rocm compute=gfx900 driver=0.0 name=1002:687f total="8.0 GiB" available="8.0 GiB"
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: [GIN] 2024/08/02 - 20:05:50 | 200 |      42.189µs |       127.0.0.1 | HEAD     "/"
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: [GIN] 2024/08/02 - 20:05:50 | 200 |    4.888907ms |       127.0.0.1 | POST     "/api/show"
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.541+03:00 level=DEBUG source=gpu.go:358 msg="updating system memory data" before.total="62.7 GiB" before.free="44.4 GiB" before.free_swap="0 B" now.total="62.7 GiB" now.free="44.3 GiB" now.free_swap="0 B"
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.541+03:00 level=DEBUG source=amd_linux.go:440 msg="updating rocm free memory" gpu=0 name=1002:687f before="8.0 GiB" now="8.0 GiB"
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.541+03:00 level=DEBUG source=sched.go:181 msg="updating default concurrency" OLLAMA_MAX_LOADED_MODELS=0x816020 gpu_count=1
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.545+03:00 level=DEBUG source=sched.go:219 msg="loading first model" model=/usr/share/ollama/.ollama/models/blobs/sha256-633fc5be925f9a484b61d6f9b9a78021eeb462100bd557309f01ba84cac26adf
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.545+03:00 level=DEBUG source=memory.go:101 msg=evaluating library=rocm gpu_count=1 available="[8.0 GiB]"
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.545+03:00 level=INFO source=sched.go:710 msg="new model will fit in available VRAM in single GPU, loading" model=/usr/share/ollama/.ollama/models/blobs/sha256-633fc5be925f9a484b61d6f9b9a78021eeb462100bd557309f01ba84cac26adf gpu=0 parallel=4 available=8564838400 required="6.1 GiB"
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.545+03:00 level=DEBUG source=server.go:100 msg="system memory" total="62.7 GiB" free="44.3 GiB" free_swap="0 B"
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.545+03:00 level=DEBUG source=memory.go:101 msg=evaluating library=rocm gpu_count=1 available="[8.0 GiB]"
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.545+03:00 level=INFO source=memory.go:309 msg="offload to rocm" layers.requested=-1 layers.model=33 layers.offload=33 layers.split="" memory.available="[8.0 GiB]" memory.required.full="6.1 GiB" memory.required.partial="6.1 GiB" memory.required.kv="3.0 GiB" memory.required.allocations="[6.1 GiB]" memory.weights.total="4.9 GiB" memory.weights.repeating="4.8 GiB" memory.weights.nonrepeating="77.1 MiB" memory.graph.full="512.0 MiB" memory.graph.partial="512.0 MiB"
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.545+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama802441264/runners/cpu/ollama_llama_server
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.545+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama802441264/runners/cpu_avx/ollama_llama_server
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.545+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama802441264/runners/cpu_avx2/ollama_llama_server
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.545+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama802441264/runners/cuda_v11/ollama_llama_server
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.545+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama802441264/runners/rocm_v60102/ollama_llama_server
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.545+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama802441264/runners/cpu/ollama_llama_server
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.545+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama802441264/runners/cpu_avx/ollama_llama_server
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.545+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama802441264/runners/cpu_avx2/ollama_llama_server
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.545+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama802441264/runners/cuda_v11/ollama_llama_server
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.545+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama802441264/runners/rocm_v60102/ollama_llama_server
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.545+03:00 level=INFO source=server.go:170 msg="Invalid OLLAMA_LLM_LIBRARY rocm_v60002 - not found"
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.546+03:00 level=INFO source=server.go:384 msg="starting llama server" cmd="/tmp/ollama802441264/runners/rocm_v60102/ollama_llama_server --model /usr/share/ollama/.ollama/models/blobs/sha256-633fc5be925f9a484b61d6f9b9a78021eeb462100bd557309f01ba84cac26adf --ctx-size 8192 --batch-size 512 --embedding --log-disable --n-gpu-layers 33 --verbose --parallel 4 --port 46445"
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.546+03:00 level=DEBUG source=server.go:401 msg=subprocess environment="[PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin HSA_OVERRIDE_GFX_VERSION=9.0.0 HSA_ENABLE_SDMA=0 LD_LIBRARY_PATH=/opt/rocm/lib:/tmp/ollama802441264/runners/rocm_v60102:/tmp/ollama802441264/runners HIP_VISIBLE_DEVICES=0]"
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.546+03:00 level=INFO source=sched.go:445 msg="loaded runners" count=1
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.546+03:00 level=INFO source=server.go:584 msg="waiting for llama runner to start responding"
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.546+03:00 level=INFO source=server.go:618 msg="waiting for server to become available" status="llm server error"
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[482]: INFO [main] build info | build=1 commit="6eeaeba" tid="139832002468928" timestamp=1722618350
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[482]: INFO [main] system info | n_threads=12 n_threads_batch=-1 system_info="AVX = 1 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 0 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | " tid="139832002468928" timestamp=1722618350 total_threads=24
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[482]: INFO [main] HTTP server listening | hostname="127.0.0.1" n_threads_http="23" port="46445" tid="139832002468928" timestamp=1722618350
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: loaded meta data with 36 key-value pairs and 197 tensors from /usr/share/ollama/.ollama/models/blobs/sha256-633fc5be925f9a484b61d6f9b9a78021eeb462100bd557309f01ba84cac26adf (version GGUF V3 (latest))
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv   0:                       general.architecture str              = phi3
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv   1:                               general.type str              = model
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv   2:                               general.name str              = Phi 3 Mini 128k Instruct
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv   3:                           general.finetune str              = 128k-instruct
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv   4:                           general.basename str              = Phi-3
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv   5:                         general.size_label str              = mini
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv   6:                            general.license str              = mit
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv   7:                       general.license.link str              = https://huggingface.co/microsoft/Phi-...
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv   8:                               general.tags arr[str,3]       = ["nlp", "code", "text-generation"]
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv   9:                          general.languages arr[str,1]       = ["en"]
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv  10:                        phi3.context_length u32              = 131072
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv  11:  phi3.rope.scaling.original_context_length u32              = 4096
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv  12:                      phi3.embedding_length u32              = 3072
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv  13:                   phi3.feed_forward_length u32              = 8192
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv  14:                           phi3.block_count u32              = 32
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv  15:                  phi3.attention.head_count u32              = 32
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv  16:               phi3.attention.head_count_kv u32              = 32
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv  17:      phi3.attention.layer_norm_rms_epsilon f32              = 0.000010
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv  18:                  phi3.rope.dimension_count u32              = 96
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv  19:                        phi3.rope.freq_base f32              = 10000.000000
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv  20:                          general.file_type u32              = 2
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv  21:              phi3.attention.sliding_window u32              = 262144
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv  22:              phi3.rope.scaling.attn_factor f32              = 1.190238
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv  23:                       tokenizer.ggml.model str              = llama
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv  24:                         tokenizer.ggml.pre str              = default
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv  25:                      tokenizer.ggml.tokens arr[str,32064]   = ["<unk>", "<s>", "</s>", "<0x00>", "<...
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv  26:                      tokenizer.ggml.scores arr[f32,32064]   = [-1000.000000, -1000.000000, -1000.00...
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv  27:                  tokenizer.ggml.token_type arr[i32,32064]   = [3, 3, 4, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv  28:                tokenizer.ggml.bos_token_id u32              = 1
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv  29:                tokenizer.ggml.eos_token_id u32              = 32000
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv  30:            tokenizer.ggml.unknown_token_id u32              = 0
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv  31:            tokenizer.ggml.padding_token_id u32              = 32000
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv  32:               tokenizer.ggml.add_bos_token bool             = false
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv  33:               tokenizer.ggml.add_eos_token bool             = false
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv  34:                    tokenizer.chat_template str              = {% for message in messages %}{% if me...
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv  35:               general.quantization_version u32              = 2
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - type  f32:   67 tensors
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - type q4_0:  129 tensors
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - type q6_K:    1 tensors
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_vocab: special tokens cache size = 14
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_vocab: token to piece cache size = 0.1685 MB
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: format           = GGUF V3 (latest)
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: arch             = phi3
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: vocab type       = SPM
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: n_vocab          = 32064
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: n_merges         = 0
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: vocab_only       = 0
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: n_ctx_train      = 131072
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: n_embd           = 3072
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: n_layer          = 32
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: n_head           = 32
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: n_head_kv        = 32
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: n_rot            = 96
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: n_swa            = 262144
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: n_embd_head_k    = 96
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: n_embd_head_v    = 96
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: n_gqa            = 1
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: n_embd_k_gqa     = 3072
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: n_embd_v_gqa     = 3072
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: f_norm_eps       = 0.0e+00
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: f_clamp_kqv      = 0.0e+00
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: f_max_alibi_bias = 0.0e+00
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: f_logit_scale    = 0.0e+00
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: n_ff             = 8192
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: n_expert         = 0
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: n_expert_used    = 0
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: causal attn      = 1
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: pooling type     = 0
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: rope type        = 2
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: rope scaling     = linear
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: freq_base_train  = 10000.0
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: freq_scale_train = 1
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: n_ctx_orig_yarn  = 4096
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: rope_finetuned   = unknown
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: ssm_d_conv       = 0
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: ssm_d_inner      = 0
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: ssm_d_state      = 0
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: ssm_dt_rank      = 0
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: model type       = 3B
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: model ftype      = Q4_0
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: model params     = 3.82 B
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: model size       = 2.03 GiB (4.55 BPW)
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: general.name     = Phi 3 Mini 128k Instruct
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: BOS token        = 1 '<s>'
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: EOS token        = 32000 '<|endoftext|>'
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: UNK token        = 0 '<unk>'
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: PAD token        = 32000 '<|endoftext|>'
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: LF token         = 13 '<0x0A>'
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: EOT token        = 32007 '<|end|>'
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: max token length = 48
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: rocBLAS error: Could not initialize Tensile host: No devices found
Aug  2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.997+03:00 level=INFO source=server.go:618 msg="waiting for server to become available" status="llm server not responding"
Aug  2 20:05:52 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:52.086+03:00 level=DEBUG source=server.go:424 msg="llama runner terminated" error="signal: aborted (core dumped)"
Aug  2 20:05:52 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:52.150+03:00 level=ERROR source=sched.go:451 msg="error loading llama server" error="llama runner process has terminated: error:Could not initialize Tensile host: No devices found"
Aug  2 20:05:52 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:52.150+03:00 level=DEBUG source=sched.go:454 msg="triggering expiration for failed load" model=/usr/share/ollama/.ollama/models/blobs/sha256-633fc5be925f9a484b61d6f9b9a78021eeb462100bd557309f01ba84cac26adf
Aug  2 20:05:52 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:52.150+03:00 level=DEBUG source=sched.go:355 msg="runner expired event received" modelPath=/usr/share/ollama/.ollama/models/blobs/sha256-633fc5be925f9a484b61d6f9b9a78021eeb462100bd557309f01ba84cac26adf
Aug  2 20:05:52 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:52.150+03:00 level=DEBUG source=sched.go:371 msg="got lock to unload" modelPath=/usr/share/ollama/.ollama/models/blobs/sha256-633fc5be925f9a484b61d6f9b9a78021eeb462100bd557309f01ba84cac26adf
Aug  2 20:05:52 ollama-ubuntu-jammy-2 ollama[447]: [GIN] 2024/08/02 - 20:05:52 | 500 |  1.613482132s |       127.0.0.1 | POST     "/api/chat"
Aug  2 20:05:52 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:52.150+03:00 level=DEBUG source=gpu.go:358 msg="updating system memory data" before.total="62.7 GiB" before.free="44.3 GiB" before.free_swap="0 B" now.total="62.7 GiB" now.free="44.3 GiB" now.free_swap="0 B"
Aug  2 20:05:52 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:52.150+03:00 level=DEBUG source=amd_linux.go:440 msg="updating rocm free memory" gpu=0 name=1002:687f before="8.0 GiB" now="8.0 GiB"
Aug  2 20:05:52 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:52.150+03:00 level=DEBUG source=server.go:1042 msg="stopping llama server"
Aug  2 20:05:52 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:52.150+03:00 level=DEBUG source=sched.go:376 msg="runner released" modelPath=/usr/share/ollama/.ollama/models/blobs/sha256-633fc5be925f9a484b61d6f9b9a78021eeb462100bd557309f01ba84cac26adf
Aug  2 20:05:52 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:52.401+03:00 level=DEBUG source=gpu.go:358 msg="updating system memory data" before.total="62.7 GiB" before.free="44.3 GiB" before.free_swap="0 B" now.total="62.7 GiB" now.free="44.3 GiB" now.free_swap="0 B"
Aug  2 20:05:52 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:52.401+03:00 level=DEBUG source=amd_linux.go:440 msg="updating rocm free memory" gpu=0 name=1002:687f before="8.0 GiB" now="8.0 GiB"

As can be seen, Ollama detects the GPU and tries to use it, but rocBLAS then fails to initialize with "Could not initialize Tensile host: No devices found".
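One thing worth ruling out when rocBLAS reports "No devices found" even though GPU discovery succeeds is whether the service user can open the ROCm device nodes. The sketch below is a hedged diagnostic, not part of the original report; the `ollama` user name and the `render`/`video` group names are typical Ubuntu defaults and may differ on other setups.

```shell
# Sketch: rocBLAS runs inside the runner subprocess, which must be able to
# open /dev/kfd and /dev/dri/renderD*. If the service user lacks membership
# in the groups owning those nodes, the HSA runtime sees no devices.
check_rocm_access() {
    if [ -e /dev/kfd ]; then
        ls -l /dev/kfd /dev/dri/renderD* 2>/dev/null   # note the owning groups
        id ollama 2>/dev/null || echo "no ollama user on this machine"
    else
        echo "no /dev/kfd here: ROCm kernel driver not visible"
    fi
}
check_rocm_access
```

If the groups turn out to be missing, `sudo usermod -aG render,video ollama` followed by a service restart is the usual fix.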

ROCM-SMI:

======================================= ROCm System Management Interface =======================================
================================================= Concise Info =================================================
Device  [Model : Revision]    Temp    Power     Partitions      SCLK    MCLK    Fan  Perf  PwrCap  VRAM%  GPU%
        Name (20 chars)       (Edge)  (Socket)  (Mem, Compute)
================================================================================================================
0       [0x2308 : 0xc1]       45.0°C  4.0W      N/A, N/A        852Mhz  167Mhz  0%   auto  247.0W    0%   0%
        Vega 10 XL/XT [Radeo
================================================================================================================
============================================= End of ROCm SMI Log ==============================================

rocminfo:

ROCk module is loaded
=====================
HSA System Attributes
=====================
Runtime Version:         1.1
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE
System Endianness:       LITTLE
Mwaitx:                  DISABLED
DMAbuf Support:          YES

==========
HSA Agents
==========
*******
Agent 1
*******
  Name:                    AMD Ryzen 9 3900X 12-Core Processor
  Uuid:                    CPU-XX
  Marketing Name:          AMD Ryzen 9 3900X 12-Core Processor
  Vendor Name:             CPU
  Feature:                 None specified
  Profile:                 FULL_PROFILE
  Float Round Mode:        NEAR
  Max Queue Number:        0(0x0)
  Queue Min Size:          0(0x0)
  Queue Max Size:          0(0x0)
  Queue Type:              MULTI
  Node:                    0
  Device Type:             CPU
  Cache Info:
    L1:                      32768(0x8000) KB
  Chip ID:                 0(0x0)
  ASIC Revision:           0(0x0)
  Cacheline Size:          64(0x40)
  Max Clock Freq. (MHz):   3800
  BDFID:                   0
  Internal Node ID:        0
  Compute Unit:            24
  SIMDs per CU:            0
  Shader Engines:          0
  Shader Arrs. per Eng.:   0
  WatchPts on Addr. Ranges:1
  Features:                None
  Pool Info:
    Pool 1
      Segment:                 GLOBAL; FLAGS: FINE GRAINED
      Size:                    65771684(0x3eb98a4) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Alignment:         4KB
      Accessible by all:       TRUE
    Pool 2
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    65771684(0x3eb98a4) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Alignment:         4KB
      Accessible by all:       TRUE
    Pool 3
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED
      Size:                    65771684(0x3eb98a4) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Alignment:         4KB
      Accessible by all:       TRUE
  ISA Info:
*******
Agent 2
*******
  Name:                    gfx900
  Uuid:                    GPU-021504f1231031a4
  Marketing Name:          Radeon RX Vega
  Vendor Name:             AMD
  Feature:                 KERNEL_DISPATCH
  Profile:                 BASE_PROFILE
  Float Round Mode:        NEAR
  Max Queue Number:        128(0x80)
  Queue Min Size:          64(0x40)
  Queue Max Size:          131072(0x20000)
  Queue Type:              MULTI
  Node:                    1
  Device Type:             GPU
  Cache Info:
    L1:                      16(0x10) KB
    L2:                      4096(0x1000) KB
  Chip ID:                 26751(0x687f)
  ASIC Revision:           1(0x1)
  Cacheline Size:          64(0x40)
  Max Clock Freq. (MHz):   1630
  BDFID:                   2560
  Internal Node ID:        1
  Compute Unit:            64
  SIMDs per CU:            4
  Shader Engines:          4
  Shader Arrs. per Eng.:   1
  WatchPts on Addr. Ranges:4
  Coherent Host Access:    FALSE
  Features:                KERNEL_DISPATCH
  Fast F16 Operation:      TRUE
  Wavefront Size:          64(0x40)
  Workgroup Max Size:      1024(0x400)
  Workgroup Max Size per Dimension:
    x                        1024(0x400)
    y                        1024(0x400)
    z                        1024(0x400)
  Max Waves Per CU:        40(0x28)
  Max Work-item Per CU:    2560(0xa00)
  Grid Max Size:           4294967295(0xffffffff)
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)
    y                        4294967295(0xffffffff)
    z                        4294967295(0xffffffff)
  Max fbarriers/Workgrp:   32
  Packet Processor uCode:: 468
  SDMA engine uCode::      434
  IOMMU Support::          None
  Pool Info:
    Pool 1
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED
      Size:                    8372224(0x7fc000) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Alignment:         4KB
      Accessible by all:       FALSE
    Pool 2
      Segment:                 GLOBAL; FLAGS: EXTENDED FINE GRAINED
      Size:                    8372224(0x7fc000) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Alignment:         4KB
      Accessible by all:       FALSE
    Pool 3
      Segment:                 GROUP
      Size:                    64(0x40) KB
      Allocatable:             FALSE
      Alloc Granule:           0KB
      Alloc Alignment:         0KB
      Accessible by all:       FALSE
  ISA Info:
    ISA 1
      Name:                    amdgcn-amd-amdhsa--gfx900:xnack-
      Machine Models:          HSA_MACHINE_MODEL_LARGE
      Profiles:                HSA_PROFILE_BASE
      Default Rounding Mode:   NEAR
      Default Rounding Mode:   NEAR
      Fast f16:                TRUE
      Workgroup Max Size:      1024(0x400)
      Workgroup Max Size per Dimension:
        x                        1024(0x400)
        y                        1024(0x400)
        z                        1024(0x400)
      Grid Max Size:           4294967295(0xffffffff)
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)
        y                        4294967295(0xffffffff)
        z                        4294967295(0xffffffff)
      FBarrier Max Size:       32
*** Done ***
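Since rocminfo reports the ISA as `amdgcn-amd-amdhsa--gfx900:xnack-` while rocBLAS still fails, one way to cross-check a given rocBLAS build is to see whether it actually ships Tensile code objects for gfx900. This is a sketch under assumptions: the paths below are the default ROCm layout plus the temp runner directory pattern visible in the logs, and may need adjusting for other installs.

```shell
# Sketch: Tensile kernels ship as per-ISA files (e.g. *gfx900*) under a
# rocblas/library directory. If no gfx900 files exist, rocBLAS cannot serve
# the GPU even though the HSA runtime enumerates it.
count_gfx900() {
    n=0
    for d in /opt/rocm/lib/rocblas/library /tmp/ollama*/runners/rocm*/rocblas/library; do
        [ -d "$d" ] && n=$((n + $(ls "$d" 2>/dev/null | grep -c gfx900)))
    done
    echo "gfx900 Tensile files found: $n"
}
count_gfx900
```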

apt show rocm-libs -a (version)

Package: rocm-libs
Version: 6.0.2.60002-115~22.04
Priority: optional
Section: devel
Maintainer: ROCm Dev Support <rocm-dev.support@amd.com>
Installed-Size: 13.3 kB
Depends: hipblas (= 2.0.0.60002-115~22.04), hipblaslt (= 0.6.0.60002-115~22.04), hipfft (= 1.0.13.60002-115~22.04), hipsolver (= 2.0.0.60002-115~22.04), hipsparse (= 3.0.0.60002-115~22.04), hiptensor (= 1.1.0.60002-115~22.04), miopen-hip (= 3.00.0.60002-115~22.04), half (= 1.12.0.60002-115~22.04), rccl (= 2.18.3.60002-115~22.04), rocalution (= 3.0.3.60002-115~22.04), rocblas (= 4.0.0.60002-115~22.04), rocfft (= 1.0.25.60002-115~22.04), rocrand (= 3.0.0.60002-115~22.04), hiprand (= 2.10.16.60002-115~22.04), rocsolver (= 3.24.0.60002-115~22.04), rocsparse (= 3.0.2.60002-115~22.04), rocm-core (= 6.0.2.60002-115~22.04), composablekernel-dev (= 1.1.0.60002-115~22.04), hipblas-dev (= 2.0.0.60002-115~22.04), hipblaslt-dev (= 0.6.0.60002-115~22.04), hipcub-dev (= 3.0.0.60002-115~22.04), hipfft-dev (= 1.0.13.60002-115~22.04), hipsolver-dev (= 2.0.0.60002-115~22.04), hipsparse-dev (= 3.0.0.60002-115~22.04), hiptensor-dev (= 1.1.0.60002-115~22.04), miopen-hip-dev (= 3.00.0.60002-115~22.04), rccl-dev (= 2.18.3.60002-115~22.04), rocalution-dev (= 3.0.3.60002-115~22.04), rocblas-dev (= 4.0.0.60002-115~22.04), rocfft-dev (= 1.0.25.60002-115~22.04), rocprim-dev (= 3.0.0.60002-115~22.04), rocrand-dev (= 3.0.0.60002-115~22.04), hiprand-dev (= 2.10.16.60002-115~22.04), rocsolver-dev (= 3.24.0.60002-115~22.04), rocsparse-dev (= 3.0.2.60002-115~22.04), rocthrust-dev (= 3.0.0.60002-115~22.04), rocwmma-dev (= 1.3.0.60002-115~22.04)
Homepage: https://github.com/RadeonOpenCompute/ROCm
Download-Size: 1,050 B
APT-Sources: https://repo.radeon.com/rocm/apt/6.0.2 jammy/main amd64 Packages
Description: Radeon Open Compute (ROCm) Runtime software stack
<!-- gh-comment-id:2266398776 --> @mathatan commented on GitHub (Aug 3, 2024):

Oh, right. It was starting to get late and I didn't even remember to report the actual issue.

After running the install script and trying to load a model (via `ollama run`) I get:

`rocBLAS error: Could not initialize Tensile host: No devices found`

The device is available, as it's working perfectly fine otherwise. Also, it's not possible to revert the installation, i.e. installing an older version of Ollama will not fix things. If I just replace the executable manually (or by editing the install script), everything works fine.

Here are the detailed info and logs (copy-pasted from the original link):

Environment:

```
[Service]
Environment="OLLAMA_HOST=0.0.0.0"
Environment="OLLAMA_LLM_LIBRARY=rocm_v60002"
Environment="HSA_OVERRIDE_GFX_VERSION=9.0.0"
Environment="OLLAMA_NOHISTORY=1"
Environment="HSA_ENABLE_SDMA=0"
Environment="OLLAMA_DEBUG=1"
Environment="AMD_SERIALIZE_KERNEL=3"
```

Logs:

```
Aug 2 20:05:43 ollama-ubuntu-jammy-2 systemd[1]: Started Ollama Service.
Aug 2 20:05:43 ollama-ubuntu-jammy-2 ollama[447]: 2024/08/02 20:05:43 routes.go:1109: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION:9.0.0 OLLAMA_DEBUG:true OLLAMA_FLASH_ATTENTION:false OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_LLM_LIBRARY:rocm_v60002 OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/usr/share/ollama/.ollama/models OLLAMA_NOHISTORY:true OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://*] OLLAMA_RUNNERS_DIR: OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES:]"
Aug 2 20:05:43 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:43.405+03:00 level=INFO source=images.go:781 msg="total blobs: 77"
Aug 2 20:05:43 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:43.407+03:00 level=INFO source=images.go:788 msg="total unused blobs removed: 0"
Aug 2 20:05:43 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:43.409+03:00 level=INFO source=routes.go:1156 msg="Listening on [::]:11434 (version 0.3.2)"
Aug 2 20:05:43 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:43.409+03:00 level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama802441264/runners
Aug 2 20:05:43 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:43.409+03:00 level=DEBUG source=payload.go:182 msg=extracting variant=cpu file=build/linux/x86_64/cpu/bin/ollama_llama_server.gz
Aug 2 20:05:43 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:43.409+03:00 level=DEBUG source=payload.go:182 msg=extracting variant=cpu_avx file=build/linux/x86_64/cpu_avx/bin/ollama_llama_server.gz
Aug 2 20:05:43 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:43.409+03:00 level=DEBUG source=payload.go:182 msg=extracting variant=cpu_avx2 file=build/linux/x86_64/cpu_avx2/bin/ollama_llama_server.gz
Aug 2 20:05:43 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:43.409+03:00 level=DEBUG source=payload.go:182 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/libcublas.so.11.gz
Aug 2 20:05:43 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:43.409+03:00 level=DEBUG source=payload.go:182 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/libcublasLt.so.11.gz
Aug 2 20:05:43 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:43.409+03:00 level=DEBUG source=payload.go:182 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/libcudart.so.11.0.gz
Aug 2 20:05:43 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:43.409+03:00 level=DEBUG source=payload.go:182 msg=extracting variant=cuda_v11 file=build/linux/x86_64/cuda_v11/bin/ollama_llama_server.gz
Aug 2 20:05:43 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:43.409+03:00 level=DEBUG source=payload.go:182 msg=extracting variant=rocm_v60102 file=build/linux/x86_64/rocm_v60102/bin/deps.txt.gz
Aug 2 20:05:43 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:43.409+03:00 level=DEBUG source=payload.go:182 msg=extracting variant=rocm_v60102 file=build/linux/x86_64/rocm_v60102/bin/ollama_llama_server.gz
Aug 2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.276+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama802441264/runners/cpu/ollama_llama_server
Aug 2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.276+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama802441264/runners/cpu_avx/ollama_llama_server
Aug 2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.276+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama802441264/runners/cpu_avx2/ollama_llama_server
Aug 2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.276+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama802441264/runners/cuda_v11/ollama_llama_server
Aug 2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.276+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama802441264/runners/rocm_v60102/ollama_llama_server
Aug 2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.276+03:00 level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu_avx cpu_avx2 cuda_v11 rocm_v60102 cpu]"
Aug 2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.276+03:00 level=DEBUG source=payload.go:45 msg="Override detection logic by setting OLLAMA_LLM_LIBRARY"
Aug 2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.276+03:00 level=DEBUG source=sched.go:105 msg="starting llm scheduler"
Aug 2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.276+03:00 level=INFO source=gpu.go:205 msg="looking for compatible GPUs"
Aug 2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.276+03:00 level=DEBUG source=gpu.go:91 msg="searching for GPU discovery libraries for NVIDIA"
Aug 2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.276+03:00 level=DEBUG source=gpu.go:468 msg="Searching for GPU library" name=libcuda.so*
Aug 2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.276+03:00 level=DEBUG source=gpu.go:487 msg="gpu library search" globs="[/libcuda.so** /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]"
Aug 2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.278+03:00 level=DEBUG source=gpu.go:521 msg="discovered GPU libraries" paths=[]
Aug 2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.278+03:00 level=DEBUG source=gpu.go:468 msg="Searching for GPU library" name=libcudart.so*
Aug 2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.278+03:00 level=DEBUG source=gpu.go:487 msg="gpu library search" globs="[/libcudart.so** /tmp/ollama802441264/runners/cuda*/libcudart.so* /usr/local/cuda/lib64/libcudart.so* /usr/lib/x86_64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/x86_64-linux-gnu/libcudart.so* /usr/lib/wsl/lib/libcudart.so* /usr/lib/wsl/drivers/*/libcudart.so* /opt/cuda/lib64/libcudart.so* /usr/local/cuda*/targets/aarch64-linux/lib/libcudart.so* /usr/lib/aarch64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/aarch64-linux-gnu/libcudart.so* /usr/local/cuda/lib*/libcudart.so* /usr/lib*/libcudart.so* /usr/local/lib*/libcudart.so*]"
Aug 2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.278+03:00 level=DEBUG source=gpu.go:521 msg="discovered GPU libraries" paths=[/tmp/ollama802441264/runners/cuda_v11/libcudart.so.11.0]
Aug 2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: cudaSetDevice err: 35
Aug 2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.279+03:00 level=DEBUG source=gpu.go:533 msg="Unable to load cudart" library=/tmp/ollama802441264/runners/cuda_v11/libcudart.so.11.0 error="your nvidia driver is too old or missing.
If you have a CUDA GPU please upgrade to run ollama" Aug 2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.279+03:00 level=WARN source=amd_linux.go:59 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory" Aug 2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.279+03:00 level=DEBUG source=amd_linux.go:102 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/0/properties" Aug 2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.279+03:00 level=DEBUG source=amd_linux.go:127 msg="detected CPU /sys/class/kfd/kfd/topology/nodes/0/properties" Aug 2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.279+03:00 level=DEBUG source=amd_linux.go:102 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/1/properties" Aug 2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.279+03:00 level=DEBUG source=amd_linux.go:217 msg="mapping amdgpu to drm sysfs nodes" amdgpu=/sys/class/kfd/kfd/topology/nodes/1/properties vendor=4098 device=26751 unique_id=150031596308672932 Aug 2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.279+03:00 level=DEBUG source=amd_linux.go:251 msg=matched amdgpu=/sys/class/kfd/kfd/topology/nodes/1/properties drm=/sys/class/drm/card0/device Aug 2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.279+03:00 level=DEBUG source=amd_linux.go:283 msg="amdgpu memory" gpu=0 total="8.0 GiB" Aug 2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.279+03:00 level=DEBUG source=amd_linux.go:284 msg="amdgpu memory" gpu=0 available="8.0 GiB" Aug 2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.279+03:00 level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /usr/local/bin/rocm" Aug 2 20:05:47 ollama-ubuntu-jammy-2 
ollama[447]: time=2024-08-02T20:05:47.279+03:00 level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /opt/rocm/lib" Aug 2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.280+03:00 level=INFO source=amd_linux.go:348 msg="skipping rocm gfx compatibility check" HSA_OVERRIDE_GFX_VERSION=9.0.0 Aug 2 20:05:47 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:47.280+03:00 level=INFO source=types.go:105 msg="inference compute" id=0 library=rocm compute=gfx900 driver=0.0 name=1002:687f total="8.0 GiB" available="8.0 GiB" Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: [GIN] 2024/08/02 - 20:05:50 | 200 | 42.189µs | 127.0.0.1 | HEAD "/" Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: [GIN] 2024/08/02 - 20:05:50 | 200 | 4.888907ms | 127.0.0.1 | POST "/api/show" Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.541+03:00 level=DEBUG source=gpu.go:358 msg="updating system memory data" before.total="62.7 GiB" before.free="44.4 GiB" before.free_swap="0 B" now.total="62.7 GiB" now.free="44.3 GiB" now.free_swap="0 B" Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.541+03:00 level=DEBUG source=amd_linux.go:440 msg="updating rocm free memory" gpu=0 name=1002:687f before="8.0 GiB" now="8.0 GiB" Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.541+03:00 level=DEBUG source=sched.go:181 msg="updating default concurrency" OLLAMA_MAX_LOADED_MODELS=0x816020 gpu_count=1 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.545+03:00 level=DEBUG source=sched.go:219 msg="loading first model" model=/usr/share/ollama/.ollama/models/blobs/sha256-633fc5be925f9a484b61d6f9b9a78021eeb462100bd557309f01ba84cac26adf Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.545+03:00 level=DEBUG source=memory.go:101 msg=evaluating library=rocm gpu_count=1 available="[8.0 GiB]" Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: 
time=2024-08-02T20:05:50.545+03:00 level=INFO source=sched.go:710 msg="new model will fit in available VRAM in single GPU, loading" model=/usr/share/ollama/.ollama/models/blobs/sha256-633fc5be925f9a484b61d6f9b9a78021eeb462100bd557309f01ba84cac26adf gpu=0 parallel=4 available=8564838400 required="6.1 GiB" Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.545+03:00 level=DEBUG source=server.go:100 msg="system memory" total="62.7 GiB" free="44.3 GiB" free_swap="0 B" Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.545+03:00 level=DEBUG source=memory.go:101 msg=evaluating library=rocm gpu_count=1 available="[8.0 GiB]" Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.545+03:00 level=INFO source=memory.go:309 msg="offload to rocm" layers.requested=-1 layers.model=33 layers.offload=33 layers.split="" memory.available="[8.0 GiB]" memory.required.full="6.1 GiB" memory.required.partial="6.1 GiB" memory.required.kv="3.0 GiB" memory.required.allocations="[6.1 GiB]" memory.weights.total="4.9 GiB" memory.weights.repeating="4.8 GiB" memory.weights.nonrepeating="77.1 MiB" memory.graph.full="512.0 MiB" memory.graph.partial="512.0 MiB" Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.545+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama802441264/runners/cpu/ollama_llama_server Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.545+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama802441264/runners/cpu_avx/ollama_llama_server Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.545+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama802441264/runners/cpu_avx2/ollama_llama_server Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.545+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" 
file=/tmp/ollama802441264/runners/cuda_v11/ollama_llama_server Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.545+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama802441264/runners/rocm_v60102/ollama_llama_server Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.545+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama802441264/runners/cpu/ollama_llama_server Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.545+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama802441264/runners/cpu_avx/ollama_llama_server Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.545+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama802441264/runners/cpu_avx2/ollama_llama_server Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.545+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama802441264/runners/cuda_v11/ollama_llama_server Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.545+03:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama802441264/runners/rocm_v60102/ollama_llama_server Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.545+03:00 level=INFO source=server.go:170 msg="Invalid OLLAMA_LLM_LIBRARY rocm_v60002 - not found" Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.546+03:00 level=INFO source=server.go:384 msg="starting llama server" cmd="/tmp/ollama802441264/runners/rocm_v60102/ollama_llama_server --model /usr/share/ollama/.ollama/models/blobs/sha256-633fc5be925f9a484b61d6f9b9a78021eeb462100bd557309f01ba84cac26adf --ctx-size 8192 --batch-size 512 --embedding --log-disable --n-gpu-layers 33 --verbose --parallel 4 --port 46445" Aug 2 20:05:50 ollama-ubuntu-jammy-2 
ollama[447]: time=2024-08-02T20:05:50.546+03:00 level=DEBUG source=server.go:401 msg=subprocess environment="[PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin HSA_OVERRIDE_GFX_VERSION=9.0.0 HSA_ENABLE_SDMA=0 LD_LIBRARY_PATH=/opt/rocm/lib:/tmp/ollama802441264/runners/rocm_v60102:/tmp/ollama802441264/runners HIP_VISIBLE_DEVICES=0]" Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.546+03:00 level=INFO source=sched.go:445 msg="loaded runners" count=1 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.546+03:00 level=INFO source=server.go:584 msg="waiting for llama runner to start responding" Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.546+03:00 level=INFO source=server.go:618 msg="waiting for server to become available" status="llm server error" Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[482]: INFO [main] build info | build=1 commit="6eeaeba" tid="139832002468928" timestamp=1722618350 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[482]: INFO [main] system info | n_threads=12 n_threads_batch=-1 system_info="AVX = 1 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 0 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | " tid="139832002468928" timestamp=1722618350 total_threads=24 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[482]: INFO [main] HTTP server listening | hostname="127.0.0.1" n_threads_http="23" port="46445" tid="139832002468928" timestamp=1722618350 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: loaded meta data with 36 key-value pairs and 197 tensors from /usr/share/ollama/.ollama/models/blobs/sha256-633fc5be925f9a484b61d6f9b9a78021eeb462100bd557309f01ba84cac26adf (version GGUF V3 (latest)) Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: 
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output. Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv 0: general.architecture str = phi3 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv 1: general.type str = model Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv 2: general.name str = Phi 3 Mini 128k Instruct Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv 3: general.finetune str = 128k-instruct Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv 4: general.basename str = Phi-3 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv 5: general.size_label str = mini Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv 6: general.license str = mit Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv 7: general.license.link str = https://huggingface.co/microsoft/Phi-... 
Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv 8: general.tags arr[str,3] = ["nlp", "code", "text-generation"] Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv 9: general.languages arr[str,1] = ["en"] Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv 10: phi3.context_length u32 = 131072 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv 11: phi3.rope.scaling.original_context_length u32 = 4096 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv 12: phi3.embedding_length u32 = 3072 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv 13: phi3.feed_forward_length u32 = 8192 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv 14: phi3.block_count u32 = 32 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv 15: phi3.attention.head_count u32 = 32 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv 16: phi3.attention.head_count_kv u32 = 32 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv 17: phi3.attention.layer_norm_rms_epsilon f32 = 0.000010 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv 18: phi3.rope.dimension_count u32 = 96 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv 19: phi3.rope.freq_base f32 = 10000.000000 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv 20: general.file_type u32 = 2 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv 21: phi3.attention.sliding_window u32 = 262144 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv 22: phi3.rope.scaling.attn_factor f32 = 1.190238 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv 23: tokenizer.ggml.model str = llama Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv 24: tokenizer.ggml.pre str = 
default Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv 25: tokenizer.ggml.tokens arr[str,32064] = ["<unk>", "<s>", "</s>", "<0x00>", "<... Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv 26: tokenizer.ggml.scores arr[f32,32064] = [-1000.000000, -1000.000000, -1000.00... Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv 27: tokenizer.ggml.token_type arr[i32,32064] = [3, 3, 4, 6, 6, 6, 6, 6, 6, 6, 6, 6, ... Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv 28: tokenizer.ggml.bos_token_id u32 = 1 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv 29: tokenizer.ggml.eos_token_id u32 = 32000 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv 30: tokenizer.ggml.unknown_token_id u32 = 0 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv 31: tokenizer.ggml.padding_token_id u32 = 32000 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv 32: tokenizer.ggml.add_bos_token bool = false Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv 33: tokenizer.ggml.add_eos_token bool = false Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv 34: tokenizer.chat_template str = {% for message in messages %}{% if me... 
Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - kv 35: general.quantization_version u32 = 2 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - type f32: 67 tensors Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - type q4_0: 129 tensors Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llama_model_loader: - type q6_K: 1 tensors Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_vocab: special tokens cache size = 14 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_vocab: token to piece cache size = 0.1685 MB Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: format = GGUF V3 (latest) Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: arch = phi3 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: vocab type = SPM Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: n_vocab = 32064 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: n_merges = 0 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: vocab_only = 0 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: n_ctx_train = 131072 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: n_embd = 3072 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: n_layer = 32 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: n_head = 32 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: n_head_kv = 32 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: n_rot = 96 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: n_swa = 262144 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: n_embd_head_k = 96 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: n_embd_head_v = 96 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: n_gqa = 1 Aug 2 20:05:50 
ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: n_embd_k_gqa = 3072 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: n_embd_v_gqa = 3072 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: f_norm_eps = 0.0e+00 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: f_norm_rms_eps = 1.0e-05 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: f_clamp_kqv = 0.0e+00 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: f_max_alibi_bias = 0.0e+00 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: f_logit_scale = 0.0e+00 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: n_ff = 8192 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: n_expert = 0 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: n_expert_used = 0 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: causal attn = 1 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: pooling type = 0 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: rope type = 2 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: rope scaling = linear Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: freq_base_train = 10000.0 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: freq_scale_train = 1 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: n_ctx_orig_yarn = 4096 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: rope_finetuned = unknown Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: ssm_d_conv = 0 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: ssm_d_inner = 0 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: ssm_d_state = 0 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: ssm_dt_rank = 0 Aug 2 20:05:50 
ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: model type = 3B Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: model ftype = Q4_0 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: model params = 3.82 B Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: model size = 2.03 GiB (4.55 BPW) Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: general.name = Phi 3 Mini 128k Instruct Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: BOS token = 1 '<s>' Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: EOS token = 32000 '<|endoftext|>' Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: UNK token = 0 '<unk>' Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: PAD token = 32000 '<|endoftext|>' Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: LF token = 13 '<0x0A>' Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: EOT token = 32007 '<|end|>' Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: llm_load_print_meta: max token length = 48 Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: rocBLAS error: Could not initialize Tensile host: No devices found Aug 2 20:05:50 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:50.997+03:00 level=INFO source=server.go:618 msg="waiting for server to become available" status="llm server not responding" Aug 2 20:05:52 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:52.086+03:00 level=DEBUG source=server.go:424 msg="llama runner terminated" error="signal: aborted (core dumped)" Aug 2 20:05:52 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:52.150+03:00 level=ERROR source=sched.go:451 msg="error loading llama server" error="llama runner process has terminated: error:Could not initialize Tensile host: No devices found" Aug 2 20:05:52 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:52.150+03:00 level=DEBUG 
source=sched.go:454 msg="triggering expiration for failed load" model=/usr/share/ollama/.ollama/models/blobs/sha256-633fc5be925f9a484b61d6f9b9a78021eeb462100bd557309f01ba84cac26adf Aug 2 20:05:52 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:52.150+03:00 level=DEBUG source=sched.go:355 msg="runner expired event received" modelPath=/usr/share/ollama/.ollama/models/blobs/sha256-633fc5be925f9a484b61d6f9b9a78021eeb462100bd557309f01ba84cac26adf Aug 2 20:05:52 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:52.150+03:00 level=DEBUG source=sched.go:371 msg="got lock to unload" modelPath=/usr/share/ollama/.ollama/models/blobs/sha256-633fc5be925f9a484b61d6f9b9a78021eeb462100bd557309f01ba84cac26adf Aug 2 20:05:52 ollama-ubuntu-jammy-2 ollama[447]: [GIN] 2024/08/02 - 20:05:52 | 500 | 1.613482132s | 127.0.0.1 | POST "/api/chat" Aug 2 20:05:52 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:52.150+03:00 level=DEBUG source=gpu.go:358 msg="updating system memory data" before.total="62.7 GiB" before.free="44.3 GiB" before.free_swap="0 B" now.total="62.7 GiB" now.free="44.3 GiB" now.free_swap="0 B" Aug 2 20:05:52 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:52.150+03:00 level=DEBUG source=amd_linux.go:440 msg="updating rocm free memory" gpu=0 name=1002:687f before="8.0 GiB" now="8.0 GiB" Aug 2 20:05:52 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:52.150+03:00 level=DEBUG source=server.go:1042 msg="stopping llama server" Aug 2 20:05:52 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:52.150+03:00 level=DEBUG source=sched.go:376 msg="runner released" modelPath=/usr/share/ollama/.ollama/models/blobs/sha256-633fc5be925f9a484b61d6f9b9a78021eeb462100bd557309f01ba84cac26adf Aug 2 20:05:52 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:52.401+03:00 level=DEBUG source=gpu.go:358 msg="updating system memory data" before.total="62.7 GiB" before.free="44.3 GiB" before.free_swap="0 B" now.total="62.7 GiB" now.free="44.3 GiB" 
now.free_swap="0 B" Aug 2 20:05:52 ollama-ubuntu-jammy-2 ollama[447]: time=2024-08-02T20:05:52.401+03:00 level=DEBUG source=amd_linux.go:440 msg="updating rocm free memory" gpu=0 name=1002:687f before="8.0 GiB" now="8.0 GiB"
```

As can be seen, it detects the GPU perfectly fine and tries to use it, but for some reason rocBLAS fails.

ROCM-SMI:

```
======================================= ROCm System Management Interface =======================================
================================================= Concise Info =================================================
Device  [Model : Revision]  Temp    Power     Partitions      SCLK    MCLK    Fan  Perf  PwrCap  VRAM%  GPU%
        Name (20 chars)     (Edge)  (Socket)  (Mem, Compute)
================================================================================================================
0       [0x2308 : 0xc1]     45.0°C  4.0W      N/A, N/A        852Mhz  167Mhz  0%   auto  247.0W  0%     0%
        Vega 10 XL/XT [Radeo
================================================================================================================
============================================= End of ROCm SMI Log ==============================================
```

rocminfo:

```
ROCk module is loaded
=====================
HSA System Attributes
=====================
Runtime Version:         1.1
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE
System Endianness:       LITTLE
Mwaitx:                  DISABLED
DMAbuf Support:          YES

==========
HSA Agents
==========
*******
Agent 1
*******
  Name:                    AMD Ryzen 9 3900X 12-Core Processor
  Uuid:                    CPU-XX
  Marketing Name:          AMD Ryzen 9 3900X 12-Core Processor
  Vendor Name:             CPU
  Feature:                 None specified
  Profile:                 FULL_PROFILE
  Float Round Mode:        NEAR
  Max Queue Number:        0(0x0)
  Queue Min Size:          0(0x0)
  Queue Max Size:          0(0x0)
  Queue Type:              MULTI
  Node:                    0
  Device Type:             CPU
  Cache Info:
    L1:                      32768(0x8000) KB
  Chip ID:                 0(0x0)
  ASIC Revision:           0(0x0)
  Cacheline Size:          64(0x40)
  Max Clock Freq. (MHz):   3800
  BDFID:                   0
  Internal Node ID:        0
  Compute Unit:            24
  SIMDs per CU:            0
  Shader Engines:          0
  Shader Arrs. per Eng.:   0
  WatchPts on Addr. Ranges:1
  Features:                None
  Pool Info:
    Pool 1
      Segment:                 GLOBAL; FLAGS: FINE GRAINED
      Size:                    65771684(0x3eb98a4) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Alignment:         4KB
      Accessible by all:       TRUE
    Pool 2
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    65771684(0x3eb98a4) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Alignment:         4KB
      Accessible by all:       TRUE
    Pool 3
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED
      Size:                    65771684(0x3eb98a4) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Alignment:         4KB
      Accessible by all:       TRUE
  ISA Info:
*******
Agent 2
*******
  Name:                    gfx900
  Uuid:                    GPU-021504f1231031a4
  Marketing Name:          Radeon RX Vega
  Vendor Name:             AMD
  Feature:                 KERNEL_DISPATCH
  Profile:                 BASE_PROFILE
  Float Round Mode:        NEAR
  Max Queue Number:        128(0x80)
  Queue Min Size:          64(0x40)
  Queue Max Size:          131072(0x20000)
  Queue Type:              MULTI
  Node:                    1
  Device Type:             GPU
  Cache Info:
    L1:                      16(0x10) KB
    L2:                      4096(0x1000) KB
  Chip ID:                 26751(0x687f)
  ASIC Revision:           1(0x1)
  Cacheline Size:          64(0x40)
  Max Clock Freq. (MHz):   1630
  BDFID:                   2560
  Internal Node ID:        1
  Compute Unit:            64
  SIMDs per CU:            4
  Shader Engines:          4
  Shader Arrs. per Eng.:   1
  WatchPts on Addr. Ranges:4
  Coherent Host Access:    FALSE
  Features:                KERNEL_DISPATCH
  Fast F16 Operation:      TRUE
  Wavefront Size:          64(0x40)
  Workgroup Max Size:      1024(0x400)
  Workgroup Max Size per Dimension:
    x                        1024(0x400)
    y                        1024(0x400)
    z                        1024(0x400)
  Max Waves Per CU:        40(0x28)
  Max Work-item Per CU:    2560(0xa00)
  Grid Max Size:           4294967295(0xffffffff)
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)
    y                        4294967295(0xffffffff)
    z                        4294967295(0xffffffff)
  Max fbarriers/Workgrp:   32
  Packet Processor uCode:: 468
  SDMA engine uCode::      434
  IOMMU Support::          None
  Pool Info:
    Pool 1
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED
      Size:                    8372224(0x7fc000) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Alignment:         4KB
      Accessible by all:       FALSE
    Pool 2
      Segment:                 GLOBAL; FLAGS: EXTENDED FINE GRAINED
      Size:                    8372224(0x7fc000) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Alignment:         4KB
      Accessible by all:       FALSE
    Pool 3
      Segment:                 GROUP
      Size:                    64(0x40) KB
      Allocatable:             FALSE
      Alloc Granule:           0KB
      Alloc Alignment:         0KB
      Accessible by all:       FALSE
  ISA Info:
    ISA 1
      Name:                    amdgcn-amd-amdhsa--gfx900:xnack-
      Machine Models:          HSA_MACHINE_MODEL_LARGE
      Profiles:                HSA_PROFILE_BASE
      Default Rounding Mode:   NEAR
      Default Rounding Mode:   NEAR
      Fast f16:                TRUE
      Workgroup Max Size:      1024(0x400)
      Workgroup Max Size per Dimension:
        x                        1024(0x400)
        y                        1024(0x400)
        z                        1024(0x400)
      Grid Max Size:           4294967295(0xffffffff)
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)
        y                        4294967295(0xffffffff)
        z                        4294967295(0xffffffff)
      FBarrier Max Size:       32
*** Done ***
```

`apt show rocm-libs -a` (version):

```
Package: rocm-libs
Version: 6.0.2.60002-115~22.04
Priority: optional
Section: devel
Maintainer: ROCm Dev Support <rocm-dev.support@amd.com>
Installed-Size: 13.3 kB
Depends: hipblas (= 2.0.0.60002-115~22.04),
 hipblaslt (= 0.6.0.60002-115~22.04),
 hipfft (= 1.0.13.60002-115~22.04),
 hipsolver (= 2.0.0.60002-115~22.04),
 hipsparse (= 3.0.0.60002-115~22.04),
 hiptensor (= 1.1.0.60002-115~22.04),
 miopen-hip (= 3.00.0.60002-115~22.04),
 half (= 1.12.0.60002-115~22.04),
 rccl (= 2.18.3.60002-115~22.04),
 rocalution (= 3.0.3.60002-115~22.04),
 rocblas (= 4.0.0.60002-115~22.04),
 rocfft (= 1.0.25.60002-115~22.04),
 rocrand (= 3.0.0.60002-115~22.04),
 hiprand (= 2.10.16.60002-115~22.04),
 rocsolver (= 3.24.0.60002-115~22.04),
 rocsparse (= 3.0.2.60002-115~22.04),
 rocm-core (= 6.0.2.60002-115~22.04),
 composablekernel-dev (= 1.1.0.60002-115~22.04),
 hipblas-dev (= 2.0.0.60002-115~22.04),
 hipblaslt-dev (= 0.6.0.60002-115~22.04),
 hipcub-dev (= 3.0.0.60002-115~22.04),
 hipfft-dev (= 1.0.13.60002-115~22.04),
 hipsolver-dev (= 2.0.0.60002-115~22.04),
 hipsparse-dev (= 3.0.0.60002-115~22.04),
 hiptensor-dev (= 1.1.0.60002-115~22.04),
 miopen-hip-dev (= 3.00.0.60002-115~22.04),
 rccl-dev (= 2.18.3.60002-115~22.04),
 rocalution-dev (= 3.0.3.60002-115~22.04),
 rocblas-dev (= 4.0.0.60002-115~22.04),
 rocfft-dev (= 1.0.25.60002-115~22.04),
 rocprim-dev (= 3.0.0.60002-115~22.04),
 rocrand-dev (= 3.0.0.60002-115~22.04),
 hiprand-dev (= 2.10.16.60002-115~22.04),
 rocsolver-dev (= 3.24.0.60002-115~22.04),
 rocsparse-dev (= 3.0.2.60002-115~22.04),
 rocthrust-dev (= 3.0.0.60002-115~22.04),
 rocwmma-dev (= 1.3.0.60002-115~22.04)
Homepage: https://github.com/RadeonOpenCompute/ROCm
Download-Size: 1,050 B
APT-Sources: https://repo.radeon.com/rocm/apt/6.0.2 jammy/main amd64 Packages
Description: Radeon Open Compute (ROCm) Runtime software stack
```
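The `rocBLAS error: Could not initialize Tensile host: No devices found` in the log above usually means rocBLAS cannot find Tensile data files matching the GPU's ISA. A quick, hedged way to check the host ROCm install for gfx900 files (the `/opt/rocm/lib/rocblas/library` path is the default layout for ROCm 6.x packages and is an assumption here):

```shell
#!/bin/sh
# Look for gfx900 Tensile/rocBLAS data files in the host ROCm install.
# Path below is the default for ROCm 6.x Debian packages (assumption).
ROCBLAS_LIB_DIR="/opt/rocm/lib/rocblas/library"
if [ -d "$ROCBLAS_LIB_DIR" ]; then
    # Zero matches would explain the "Could not initialize Tensile host" abort.
    matches=$(ls "$ROCBLAS_LIB_DIR" | grep -ci 'gfx900' || true)
    echo "gfx900 Tensile files found: $matches"
else
    echo "no rocBLAS library dir at $ROCBLAS_LIB_DIR"
fi
```

If the count is zero (or the directory is missing), rocBLAS has no kernels for the Vega 64 even though the scheduler happily picks the GPU.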

@dhiltgen commented on GitHub (Aug 8, 2024):

It looks like it's using `/opt/rocm` on your host, which appears to be missing the gfx900 data files. The install script detects this existing ROCm install and skips installing our bundled version. To work around this, you could run something like: `curl --fail --show-error --location --progress-bar "https://ollama.com/download/ollama-linux-amd64-rocm.tgz" | sudo tar zx --owner ollama --group ollama -C /usr/share/ollama/lib/rocm .`

Once #5631 merges, we'll shift to a model of always carrying the rocm dependency and use it even if ROCm is installed on the target system.
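Spelled out, the suggested workaround amounts to the sketch below. The tarball URL and target directory come from the comment above; the `ollama` user/group and the `ollama` systemd unit are the defaults created by install.sh and are assumptions for other setups. The download is gated behind an environment flag so the script is safe to read or source:

```shell
#!/bin/sh
# Sketch of the workaround: extract Ollama's bundled ROCm libraries
# over the host-detected ones, then restart the service.
# Assumptions: default install.sh layout, "ollama" user/group and unit.
ROCM_TGZ_URL="https://ollama.com/download/ollama-linux-amd64-rocm.tgz"
DEST="/usr/share/ollama/lib/rocm"

install_bundled_rocm() {
    sudo mkdir -p "$DEST"
    curl --fail --show-error --location --progress-bar "$ROCM_TGZ_URL" \
        | sudo tar zx --owner ollama --group ollama -C "$DEST"
    sudo systemctl restart ollama
}

# Only download/extract when explicitly requested.
if [ "${RUN_ROCM_WORKAROUND:-0}" = "1" ]; then
    install_bundled_rocm
else
    echo "dry run: would extract $ROCM_TGZ_URL into $DEST"
fi
```

Run with `RUN_ROCM_WORKAROUND=1` to actually perform the extraction; without it the script only prints what it would do.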

Reference: github-starred/ollama#29598