[GH-ISSUE #6423] Running on MI300X via Docker fails with rocBLAS error: Could not initialize Tensile host: No devices found #66076

Closed
opened 2026-05-03 23:51:51 -05:00 by GiteaMirror · 9 comments

Originally created by @peterschmidt85 on GitHub (Aug 19, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/6423

Originally assigned to: @dhiltgen on GitHub.

Steps to reproduce:

  1. Run a Docker container using ollama/ollama:rocm on a machine with a single MI300X (a typical command is sketched below)
  2. Inside the container, run ollama run llama3.1:70B
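
For reference, a typical invocation of the ROCm image (a sketch along the lines of Ollama's Docker instructions; the volume and port mappings here are illustrative) is:

docker run -d --device /dev/kfd --device /dev/dri \
    -v ollama:/root/.ollama -p 11434:11434 \
    --name ollama ollama/ollama:rocm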

Actual behaviour:

rocBLAS error: Could not initialize Tensile host: No devices found

The full output:

ollama serve &
[1] 649
[root@f4425b1a0236 workflow]# Couldn't find '/root/.ollama/id_ed25519'. Generating new private key.
Your new public key is: 

ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIHmumM0c/iN0gZ9aPo99pq6QfzU+7AuA4V3/z933kCjK

2024/08/19 16:42:26 routes.go:1123: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://*] OLLAMA_RUNNERS_DIR: OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES:]"
time=2024-08-19T16:42:26.947Z level=INFO source=images.go:782 msg="total blobs: 0"
time=2024-08-19T16:42:26.948Z level=INFO source=images.go:790 msg="total unused blobs removed: 0"
time=2024-08-19T16:42:26.948Z level=INFO source=routes.go:1170 msg="Listening on [::]:11434 (version 0.3.5)"
time=2024-08-19T16:42:26.949Z level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama307827265/runners
time=2024-08-19T16:42:30.581Z level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu cpu_avx cpu_avx2 cuda_v11 rocm_v60102]"
time=2024-08-19T16:42:30.581Z level=INFO source=gpu.go:204 msg="looking for compatible GPUs"
time=2024-08-19T16:42:30.590Z level=WARN source=amd_linux.go:201 msg="amdgpu too old gfx000" gpu=0
time=2024-08-19T16:42:30.590Z level=WARN source=amd_linux.go:201 msg="amdgpu too old gfx000" gpu=1
time=2024-08-19T16:42:30.590Z level=WARN source=amd_linux.go:201 msg="amdgpu too old gfx000" gpu=2
time=2024-08-19T16:42:30.603Z level=INFO source=amd_linux.go:345 msg="amdgpu is supported" gpu=3 gpu_type=gfx942
time=2024-08-19T16:42:30.603Z level=WARN source=amd_linux.go:201 msg="amdgpu too old gfx000" gpu=4
time=2024-08-19T16:42:30.603Z level=WARN source=amd_linux.go:201 msg="amdgpu too old gfx000" gpu=5
time=2024-08-19T16:42:30.603Z level=WARN source=amd_linux.go:201 msg="amdgpu too old gfx000" gpu=6
time=2024-08-19T16:42:30.603Z level=WARN source=amd_linux.go:201 msg="amdgpu too old gfx000" gpu=7
time=2024-08-19T16:42:30.603Z level=INFO source=types.go:105 msg="inference compute" id=3 library=rocm compute=gfx942 driver=6.7 name=1002:74a1 total="192.0 GiB" available="191.7 GiB"

[root@f4425b1a0236 workflow]# 
[root@f4425b1a0236 workflow]# ollama pull llama3.1:70b
[GIN] 2024/08/19 - 16:42:37 | 200 |     129.844µs |       127.0.0.1 | HEAD     "/"
pulling manifest ⠇ time=2024-08-19T16:42:39.572Z level=INFO source=download.go:175 msg="downloading a677b4a4b70c in 65 624 MB part(s)"
pulling manifest 
pulling a677b4a4b70c...  58% ▕████████████████████████████████████████████████████                                      ▏  23 GB/ 39 GB  465 MB/s     35st
pulling manifest 
pulling a677b4a4b70c... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████▏  39 GB
pulling 11ce4ee3e170... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████▏ 1.7 KB                         
pulling 0ba8f0e314b4... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████▏  12 KB                         
pulling 56bb8bd477a5... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████▏   96 B                         
pulling 654440dac7f3... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████▏  486 B                         
verifying sha256 digest 
writing manifest 
removing any unused layers 
success
ollama run llama3.1:70b
[GIN] 2024/08/19 - 16:45:03 | 200 |      37.636µs |       127.0.0.1 | HEAD     "/"
[GIN] 2024/08/19 - 16:45:03 | 200 |   33.789282ms |       127.0.0.1 | POST     "/api/show"
time=2024-08-19T16:45:03.649Z level=INFO source=sched.go:710 msg="new model will fit in available VRAM in single GPU, loading" model=/root/.ollama/models/blobs/sha256-a677b4a4b70c45e702b1d600f7905e367733c53898b8be60e3f29272cf334574 gpu=3 parallel=4 available=205843886080 required="41.2 GiB"
time=2024-08-19T16:45:03.650Z level=INFO source=memory.go:309 msg="offload to rocm" layers.requested=-1 layers.model=81 layers.offload=81 layers.split="" memory.available="[191.7 GiB]" memory.required.full="41.2 GiB" memory.required.partial="41.2 GiB" memory.required.kv="2.5 GiB" memory.required.allocations="[41.2 GiB]" memory.weights.total="38.4 GiB" memory.weights.repeating="37.6 GiB" memory.weights.nonrepeating="822.0 MiB" memory.graph.full="1.1 GiB" memory.graph.partial="1.1 GiB"
time=2024-08-19T16:45:03.665Z level=INFO source=server.go:393 msg="starting llama server" cmd="/tmp/ollama307827265/runners/rocm_v60102/ollama_llama_server --model /root/.ollama/models/blobs/sha256-a677b4a4b70c45e702b1d600f7905e367733c53898b8be60e3f29272cf334574 --ctx-size 8192 --batch-size 512 --embedding --log-disable --n-gpu-layers 81 --numa distribute --parallel 4 --port 37363"
time=2024-08-19T16:45:03.665Z level=INFO source=sched.go:445 msg="loaded runners" count=1
time=2024-08-19T16:45:03.665Z level=INFO source=server.go:593 msg="waiting for llama runner to start responding"
time=2024-08-19T16:45:03.665Z level=INFO source=server.go:627 msg="waiting for server to become available" status="llm server error"
⠹ WARNING: /proc/sys/kernel/numa_balancing is enabled, this has been observed to impair performance
INFO [main] build info | build=1 commit="1e6f655" tid="138631197918016" timestamp=1724085903
INFO [main] system info | n_threads=96 n_threads_batch=-1 system_info="AVX = 1 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 0 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | " tid="138631197918016" timestamp=1724085903 total_threads=192
INFO [main] HTTP server listening | hostname="127.0.0.1" n_threads_http="191" port="37363" tid="138631197918016" timestamp=1724085903
⠸ time=2024-08-19T16:45:03.917Z level=INFO source=server.go:627 msg="waiting for server to become available" status="llm server loading model"
llama_model_loader: loaded meta data with 29 key-value pairs and 724 tensors from /root/.ollama/models/blobs/sha256-a677b4a4b70c45e702b1d600f7905e367733c53898b8be60e3f29272cf334574 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = Meta Llama 3.1 70B Instruct
llama_model_loader: - kv   3:                           general.finetune str              = Instruct
llama_model_loader: - kv   4:                           general.basename str              = Meta-Llama-3.1
llama_model_loader: - kv   5:                         general.size_label str              = 70B
llama_model_loader: - kv   6:                            general.license str              = llama3.1
llama_model_loader: - kv   7:                               general.tags arr[str,6]       = ["facebook", "meta", "pytorch", "llam...
llama_model_loader: - kv   8:                          general.languages arr[str,8]       = ["en", "de", "fr", "it", "pt", "hi", ...
llama_model_loader: - kv   9:                          llama.block_count u32              = 80
llama_model_loader: - kv  10:                       llama.context_length u32              = 131072
llama_model_loader: - kv  11:                     llama.embedding_length u32              = 8192
llama_model_loader: - kv  12:                  llama.feed_forward_length u32              = 28672
llama_model_loader: - kv  13:                 llama.attention.head_count u32              = 64
llama_model_loader: - kv  14:              llama.attention.head_count_kv u32              = 8
llama_model_loader: - kv  15:                       llama.rope.freq_base f32              = 500000.000000
llama_model_loader: - kv  16:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  17:                          general.file_type u32              = 2
llama_model_loader: - kv  18:                           llama.vocab_size u32              = 128256
llama_model_loader: - kv  19:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv  20:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  21:                         tokenizer.ggml.pre str              = llama-bpe
llama_model_loader: - kv  22:                      tokenizer.ggml.tokens arr[str,128256]  = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  23:                  tokenizer.ggml.token_type arr[i32,128256]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  24:                      tokenizer.ggml.merges arr[str,280147]  = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv  25:                tokenizer.ggml.bos_token_id u32              = 128000
llama_model_loader: - kv  26:                tokenizer.ggml.eos_token_id u32              = 128009
llama_model_loader: - kv  27:                    tokenizer.chat_template str              = {{- bos_token }}\n{%- if custom_tools ...
llama_model_loader: - kv  28:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:  162 tensors
llama_model_loader: - type q4_0:  561 tensors
llama_model_loader: - type q6_K:    1 tensors
⠦ llm_load_vocab: special tokens cache size = 256
⠧ llm_load_vocab: token to piece cache size = 0.7999 MB
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = llama
llm_load_print_meta: vocab type       = BPE
llm_load_print_meta: n_vocab          = 128256
llm_load_print_meta: n_merges         = 280147
llm_load_print_meta: vocab_only       = 0
llm_load_print_meta: n_ctx_train      = 131072
llm_load_print_meta: n_embd           = 8192
llm_load_print_meta: n_layer          = 80
llm_load_print_meta: n_head           = 64
llm_load_print_meta: n_head_kv        = 8
llm_load_print_meta: n_rot            = 128
llm_load_print_meta: n_swa            = 0
llm_load_print_meta: n_embd_head_k    = 128
llm_load_print_meta: n_embd_head_v    = 128
llm_load_print_meta: n_gqa            = 8
llm_load_print_meta: n_embd_k_gqa     = 1024
llm_load_print_meta: n_embd_v_gqa     = 1024
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale    = 0.0e+00
llm_load_print_meta: n_ff             = 28672
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: causal attn      = 1
llm_load_print_meta: pooling type     = 0
llm_load_print_meta: rope type        = 0
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 500000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_ctx_orig_yarn  = 131072
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: ssm_d_conv       = 0
llm_load_print_meta: ssm_d_inner      = 0
llm_load_print_meta: ssm_d_state      = 0
llm_load_print_meta: ssm_dt_rank      = 0
llm_load_print_meta: model type       = 70B
llm_load_print_meta: model ftype      = Q4_0
llm_load_print_meta: model params     = 70.55 B
llm_load_print_meta: model size       = 37.22 GiB (4.53 BPW) 
llm_load_print_meta: general.name     = Meta Llama 3.1 70B Instruct
llm_load_print_meta: BOS token        = 128000 '<|begin_of_text|>'
llm_load_print_meta: EOS token        = 128009 '<|eot_id|>'
llm_load_print_meta: LF token         = 128 'Ä'
llm_load_print_meta: EOT token        = 128009 '<|eot_id|>'
llm_load_print_meta: max token length = 256
⠇ 
rocBLAS error: Could not initialize Tensile host: No devices found
GiteaMirror added the docker, bug, amd labels 2026-05-03 23:51:52 -05:00

@rick-github commented on GitHub (Aug 19, 2024):

What's the docker command you are using to start the container?

@peterschmidt85 commented on GitHub (Aug 19, 2024):

> What's the docker command you are using to start the container?

@rick-github I was running it via dstack's integration with RunPod. Basically, RunPod runs the container.
With HF's TGI it works perfectly but not with Ollama.

@rick-github commented on GitHub (Aug 19, 2024):

Can you see the parameters that the container is started with? For example, does it have --device /dev/kfd --device /dev/dri flags? Without these, ollama can't access the GPU.
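
One quick way to check from inside the container is to list the device nodes (a generic shell check, not specific to Ollama):

ls -l /dev/kfd /dev/dri

If those paths are missing, the container was started without the device flags and ROCm cannot see the GPU.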

@peterschmidt85 commented on GitHub (Aug 19, 2024):

> Can you see the parameters that the container is started with? For example, does it have --device /dev/kfd --device /dev/dri flags? Without these, ollama can't access the GPU.

100% it mounts the devices. You can even see this in the ollama serve logs:

time=2024-08-19T16:42:30.603Z level=INFO source=types.go:105 msg="inference compute" id=3 library=rocm compute=gfx942 driver=6.7 name=1002:74a1 total="192.0 GiB" available="191.7 GiB"

@dhiltgen commented on GitHub (Aug 19, 2024):

From the logs, it looks like the amdgpu driver is enumerating 8 GPUs in sysfs, and GPU 3 is the correct one. My suspicion is something may be getting mixed up in the GPU selection, and then ROCm is trying to connect to one of the incorrect GPUs. Running with -e OLLAMA_DEBUG=1 may shed some more light, or you can also experiment with setting HIP_VISIBLE_DEVICES to various values (I'd start with 3) and see if that yields a working setup.
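
As a sketch, those suggestions can be combined into the container invocation like so (the device and volume flags here are illustrative, and 3 is just the first value to try for HIP_VISIBLE_DEVICES):

docker run -d --device /dev/kfd --device /dev/dri \
    -e OLLAMA_DEBUG=1 -e HIP_VISIBLE_DEVICES=3 \
    -v ollama:/root/.ollama -p 11434:11434 \
    --name ollama ollama/ollama:rocm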

@dhiltgen commented on GitHub (Sep 3, 2024):

If you're still having trouble, please share a debug log and I'll reopen.

@rtaic-coder commented on GitHub (Sep 7, 2024):

I have the same issue. I tried adding the ubuntu user to the render and video groups (see the command below), but I still get the same error.
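
For reference, that kind of group change is usually done along these lines (group names can vary by distro), followed by logging out and back in so the membership takes effect:

sudo usermod -aG render,video ubuntu
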
After running the container, docker logs shows:

2024/09/07 03:15:21 routes.go:1125: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: OLLAMA_DEBUG:true OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://*] OLLAMA_RUNNERS_DIR: OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES:]"
time=2024-09-07T03:15:21.232Z level=INFO source=images.go:753 msg="total blobs: 22"
time=2024-09-07T03:15:21.232Z level=INFO source=images.go:760 msg="total unused blobs removed: 0"
time=2024-09-07T03:15:21.233Z level=INFO source=routes.go:1172 msg="Listening on [::]:11434 (version 0.3.9-11-g037a4d1)"
time=2024-09-07T03:15:21.233Z level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama3272431602/runners
time=2024-09-07T03:15:21.233Z level=DEBUG source=payload.go:182 msg=extracting variant=cpu file=build/linux/x86_64/cpu/bin/libggml.so.gz
time=2024-09-07T03:15:21.233Z level=DEBUG source=payload.go:182 msg=extracting variant=cpu file=build/linux/x86_64/cpu/bin/libllama.so.gz
time=2024-09-07T03:15:21.233Z level=DEBUG source=payload.go:182 msg=extracting variant=cpu file=build/linux/x86_64/cpu/bin/ollama_llama_server.gz
time=2024-09-07T03:15:21.233Z level=DEBUG source=payload.go:182 msg=extracting variant=cpu_avx file=build/linux/x86_64/cpu_avx/bin/libggml.so.gz
time=2024-09-07T03:15:21.233Z level=DEBUG source=payload.go:182 msg=extracting variant=cpu_avx file=build/linux/x86_64/cpu_avx/bin/libllama.so.gz
time=2024-09-07T03:15:21.233Z level=DEBUG source=payload.go:182 msg=extracting variant=cpu_avx file=build/linux/x86_64/cpu_avx/bin/ollama_llama_server.gz
time=2024-09-07T03:15:21.233Z level=DEBUG source=payload.go:182 msg=extracting variant=cpu_avx2 file=build/linux/x86_64/cpu_avx2/bin/libggml.so.gz
time=2024-09-07T03:15:21.233Z level=DEBUG source=payload.go:182 msg=extracting variant=cpu_avx2 file=build/linux/x86_64/cpu_avx2/bin/libllama.so.gz
time=2024-09-07T03:15:21.233Z level=DEBUG source=payload.go:182 msg=extracting variant=cpu_avx2 file=build/linux/x86_64/cpu_avx2/bin/ollama_llama_server.gz
time=2024-09-07T03:15:21.233Z level=DEBUG source=payload.go:182 msg=extracting variant=rocm_v60200 file=build/linux/x86_64/rocm_v60200/bin/libggml.so.gz
time=2024-09-07T03:15:21.233Z level=DEBUG source=payload.go:182 msg=extracting variant=rocm_v60200 file=build/linux/x86_64/rocm_v60200/bin/libllama.so.gz
time=2024-09-07T03:15:21.233Z level=DEBUG source=payload.go:182 msg=extracting variant=rocm_v60200 file=build/linux/x86_64/rocm_v60200/bin/ollama_llama_server.gz
time=2024-09-07T03:15:22.674Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3272431602/runners/cpu/ollama_llama_server
time=2024-09-07T03:15:22.674Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3272431602/runners/cpu_avx/ollama_llama_server
time=2024-09-07T03:15:22.674Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3272431602/runners/cpu_avx2/ollama_llama_server
time=2024-09-07T03:15:22.674Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3272431602/runners/rocm_v60200/ollama_llama_server
time=2024-09-07T03:15:22.674Z level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu_avx2 rocm_v60200 cpu cpu_avx]"
time=2024-09-07T03:15:22.674Z level=DEBUG source=payload.go:45 msg="Override detection logic by setting OLLAMA_LLM_LIBRARY"
time=2024-09-07T03:15:22.674Z level=DEBUG source=sched.go:105 msg="starting llm scheduler"
time=2024-09-07T03:15:22.674Z level=INFO source=gpu.go:200 msg="looking for compatible GPUs"
time=2024-09-07T03:15:22.674Z level=DEBUG source=gpu.go:86 msg="searching for GPU discovery libraries for NVIDIA"
time=2024-09-07T03:15:22.674Z level=DEBUG source=gpu.go:468 msg="Searching for GPU library" name=libcuda.so*
time=2024-09-07T03:15:22.674Z level=DEBUG source=gpu.go:491 msg="gpu library search" globs="[/usr/lib/ollama/libcuda.so* /lib/ollama/libcuda.so* /opt/rocm/lib/libcuda.so* /opt/amdgpu/lib/x86_64-linux-gnu/libcuda.so* /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]"
time=2024-09-07T03:15:22.675Z level=DEBUG source=gpu.go:525 msg="discovered GPU libraries" paths=[]
time=2024-09-07T03:15:22.675Z level=DEBUG source=gpu.go:468 msg="Searching for GPU library" name=libcudart.so*
time=2024-09-07T03:15:22.675Z level=DEBUG source=gpu.go:491 msg="gpu library search" globs="[/usr/lib/ollama/libcudart.so* /lib/ollama/libcudart.so* /opt/rocm/lib/libcudart.so* /opt/amdgpu/lib/x86_64-linux-gnu/libcudart.so* /tmp/ollama3272431602/runners/cuda*/libcudart.so* /usr/local/cuda/lib64/libcudart.so* /usr/lib/x86_64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/x86_64-linux-gnu/libcudart.so* /usr/lib/wsl/lib/libcudart.so* /usr/lib/wsl/drivers/*/libcudart.so* /opt/cuda/lib64/libcudart.so* /usr/local/cuda*/targets/aarch64-linux/lib/libcudart.so* /usr/lib/aarch64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/aarch64-linux-gnu/libcudart.so* /usr/local/cuda/lib*/libcudart.so* /usr/lib*/libcudart.so* /usr/local/lib*/libcudart.so*]"
time=2024-09-07T03:15:22.675Z level=DEBUG source=gpu.go:525 msg="discovered GPU libraries" paths=[]
time=2024-09-07T03:15:22.675Z level=DEBUG source=amd_linux.go:102 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/0/properties"
time=2024-09-07T03:15:22.675Z level=DEBUG source=amd_linux.go:127 msg="detected CPU /sys/class/kfd/kfd/topology/nodes/0/properties"
time=2024-09-07T03:15:22.675Z level=DEBUG source=amd_linux.go:102 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/1/properties"
time=2024-09-07T03:15:22.675Z level=DEBUG source=amd_linux.go:217 msg="mapping amdgpu to drm sysfs nodes" amdgpu=/sys/class/kfd/kfd/topology/nodes/1/properties vendor=4098 device=29772 unique_id=4579446943467448766
time=2024-09-07T03:15:22.676Z level=DEBUG source=amd_linux.go:251 msg=matched amdgpu=/sys/class/kfd/kfd/topology/nodes/1/properties drm=/sys/class/drm/card1/device
time=2024-09-07T03:15:22.676Z level=DEBUG source=amd_linux.go:283 msg="amdgpu memory" gpu=0 total="24.0 GiB"
time=2024-09-07T03:15:22.676Z level=DEBUG source=amd_linux.go:284 msg="amdgpu memory" gpu=0 available="24.0 GiB"
time=2024-09-07T03:15:22.676Z level=DEBUG source=amd_common.go:18 msg="evaluating potential rocm lib dir /usr/lib/ollama"
time=2024-09-07T03:15:22.676Z level=DEBUG source=amd_common.go:21 msg="Checking glob pattern: /usr/lib/ollama/libhipblas.so.2*"
time=2024-09-07T03:15:22.676Z level=DEBUG source=amd_common.go:21 msg="Checking glob pattern: /usr/lib/ollama/rocblas"
time=2024-09-07T03:15:22.676Z level=DEBUG source=amd_common.go:18 msg="evaluating potential rocm lib dir /lib/ollama"
time=2024-09-07T03:15:22.676Z level=DEBUG source=amd_common.go:21 msg="Checking glob pattern: /lib/ollama/libhipblas.so.2*"
time=2024-09-07T03:15:22.676Z level=DEBUG source=amd_common.go:21 msg="Checking glob pattern: /lib/ollama/rocblas"
time=2024-09-07T03:15:22.676Z level=DEBUG source=amd_common.go:18 msg="evaluating potential rocm lib dir /opt/rocm/lib"
time=2024-09-07T03:15:22.676Z level=DEBUG source=amd_common.go:21 msg="Checking glob pattern: /opt/rocm/lib/libhipblas.so.2*"
time=2024-09-07T03:15:22.676Z level=DEBUG source=amd_common.go:21 msg="Checking glob pattern: /opt/rocm/lib/rocblas"
time=2024-09-07T03:15:22.677Z level=DEBUG source=amd_linux.go:336 msg="rocm supported GPUs" types="[gfx1010 gfx1012 gfx1030 gfx1100 gfx1101 gfx1102 gfx900 gfx906 gfx908 gfx90a gfx940 gfx941 gfx942]"
time=2024-09-07T03:15:22.677Z level=INFO source=amd_linux.go:345 msg="amdgpu is supported" gpu=0 gpu_type=gfx1100
time=2024-09-07T03:15:22.677Z level=INFO source=types.go:107 msg="inference compute" id=0 library=rocm variant="" compute=gfx1100 driver=6.8 name=1002:744c total="24.0 GiB" available="24.0 GiB"

Then after running the llama model, we get this: Error: llama runner process has terminated: error:Could not initialize Tensile host: No devices found

time=2024-09-07T03:15:22.677Z level=INFO source=amd_linux.go:345 msg="amdgpu is supported" gpu=0 gpu_type=gfx1100
time=2024-09-07T03:15:22.677Z level=INFO source=types.go:107 msg="inference compute" id=0 library=rocm variant="" compute=gfx1100 driver=6.8 name=1002:744c total="24.0 GiB" available="24.0 GiB"
[GIN] 2024/09/07 - 03:18:40 | 200 |        23.3µs |       127.0.0.1 | HEAD     "/"
[GIN] 2024/09/07 - 03:18:40 | 200 |   12.174247ms |       127.0.0.1 | POST     "/api/show"
time=2024-09-07T03:18:40.103Z level=DEBUG source=gpu.go:359 msg="updating system memory data" before.total="62.4 GiB" before.free="58.7 GiB" before.free_swap="8.0 GiB" now.total="62.4 GiB" now.free="58.8 GiB" now.free_swap="8.0 GiB"
time=2024-09-07T03:18:40.103Z level=DEBUG source=amd_linux.go:440 msg="updating rocm free memory" gpu=0 name=1002:744c before="24.0 GiB" now="24.0 GiB"
time=2024-09-07T03:18:40.103Z level=DEBUG source=sched.go:181 msg="updating default concurrency" OLLAMA_MAX_LOADED_MODELS=0x8263a0 gpu_count=1
time=2024-09-07T03:18:40.122Z level=DEBUG source=sched.go:224 msg="loading first model" model=/root/.ollama/models/blobs/sha256-8eeb52dfb3bb9aefdf9d1ef24b3bdbcfbe82238798c4b918278320b6fcef18fe
time=2024-09-07T03:18:40.122Z level=DEBUG source=memory.go:103 msg=evaluating library=rocm gpu_count=1 available="[24.0 GiB]"
time=2024-09-07T03:18:40.123Z level=INFO source=sched.go:715 msg="new model will fit in available VRAM in single GPU, loading" model=/root/.ollama/models/blobs/sha256-8eeb52dfb3bb9aefdf9d1ef24b3bdbcfbe82238798c4b918278320b6fcef18fe gpu=0 parallel=4 available=25716850688 required="6.2 GiB"
time=2024-09-07T03:18:40.123Z level=INFO source=server.go:101 msg="system memory" total="62.4 GiB" free="58.8 GiB" free_swap="8.0 GiB"
time=2024-09-07T03:18:40.123Z level=DEBUG source=memory.go:103 msg=evaluating library=rocm gpu_count=1 available="[24.0 GiB]"
time=2024-09-07T03:18:40.123Z level=INFO source=memory.go:326 msg="offload to rocm" layers.requested=-1 layers.model=33 layers.offload=33 layers.split="" memory.available="[24.0 GiB]" memory.gpu_overhead="0 B" memory.required.full="6.2 GiB" memory.required.partial="6.2 GiB" memory.required.kv="1.0 GiB" memory.required.allocations="[6.2 GiB]" memory.weights.total="4.7 GiB" memory.weights.repeating="4.3 GiB" memory.weights.nonrepeating="411.0 MiB" memory.graph.full="560.0 MiB" memory.graph.partial="677.5 MiB"
time=2024-09-07T03:18:40.123Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3272431602/runners/cpu/ollama_llama_server
time=2024-09-07T03:18:40.123Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3272431602/runners/cpu_avx/ollama_llama_server
time=2024-09-07T03:18:40.123Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3272431602/runners/cpu_avx2/ollama_llama_server
time=2024-09-07T03:18:40.123Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3272431602/runners/rocm_v60200/ollama_llama_server
time=2024-09-07T03:18:40.123Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3272431602/runners/cpu/ollama_llama_server
time=2024-09-07T03:18:40.123Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3272431602/runners/cpu_avx/ollama_llama_server
time=2024-09-07T03:18:40.123Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3272431602/runners/cpu_avx2/ollama_llama_server
time=2024-09-07T03:18:40.123Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3272431602/runners/rocm_v60200/ollama_llama_server
time=2024-09-07T03:18:40.125Z level=INFO source=server.go:391 msg="starting llama server" cmd="/tmp/ollama3272431602/runners/rocm_v60200/ollama_llama_server --model /root/.ollama/models/blobs/sha256-8eeb52dfb3bb9aefdf9d1ef24b3bdbcfbe82238798c4b918278320b6fcef18fe --ctx-size 8192 --batch-size 512 --embedding --log-disable --n-gpu-layers 33 --verbose --parallel 4 --port 42295"
time=2024-09-07T03:18:40.125Z level=DEBUG source=server.go:408 msg=subprocess environment="[PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin LD_LIBRARY_PATH=/opt/rocm/lib:/tmp/ollama3272431602/runners/rocm_v60200:/lib/ollama:/opt/rocm/lib:/opt/amdgpu/lib/x86_64-linux-gnu HIP_VISIBLE_DEVICES=0]"
time=2024-09-07T03:18:40.125Z level=INFO source=sched.go:450 msg="loaded runners" count=1
time=2024-09-07T03:18:40.125Z level=INFO source=server.go:590 msg="waiting for llama runner to start responding"
time=2024-09-07T03:18:40.125Z level=INFO source=server.go:624 msg="waiting for server to become available" status="llm server error"
INFO [main] build info | build=1 commit="8962422b" tid="134934456185664" timestamp=1725679120
INFO [main] system info | n_threads=16 n_threads_batch=16 system_info="AVX = 1 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 0 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | " tid="134934456185664" timestamp=1725679120 total_threads=32
INFO [main] HTTP server listening | hostname="127.0.0.1" n_threads_http="31" port="42295" tid="134934456185664" timestamp=1725679120
llama_model_loader: loaded meta data with 29 key-value pairs and 292 tensors from /root/.ollama/models/blobs/sha256-8eeb52dfb3bb9aefdf9d1ef24b3bdbcfbe82238798c4b918278320b6fcef18fe (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = Meta Llama 3.1 8B Instruct
llama_model_loader: - kv   3:                           general.finetune str              = Instruct
llama_model_loader: - kv   4:                           general.basename str              = Meta-Llama-3.1
llama_model_loader: - kv   5:                         general.size_label str              = 8B
llama_model_loader: - kv   6:                            general.license str              = llama3.1
llama_model_loader: - kv   7:                               general.tags arr[str,6]       = ["facebook", "meta", "pytorch", "llam...
llama_model_loader: - kv   8:                          general.languages arr[str,8]       = ["en", "de", "fr", "it", "pt", "hi", ...
llama_model_loader: - kv   9:                          llama.block_count u32              = 32
llama_model_loader: - kv  10:                       llama.context_length u32              = 131072
llama_model_loader: - kv  11:                     llama.embedding_length u32              = 4096
llama_model_loader: - kv  12:                  llama.feed_forward_length u32              = 14336
llama_model_loader: - kv  13:                 llama.attention.head_count u32              = 32
llama_model_loader: - kv  14:              llama.attention.head_count_kv u32              = 8
llama_model_loader: - kv  15:                       llama.rope.freq_base f32              = 500000.000000
llama_model_loader: - kv  16:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  17:                          general.file_type u32              = 2
llama_model_loader: - kv  18:                           llama.vocab_size u32              = 128256
llama_model_loader: - kv  19:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv  20:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  21:                         tokenizer.ggml.pre str              = llama-bpe
llama_model_loader: - kv  22:                      tokenizer.ggml.tokens arr[str,128256]  = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  23:                  tokenizer.ggml.token_type arr[i32,128256]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  24:                      tokenizer.ggml.merges arr[str,280147]  = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv  25:                tokenizer.ggml.bos_token_id u32              = 128000
llama_model_loader: - kv  26:                tokenizer.ggml.eos_token_id u32              = 128009
llama_model_loader: - kv  27:                    tokenizer.chat_template str              = {{- bos_token }}\n{%- if custom_tools ...
llama_model_loader: - kv  28:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:   66 tensors
llama_model_loader: - type q4_0:  225 tensors
llama_model_loader: - type q6_K:    1 tensors
llm_load_vocab: special tokens cache size = 256
llm_load_vocab: token to piece cache size = 0.7999 MB
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = llama
llm_load_print_meta: vocab type       = BPE
llm_load_print_meta: n_vocab          = 128256
llm_load_print_meta: n_merges         = 280147
llm_load_print_meta: vocab_only       = 0
llm_load_print_meta: n_ctx_train      = 131072
llm_load_print_meta: n_embd           = 4096
llm_load_print_meta: n_layer          = 32
llm_load_print_meta: n_head           = 32
llm_load_print_meta: n_head_kv        = 8
llm_load_print_meta: n_rot            = 128
llm_load_print_meta: n_swa            = 0
llm_load_print_meta: n_embd_head_k    = 128
llm_load_print_meta: n_embd_head_v    = 128
llm_load_print_meta: n_gqa            = 4
llm_load_print_meta: n_embd_k_gqa     = 1024
llm_load_print_meta: n_embd_v_gqa     = 1024
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale    = 0.0e+00
llm_load_print_meta: n_ff             = 14336
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: causal attn      = 1
llm_load_print_meta: pooling type     = 0
llm_load_print_meta: rope type        = 0
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 500000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_ctx_orig_yarn  = 131072
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: ssm_d_conv       = 0
llm_load_print_meta: ssm_d_inner      = 0
llm_load_print_meta: ssm_d_state      = 0
llm_load_print_meta: ssm_dt_rank      = 0
llm_load_print_meta: ssm_dt_b_c_rms   = 0
llm_load_print_meta: model type       = 8B
llm_load_print_meta: model ftype      = Q4_0
llm_load_print_meta: model params     = 8.03 B
llm_load_print_meta: model size       = 4.33 GiB (4.64 BPW) 
llm_load_print_meta: general.name     = Meta Llama 3.1 8B Instruct
llm_load_print_meta: BOS token        = 128000 '<|begin_of_text|>'
llm_load_print_meta: EOS token        = 128009 '<|eot_id|>'
llm_load_print_meta: LF token         = 128 'Ä'
llm_load_print_meta: EOT token        = 128009 '<|eot_id|>'
llm_load_print_meta: max token length = 256

rocBLAS error: Could not initialize Tensile host: No devices found
time=2024-09-07T03:18:40.431Z level=DEBUG source=server.go:431 msg="llama runner terminated" error="signal: aborted (core dumped)"
time=2024-09-07T03:18:40.681Z level=ERROR source=sched.go:456 msg="error loading llama server" error="llama runner process has terminated: error:Could not initialize Tensile host: No devices found"
time=2024-09-07T03:18:40.681Z level=DEBUG source=sched.go:459 msg="triggering expiration for failed load" model=/root/.ollama/models/blobs/sha256-8eeb52dfb3bb9aefdf9d1ef24b3bdbcfbe82238798c4b918278320b6fcef18fe
time=2024-09-07T03:18:40.681Z level=DEBUG source=sched.go:360 msg="runner expired event received" modelPath=/root/.ollama/models/blobs/sha256-8eeb52dfb3bb9aefdf9d1ef24b3bdbcfbe82238798c4b918278320b6fcef18fe
[GIN] 2024/09/07 - 03:18:40 | 500 |  589.158933ms |       127.0.0.1 | POST     "/api/chat"
time=2024-09-07T03:18:40.681Z level=DEBUG source=sched.go:376 msg="got lock to unload" modelPath=/root/.ollama/models/blobs/sha256-8eeb52dfb3bb9aefdf9d1ef24b3bdbcfbe82238798c4b918278320b6fcef18fe
time=2024-09-07T03:18:40.681Z level=DEBUG source=gpu.go:359 msg="updating system memory data" before.total="62.4 GiB" before.free="58.8 GiB" before.free_swap="8.0 GiB" now.total="62.4 GiB" now.free="58.7 GiB" now.free_swap="8.0 GiB"
time=2024-09-07T03:18:40.682Z level=DEBUG source=amd_linux.go:440 msg="updating rocm free memory" gpu=0 name=1002:744c before="24.0 GiB" now="24.0 GiB"
time=2024-09-07T03:18:40.682Z level=DEBUG source=server.go:1047 msg="stopping llama server"
time=2024-09-07T03:18:40.682Z level=DEBUG source=sched.go:381 msg="runner released" modelPath=/root/.ollama/models/blobs/sha256-8eeb52dfb3bb9aefdf9d1ef24b3bdbcfbe82238798c4b918278320b6fcef18fe
time=2024-09-07T03:18:40.933Z level=DEBUG source=gpu.go:359 msg="updating system memory data" before.total="62.4 GiB" before.free="58.7 GiB" before.free_swap="8.0 GiB" now.total="62.4 GiB" now.free="58.7 GiB" now.free_swap="8.0 GiB"
time=2024-09-07T03:18:40.933Z level=DEBUG source=amd_linux.go:440 msg="updating rocm free memory" gpu=0 name=1002:744c before="24.0 GiB" now="24.0 GiB"
time=2024-09-07T03:18:41.182Z level=DEBUG source=gpu.go:359 msg="updating system memory data" before.total="62.4 GiB" before.free="58.7 GiB" before.free_swap="8.0 GiB" now.total="62.4 GiB" now.free="58.7 GiB" now.free_swap="8.0 GiB"
time=2024-09-07T03:18:41.182Z level=DEBUG source=amd_linux.go:440 msg="updating rocm free memory" gpu=0 name=1002:744c before="24.0 GiB" now="24.0 GiB"
time=2024-09-07T03:18:41.432Z level=DEBUG source=gpu.go:359 msg="updating system memory data" before.total="62.4 GiB" before.free="58.7 GiB" before.free_swap="8.0 GiB" now.total="62.4 GiB" now.free="58.7 GiB" now.free_swap="8.0 GiB"
time=2024-09-07T03:18:41.432Z level=DEBUG source=amd_linux.go:440 msg="updating rocm free memory" gpu=0 name=1002:744c before="24.0 GiB" now="24.0 GiB"
time=2024-09-07T03:18:41.683Z level=DEBUG source=gpu.go:359 msg="updating system memory data" before.total="62.4 GiB" before.free="58.7 GiB" before.free_swap="8.0 GiB" now.total="62.4 GiB" now.free="58.7 GiB" now.free_swap="8.0 GiB"
time=2024-09-07T03:18:41.683Z level=DEBUG source=amd_linux.go:440 msg="updating rocm free memory" gpu=0 name=1002:744c before="24.0 GiB" now="24.0 GiB"
time=2024-09-07T03:18:41.932Z level=DEBUG source=gpu.go:359 msg="updating system memory data" before.total="62.4 GiB" before.free="58.7 GiB" before.free_swap="8.0 GiB" now.total="62.4 GiB" now.free="58.7 GiB" now.free_swap="8.0 GiB"
time=2024-09-07T03:18:41.932Z level=DEBUG source=amd_linux.go:440 msg="updating rocm free memory" gpu=0 name=1002:744c before="24.0 GiB" now="24.0 GiB"
time=2024-09-07T03:18:42.182Z level=DEBUG source=gpu.go:359 msg="updating system memory data" before.total="62.4 GiB" before.free="58.7 GiB" before.free_swap="8.0 GiB" now.total="62.4 GiB" now.free="58.7 GiB" now.free_swap="8.0 GiB"
time=2024-09-07T03:18:42.182Z level=DEBUG source=amd_linux.go:440 msg="updating rocm free memory" gpu=0 name=1002:744c before="24.0 GiB" now="24.0 GiB"
time=2024-09-07T03:18:42.433Z level=DEBUG source=gpu.go:359 msg="updating system memory data" before.total="62.4 GiB" before.free="58.7 GiB" before.free_swap="8.0 GiB" now.total="62.4 GiB" now.free="58.7 GiB" now.free_swap="8.0 GiB"
time=2024-09-07T03:18:42.433Z level=DEBUG source=amd_linux.go:440 msg="updating rocm free memory" gpu=0 name=1002:744c before="24.0 GiB" now="24.0 GiB"
time=2024-09-07T03:18:42.682Z level=DEBUG source=gpu.go:359 msg="updating system memory data" before.total="62.4 GiB" before.free="58.7 GiB" before.free_swap="8.0 GiB" now.total="62.4 GiB" now.free="58.7 GiB" now.free_swap="8.0 GiB"
time=2024-09-07T03:18:42.682Z level=DEBUG source=amd_linux.go:440 msg="updating rocm free memory" gpu=0 name=1002:744c before="24.0 GiB" now="24.0 GiB"
time=2024-09-07T03:18:42.932Z level=DEBUG source=gpu.go:359 msg="updating system memory data" before.total="62.4 GiB" before.free="58.7 GiB" before.free_swap="8.0 GiB" now.total="62.4 GiB" now.free="58.7 GiB" now.free_swap="8.0 GiB"
time=2024-09-07T03:18:42.932Z level=DEBUG source=amd_linux.go:440 msg="updating rocm free memory" gpu=0 name=1002:744c before="24.0 GiB" now="24.0 GiB"
time=2024-09-07T03:18:43.183Z level=DEBUG source=gpu.go:359 msg="updating system memory data" before.total="62.4 GiB" before.free="58.7 GiB" before.free_swap="8.0 GiB" now.total="62.4 GiB" now.free="58.7 GiB" now.free_swap="8.0 GiB"
time=2024-09-07T03:18:43.183Z level=DEBUG source=amd_linux.go:440 msg="updating rocm free memory" gpu=0 name=1002:744c before="24.0 GiB" now="24.0 GiB"
time=2024-09-07T03:18:43.432Z level=DEBUG source=gpu.go:359 msg="updating system memory data" before.total="62.4 GiB" before.free="58.7 GiB" before.free_swap="8.0 GiB" now.total="62.4 GiB" now.free="58.7 GiB" now.free_swap="8.0 GiB"
time=2024-09-07T03:18:43.432Z level=DEBUG source=amd_linux.go:440 msg="updating rocm free memory" gpu=0 name=1002:744c before="24.0 GiB" now="24.0 GiB"
time=2024-09-07T03:18:43.682Z level=DEBUG source=gpu.go:359 msg="updating system memory data" before.total="62.4 GiB" before.free="58.7 GiB" before.free_swap="8.0 GiB" now.total="62.4 GiB" now.free="58.7 GiB" now.free_swap="8.0 GiB"
time=2024-09-07T03:18:43.682Z level=DEBUG source=amd_linux.go:440 msg="updating rocm free memory" gpu=0 name=1002:744c before="24.0 GiB" now="24.0 GiB"
time=2024-09-07T03:18:43.932Z level=DEBUG source=gpu.go:359 msg="updating system memory data" before.total="62.4 GiB" before.free="58.7 GiB" before.free_swap="8.0 GiB" now.total="62.4 GiB" now.free="58.7 GiB" now.free_swap="8.0 GiB"
time=2024-09-07T03:18:43.933Z level=DEBUG source=amd_linux.go:440 msg="updating rocm free memory" gpu=0 name=1002:744c before="24.0 GiB" now="24.0 GiB"
time=2024-09-07T03:18:44.182Z level=DEBUG source=gpu.go:359 msg="updating system memory data" before.total="62.4 GiB" before.free="58.7 GiB" before.free_swap="8.0 GiB" now.total="62.4 GiB" now.free="58.7 GiB" now.free_swap="8.0 GiB"
time=2024-09-07T03:18:44.182Z level=DEBUG source=amd_linux.go:440 msg="updating rocm free memory" gpu=0 name=1002:744c before="24.0 GiB" now="24.0 GiB"
time=2024-09-07T03:18:44.432Z level=DEBUG source=gpu.go:359 msg="updating system memory data" before.total="62.4 GiB" before.free="58.7 GiB" before.free_swap="8.0 GiB" now.total="62.4 GiB" now.free="58.7 GiB" now.free_swap="8.0 GiB"
time=2024-09-07T03:18:44.432Z level=DEBUG source=amd_linux.go:440 msg="updating rocm free memory" gpu=0 name=1002:744c before="24.0 GiB" now="24.0 GiB"
time=2024-09-07T03:18:44.683Z level=DEBUG source=gpu.go:359 msg="updating system memory data" before.total="62.4 GiB" before.free="58.7 GiB" before.free_swap="8.0 GiB" now.total="62.4 GiB" now.free="58.7 GiB" now.free_swap="8.0 GiB"
time=2024-09-07T03:18:44.683Z level=DEBUG source=amd_linux.go:440 msg="updating rocm free memory" gpu=0 name=1002:744c before="24.0 GiB" now="24.0 GiB"
time=2024-09-07T03:18:44.932Z level=DEBUG source=gpu.go:359 msg="updating system memory data" before.total="62.4 GiB" before.free="58.7 GiB" before.free_swap="8.0 GiB" now.total="62.4 GiB" now.free="58.7 GiB" now.free_swap="8.0 GiB"
time=2024-09-07T03:18:44.932Z level=DEBUG source=amd_linux.go:440 msg="updating rocm free memory" gpu=0 name=1002:744c before="24.0 GiB" now="24.0 GiB"
time=2024-09-07T03:18:45.182Z level=DEBUG source=gpu.go:359 msg="updating system memory data" before.total="62.4 GiB" before.free="58.7 GiB" before.free_swap="8.0 GiB" now.total="62.4 GiB" now.free="58.7 GiB" now.free_swap="8.0 GiB"
time=2024-09-07T03:18:45.182Z level=DEBUG source=amd_linux.go:440 msg="updating rocm free memory" gpu=0 name=1002:744c before="24.0 GiB" now="24.0 GiB"
time=2024-09-07T03:18:45.432Z level=DEBUG source=gpu.go:359 msg="updating system memory data" before.total="62.4 GiB" before.free="58.7 GiB" before.free_swap="8.0 GiB" now.total="62.4 GiB" now.free="58.7 GiB" now.free_swap="8.0 GiB"
time=2024-09-07T03:18:45.433Z level=DEBUG source=amd_linux.go:440 msg="updating rocm free memory" gpu=0 name=1002:744c before="24.0 GiB" now="24.0 GiB"
time=2024-09-07T03:18:45.682Z level=WARN source=sched.go:647 msg="gpu VRAM usage didn't recover within timeout" seconds=5.000190324 model=/root/.ollama/models/blobs/sha256-8eeb52dfb3bb9aefdf9d1ef24b3bdbcfbe82238798c4b918278320b6fcef18fe
time=2024-09-07T03:18:45.682Z level=DEBUG source=sched.go:385 msg="sending an unloaded event" modelPath=/root/.ollama/models/blobs/sha256-8eeb52dfb3bb9aefdf9d1ef24b3bdbcfbe82238798c4b918278320b6fcef18fe
time=2024-09-07T03:18:45.682Z level=DEBUG source=sched.go:308 msg="ignoring unload event with no pending requests"
time=2024-09-07T03:18:45.682Z level=DEBUG source=gpu.go:359 msg="updating system memory data" before.total="62.4 GiB" before.free="58.7 GiB" before.free_swap="8.0 GiB" now.total="62.4 GiB" now.free="58.7 GiB" now.free_swap="8.0 GiB"
time=2024-09-07T03:18:45.682Z level=DEBUG source=amd_linux.go:440 msg="updating rocm free memory" gpu=0 name=1002:744c before="24.0 GiB" now="24.0 GiB"
time=2024-09-07T03:18:45.932Z level=WARN source=sched.go:647 msg="gpu VRAM usage didn't recover within timeout" seconds=5.250581314 model=/root/.ollama/models/blobs/sha256-8eeb52dfb3bb9aefdf9d1ef24b3bdbcfbe82238798c4b918278320b6fcef18fe
time=2024-09-07T03:18:45.932Z level=DEBUG source=gpu.go:359 msg="updating system memory data" before.total="62.4 GiB" before.free="58.7 GiB" before.free_swap="8.0 GiB" now.total="62.4 GiB" now.free="58.7 GiB" now.free_swap="8.0 GiB"
time=2024-09-07T03:18:45.932Z level=DEBUG source=amd_linux.go:440 msg="updating rocm free memory" gpu=0 name=1002:744c before="24.0 GiB" now="24.0 GiB"
time=2024-09-07T03:18:46.182Z level=WARN source=sched.go:647 msg="gpu VRAM usage didn't recover within timeout" seconds=5.500877044 model=/root/.ollama/models/blobs/sha256-8eeb52dfb3bb9aefdf9d1ef24b3bdbcfbe82238798c4b918278320b6fcef18fe

rocminfo:

ROCk module version 6.8.5 is loaded
=====================    
HSA System Attributes    
=====================    
Runtime Version:         1.14
Runtime Ext Version:     1.6
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE                              
System Endianness:       LITTLE                             
Mwaitx:                  DISABLED
DMAbuf Support:          YES

==========               
HSA Agents               
==========               
*******                  
Agent 1                  
*******                  
  Name:                    AMD Ryzen 9 7950X3D 16-Core Processor
  Uuid:                    CPU-XX                             
  Marketing Name:          AMD Ryzen 9 7950X3D 16-Core Processor
  Vendor Name:             CPU                                
  Feature:                 None specified                     
  Profile:                 FULL_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        0(0x0)                             
  Queue Min Size:          0(0x0)                             
  Queue Max Size:          0(0x0)                             
  Queue Type:              MULTI                              
  Node:                    0                                  
  Device Type:             CPU                                
  Cache Info:              
    L1:                      32768(0x8000) KB                   
  Chip ID:                 0(0x0)                             
  ASIC Revision:           0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   5759                               
  BDFID:                   0                                  
  Internal Node ID:        0                                  
  Compute Unit:            32                                 
  SIMDs per CU:            0                                  
  Shader Engines:          0                                  
  Shader Arrs. per Eng.:   0                                  
  WatchPts on Addr. Ranges:1                                  
  Memory Properties:       
  Features:                None
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: FINE GRAINED        
      Size:                    65426568(0x3e65488) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    65426568(0x3e65488) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 3                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    65426568(0x3e65488) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
  ISA Info:                
*******                  
Agent 2                  
*******                  
  Name:                    gfx1100                            
  Uuid:                    GPU-3f8d76bb6d2029be               
  Marketing Name:          Radeon RX 7900 XTX                 
  Vendor Name:             AMD                                
  Feature:                 KERNEL_DISPATCH                    
  Profile:                 BASE_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        128(0x80)                          
  Queue Min Size:          64(0x40)                           
  Queue Max Size:          131072(0x20000)                    
  Queue Type:              MULTI                              
  Node:                    1                                  
  Device Type:             GPU                                
  Cache Info:              
    L1:                      32(0x20) KB                        
    L2:                      6144(0x1800) KB                    
    L3:                      98304(0x18000) KB                  
  Chip ID:                 29772(0x744c)                      
  ASIC Revision:           0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   2371                               
  BDFID:                   5888                               
  Internal Node ID:        1                                  
  Compute Unit:            96                                 
  SIMDs per CU:            2                                  
  Shader Engines:          6                                  
  Shader Arrs. per Eng.:   2                                  
  WatchPts on Addr. Ranges:4                                  
  Coherent Host Access:    FALSE                              
  Memory Properties:       
  Features:                KERNEL_DISPATCH 
  Fast F16 Operation:      TRUE                               
  Wavefront Size:          32(0x20)                           
  Workgroup Max Size:      1024(0x400)                        
  Workgroup Max Size per Dimension:
    x                        1024(0x400)                        
    y                        1024(0x400)                        
    z                        1024(0x400)                        
  Max Waves Per CU:        32(0x20)                           
  Max Work-item Per CU:    1024(0x400)                        
  Grid Max Size:           4294967295(0xffffffff)             
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)             
    y                        4294967295(0xffffffff)             
    z                        4294967295(0xffffffff)             
  Max fbarriers/Workgrp:   32                                 
  Packet Processor uCode:: 232                                
  SDMA engine uCode::      21                                 
  IOMMU Support::          None                               
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    25149440(0x17fc000) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:2048KB                             
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: EXTENDED FINE GRAINED
      Size:                    25149440(0x17fc000) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:2048KB                             
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 3                   
      Segment:                 GROUP                              
      Size:                    64(0x40) KB                        
      Allocatable:             FALSE                              
      Alloc Granule:           0KB                                
      Alloc Recommended Granule:0KB                                
      Alloc Alignment:         0KB                                
      Accessible by all:       FALSE                              
  ISA Info:                
    ISA 1                    
      Name:                    amdgcn-amd-amdhsa--gfx1100         
      Machine Models:          HSA_MACHINE_MODEL_LARGE            
      Profiles:                HSA_PROFILE_BASE                   
      Default Rounding Mode:   NEAR                               
      Default Rounding Modes:  NEAR                               
      Fast f16:                TRUE                               
      Workgroup Max Size:      1024(0x400)                        
      Workgroup Max Size per Dimension:
        x                        1024(0x400)                        
        y                        1024(0x400)                        
        z                        1024(0x400)                        
      Grid Max Size:           4294967295(0xffffffff)             
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)             
        y                        4294967295(0xffffffff)             
        z                        4294967295(0xffffffff)             
      FBarrier Max Size:       32                                 
*** Done ***  

@rtaic-coder commented on GitHub (Sep 8, 2024):

BTW, ollama runs fine on the GPU if I run it directly on the host. Ubuntu 24.04 with ROCm 6.2.0 installed.
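
The host-works/container-fails split is typically a device-node access problem: ollama's discovery reads /sys/class/kfd and can report the GPU even when the runner process cannot open /dev/kfd or the /dev/dri render node, which is when rocBLAS reports "No devices found". For reference, a minimal run command in the spirit of the Docker instructions for the ROCm image (the volume and container name are illustrative):

```bash
# Pass the AMD KFD compute interface and the DRM render nodes into the
# container; if the runner cannot open these, rocBLAS aborts with
# "Could not initialize Tensile host: No devices found".
docker run -d \
  --device /dev/kfd \
  --device /dev/dri \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama \
  ollama/ollama:rocm
```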

@dhiltgen commented on GitHub (Sep 10, 2024):

@rtaic-coder from your logs, it sounds like you're hitting the same failure mode as #6685. Let's track it under that issue.
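
For anyone triaging a similar setup before following up there, a quick sanity check is whether the device nodes are actually visible and openable from inside the container; a small diagnostic sketch (the container name `ollama` is illustrative):

```bash
# Host side: note the numeric group owning the render node (often "render");
# the user inside the container must be able to open it.
ls -ln /dev/kfd /dev/dri/renderD*

# Container side: the same nodes must exist and be readable/writable,
# and the container user's groups must grant access to them.
docker exec -it ollama ls -l /dev/kfd /dev/dri
docker exec -it ollama id
```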

Reference: github-starred/ollama#66076