[GH-ISSUE #6423] Running on MI300X via Docker fails with rocBLAS error: Could not initialize Tensile host: No devices found #66076

Closed
opened 2026-05-03 23:51:51 -05:00 by GiteaMirror · 9 comments

Originally created by @peterschmidt85 on GitHub (Aug 19, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/6423

Originally assigned to: @dhiltgen on GitHub.

Steps to reproduce:

  1. Run a Docker container using ollama/ollama:rocm on a machine with a single MI300X (a typical command is sketched below)
  2. Inside the container, run ollama run llama3.1:70B
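
For reference, a typical invocation of the ROCm image (a sketch along the lines of Ollama's Docker instructions; the volume and port mappings here are illustrative) is:

docker run -d --device /dev/kfd --device /dev/dri \
    -v ollama:/root/.ollama -p 11434:11434 \
    --name ollama ollama/ollama:rocm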

Actual behaviour:

rocBLAS error: Could not initialize Tensile host: No devices found

The full output:

ollama serve &
[1] 649
[root@f4425b1a0236 workflow]# Couldn't find '/root/.ollama/id_ed25519'. Generating new private key.
Your new public key is: 

ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIHmumM0c/iN0gZ9aPo99pq6QfzU+7AuA4V3/z933kCjK

2024/08/19 16:42:26 routes.go:1123: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://*] OLLAMA_RUNNERS_DIR: OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES:]"
time=2024-08-19T16:42:26.947Z level=INFO source=images.go:782 msg="total blobs: 0"
time=2024-08-19T16:42:26.948Z level=INFO source=images.go:790 msg="total unused blobs removed: 0"
time=2024-08-19T16:42:26.948Z level=INFO source=routes.go:1170 msg="Listening on [::]:11434 (version 0.3.5)"
time=2024-08-19T16:42:26.949Z level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama307827265/runners
time=2024-08-19T16:42:30.581Z level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu cpu_avx cpu_avx2 cuda_v11 rocm_v60102]"
time=2024-08-19T16:42:30.581Z level=INFO source=gpu.go:204 msg="looking for compatible GPUs"
time=2024-08-19T16:42:30.590Z level=WARN source=amd_linux.go:201 msg="amdgpu too old gfx000" gpu=0
time=2024-08-19T16:42:30.590Z level=WARN source=amd_linux.go:201 msg="amdgpu too old gfx000" gpu=1
time=2024-08-19T16:42:30.590Z level=WARN source=amd_linux.go:201 msg="amdgpu too old gfx000" gpu=2
time=2024-08-19T16:42:30.603Z level=INFO source=amd_linux.go:345 msg="amdgpu is supported" gpu=3 gpu_type=gfx942
time=2024-08-19T16:42:30.603Z level=WARN source=amd_linux.go:201 msg="amdgpu too old gfx000" gpu=4
time=2024-08-19T16:42:30.603Z level=WARN source=amd_linux.go:201 msg="amdgpu too old gfx000" gpu=5
time=2024-08-19T16:42:30.603Z level=WARN source=amd_linux.go:201 msg="amdgpu too old gfx000" gpu=6
time=2024-08-19T16:42:30.603Z level=WARN source=amd_linux.go:201 msg="amdgpu too old gfx000" gpu=7
time=2024-08-19T16:42:30.603Z level=INFO source=types.go:105 msg="inference compute" id=3 library=rocm compute=gfx942 driver=6.7 name=1002:74a1 total="192.0 GiB" available="191.7 GiB"

[root@f4425b1a0236 workflow]# 
[root@f4425b1a0236 workflow]# ollama pull llama3.1:70b
[GIN] 2024/08/19 - 16:42:37 | 200 |     129.844µs |       127.0.0.1 | HEAD     "/"
pulling manifest ⠇ time=2024-08-19T16:42:39.572Z level=INFO source=download.go:175 msg="downloading a677b4a4b70c in 65 624 MB part(s)"
pulling manifest 
pulling a677b4a4b70c...  58% ▕████████████████████████████████████████████████████                                      ▏  23 GB/ 39 GB  465 MB/s     35st
pulling manifest 
pulling a677b4a4b70c... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████▏  39 GB
pulling 11ce4ee3e170... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████▏ 1.7 KB                         
pulling 0ba8f0e314b4... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████▏  12 KB                         
pulling 56bb8bd477a5... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████▏   96 B                         
pulling 654440dac7f3... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████▏  486 B                         
verifying sha256 digest 
writing manifest 
removing any unused layers 
success
ollama run llama3.1:70b
[GIN] 2024/08/19 - 16:45:03 | 200 |      37.636µs |       127.0.0.1 | HEAD     "/"
[GIN] 2024/08/19 - 16:45:03 | 200 |   33.789282ms |       127.0.0.1 | POST     "/api/show"
time=2024-08-19T16:45:03.649Z level=INFO source=sched.go:710 msg="new model will fit in available VRAM in single GPU, loading" model=/root/.ollama/models/blobs/sha256-a677b4a4b70c45e702b1d600f7905e367733c53898b8be60e3f29272cf334574 gpu=3 parallel=4 available=205843886080 required="41.2 GiB"
time=2024-08-19T16:45:03.650Z level=INFO source=memory.go:309 msg="offload to rocm" layers.requested=-1 layers.model=81 layers.offload=81 layers.split="" memory.available="[191.7 GiB]" memory.required.full="41.2 GiB" memory.required.partial="41.2 GiB" memory.required.kv="2.5 GiB" memory.required.allocations="[41.2 GiB]" memory.weights.total="38.4 GiB" memory.weights.repeating="37.6 GiB" memory.weights.nonrepeating="822.0 MiB" memory.graph.full="1.1 GiB" memory.graph.partial="1.1 GiB"
time=2024-08-19T16:45:03.665Z level=INFO source=server.go:393 msg="starting llama server" cmd="/tmp/ollama307827265/runners/rocm_v60102/ollama_llama_server --model /root/.ollama/models/blobs/sha256-a677b4a4b70c45e702b1d600f7905e367733c53898b8be60e3f29272cf334574 --ctx-size 8192 --batch-size 512 --embedding --log-disable --n-gpu-layers 81 --numa distribute --parallel 4 --port 37363"
time=2024-08-19T16:45:03.665Z level=INFO source=sched.go:445 msg="loaded runners" count=1
time=2024-08-19T16:45:03.665Z level=INFO source=server.go:593 msg="waiting for llama runner to start responding"
time=2024-08-19T16:45:03.665Z level=INFO source=server.go:627 msg="waiting for server to become available" status="llm server error"
⠹ WARNING: /proc/sys/kernel/numa_balancing is enabled, this has been observed to impair performance
INFO [main] build info | build=1 commit="1e6f655" tid="138631197918016" timestamp=1724085903
INFO [main] system info | n_threads=96 n_threads_batch=-1 system_info="AVX = 1 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 0 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | " tid="138631197918016" timestamp=1724085903 total_threads=192
INFO [main] HTTP server listening | hostname="127.0.0.1" n_threads_http="191" port="37363" tid="138631197918016" timestamp=1724085903
⠸ time=2024-08-19T16:45:03.917Z level=INFO source=server.go:627 msg="waiting for server to become available" status="llm server loading model"
llama_model_loader: loaded meta data with 29 key-value pairs and 724 tensors from /root/.ollama/models/blobs/sha256-a677b4a4b70c45e702b1d600f7905e367733c53898b8be60e3f29272cf334574 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = Meta Llama 3.1 70B Instruct
llama_model_loader: - kv   3:                           general.finetune str              = Instruct
llama_model_loader: - kv   4:                           general.basename str              = Meta-Llama-3.1
llama_model_loader: - kv   5:                         general.size_label str              = 70B
llama_model_loader: - kv   6:                            general.license str              = llama3.1
llama_model_loader: - kv   7:                               general.tags arr[str,6]       = ["facebook", "meta", "pytorch", "llam...
llama_model_loader: - kv   8:                          general.languages arr[str,8]       = ["en", "de", "fr", "it", "pt", "hi", ...
llama_model_loader: - kv   9:                          llama.block_count u32              = 80
llama_model_loader: - kv  10:                       llama.context_length u32              = 131072
llama_model_loader: - kv  11:                     llama.embedding_length u32              = 8192
llama_model_loader: - kv  12:                  llama.feed_forward_length u32              = 28672
llama_model_loader: - kv  13:                 llama.attention.head_count u32              = 64
llama_model_loader: - kv  14:              llama.attention.head_count_kv u32              = 8
llama_model_loader: - kv  15:                       llama.rope.freq_base f32              = 500000.000000
llama_model_loader: - kv  16:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  17:                          general.file_type u32              = 2
llama_model_loader: - kv  18:                           llama.vocab_size u32              = 128256
llama_model_loader: - kv  19:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv  20:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  21:                         tokenizer.ggml.pre str              = llama-bpe
llama_model_loader: - kv  22:                      tokenizer.ggml.tokens arr[str,128256]  = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  23:                  tokenizer.ggml.token_type arr[i32,128256]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  24:                      tokenizer.ggml.merges arr[str,280147]  = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv  25:                tokenizer.ggml.bos_token_id u32              = 128000
llama_model_loader: - kv  26:                tokenizer.ggml.eos_token_id u32              = 128009
llama_model_loader: - kv  27:                    tokenizer.chat_template str              = {{- bos_token }}\n{%- if custom_tools ...
llama_model_loader: - kv  28:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:  162 tensors
llama_model_loader: - type q4_0:  561 tensors
llama_model_loader: - type q6_K:    1 tensors
⠦ llm_load_vocab: special tokens cache size = 256
⠧ llm_load_vocab: token to piece cache size = 0.7999 MB
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = llama
llm_load_print_meta: vocab type       = BPE
llm_load_print_meta: n_vocab          = 128256
llm_load_print_meta: n_merges         = 280147
llm_load_print_meta: vocab_only       = 0
llm_load_print_meta: n_ctx_train      = 131072
llm_load_print_meta: n_embd           = 8192
llm_load_print_meta: n_layer          = 80
llm_load_print_meta: n_head           = 64
llm_load_print_meta: n_head_kv        = 8
llm_load_print_meta: n_rot            = 128
llm_load_print_meta: n_swa            = 0
llm_load_print_meta: n_embd_head_k    = 128
llm_load_print_meta: n_embd_head_v    = 128
llm_load_print_meta: n_gqa            = 8
llm_load_print_meta: n_embd_k_gqa     = 1024
llm_load_print_meta: n_embd_v_gqa     = 1024
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale    = 0.0e+00
llm_load_print_meta: n_ff             = 28672
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: causal attn      = 1
llm_load_print_meta: pooling type     = 0
llm_load_print_meta: rope type        = 0
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 500000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_ctx_orig_yarn  = 131072
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: ssm_d_conv       = 0
llm_load_print_meta: ssm_d_inner      = 0
llm_load_print_meta: ssm_d_state      = 0
llm_load_print_meta: ssm_dt_rank      = 0
llm_load_print_meta: model type       = 70B
llm_load_print_meta: model ftype      = Q4_0
llm_load_print_meta: model params     = 70.55 B
llm_load_print_meta: model size       = 37.22 GiB (4.53 BPW) 
llm_load_print_meta: general.name     = Meta Llama 3.1 70B Instruct
llm_load_print_meta: BOS token        = 128000 '<|begin_of_text|>'
llm_load_print_meta: EOS token        = 128009 '<|eot_id|>'
llm_load_print_meta: LF token         = 128 'Ä'
llm_load_print_meta: EOT token        = 128009 '<|eot_id|>'
llm_load_print_meta: max token length = 256
⠇ 
rocBLAS error: Could not initialize Tensile host: No devices found
GiteaMirror added the docker, bug, amd labels 2026-05-03 23:51:52 -05:00

@rick-github commented on GitHub (Aug 19, 2024):

What's the docker command you are using to start the container?

@peterschmidt85 commented on GitHub (Aug 19, 2024):

> What's the docker command you are using to start the container?

@rick-github I was running it via dstack's integration with RunPod. Basically, RunPod runs the container.
With HF's TGI it works perfectly but not with Ollama.

@rick-github commented on GitHub (Aug 19, 2024):

Can you see the parameters that the container is started with? For example, does it have --device /dev/kfd --device /dev/dri flags? Without these, ollama can't access the GPU.
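
One quick way to check from inside the container is to list the device nodes (a generic shell check, not specific to Ollama):

ls -l /dev/kfd /dev/dri

If those paths are missing, the container was started without the device flags and ROCm cannot see the GPU.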

@peterschmidt85 commented on GitHub (Aug 19, 2024):

> Can you see the parameters that the container is started with? For example, does it have --device /dev/kfd --device /dev/dri flags? Without these, ollama can't access the GPU.

100% it mounts the devices. You can even see this in the ollama serve logs:

time=2024-08-19T16:42:30.603Z level=INFO source=types.go:105 msg="inference compute" id=3 library=rocm compute=gfx942 driver=6.7 name=1002:74a1 total="192.0 GiB" available="191.7 GiB"

@dhiltgen commented on GitHub (Aug 19, 2024):

From the logs, it looks like the amdgpu driver is enumerating 8 GPUs in sysfs, and GPU 3 is the correct one. My suspicion is something may be getting mixed up in the GPU selection, and then ROCm is trying to connect to one of the incorrect GPUs. Running with -e OLLAMA_DEBUG=1 may shed some more light, or you can also experiment with setting HIP_VISIBLE_DEVICES to various values (I'd start with 3) and see if that yields a working setup.
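
As a sketch, those suggestions can be combined into the container invocation like so (the device and volume flags here are illustrative, and 3 is just the first value to try for HIP_VISIBLE_DEVICES):

docker run -d --device /dev/kfd --device /dev/dri \
    -e OLLAMA_DEBUG=1 -e HIP_VISIBLE_DEVICES=3 \
    -v ollama:/root/.ollama -p 11434:11434 \
    --name ollama ollama/ollama:rocm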

@dhiltgen commented on GitHub (Sep 3, 2024):

If you're still having trouble, please share a debug log and I'll reopen.

@rtaic-coder commented on GitHub (Sep 7, 2024):

I have the same issue. I tried adding the ubuntu user to the render and video groups (see the command below), but I still get the same error.
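
For reference, that kind of group change is usually done along these lines (group names can vary by distro), followed by logging out and back in so the membership takes effect:

sudo usermod -aG render,video ubuntu
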
After running the container, docker logs shows:

2024/09/07 03:15:21 routes.go:1125: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: OLLAMA_DEBUG:true OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://*] OLLAMA_RUNNERS_DIR: OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES:]"
time=2024-09-07T03:15:21.232Z level=INFO source=images.go:753 msg="total blobs: 22"
time=2024-09-07T03:15:21.232Z level=INFO source=images.go:760 msg="total unused blobs removed: 0"
time=2024-09-07T03:15:21.233Z level=INFO source=routes.go:1172 msg="Listening on [::]:11434 (version 0.3.9-11-g037a4d1)"
time=2024-09-07T03:15:21.233Z level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama3272431602/runners
time=2024-09-07T03:15:21.233Z level=DEBUG source=payload.go:182 msg=extracting variant=cpu file=build/linux/x86_64/cpu/bin/libggml.so.gz
time=2024-09-07T03:15:21.233Z level=DEBUG source=payload.go:182 msg=extracting variant=cpu file=build/linux/x86_64/cpu/bin/libllama.so.gz
time=2024-09-07T03:15:21.233Z level=DEBUG source=payload.go:182 msg=extracting variant=cpu file=build/linux/x86_64/cpu/bin/ollama_llama_server.gz
time=2024-09-07T03:15:21.233Z level=DEBUG source=payload.go:182 msg=extracting variant=cpu_avx file=build/linux/x86_64/cpu_avx/bin/libggml.so.gz
time=2024-09-07T03:15:21.233Z level=DEBUG source=payload.go:182 msg=extracting variant=cpu_avx file=build/linux/x86_64/cpu_avx/bin/libllama.so.gz
time=2024-09-07T03:15:21.233Z level=DEBUG source=payload.go:182 msg=extracting variant=cpu_avx file=build/linux/x86_64/cpu_avx/bin/ollama_llama_server.gz
time=2024-09-07T03:15:21.233Z level=DEBUG source=payload.go:182 msg=extracting variant=cpu_avx2 file=build/linux/x86_64/cpu_avx2/bin/libggml.so.gz
time=2024-09-07T03:15:21.233Z level=DEBUG source=payload.go:182 msg=extracting variant=cpu_avx2 file=build/linux/x86_64/cpu_avx2/bin/libllama.so.gz
time=2024-09-07T03:15:21.233Z level=DEBUG source=payload.go:182 msg=extracting variant=cpu_avx2 file=build/linux/x86_64/cpu_avx2/bin/ollama_llama_server.gz
time=2024-09-07T03:15:21.233Z level=DEBUG source=payload.go:182 msg=extracting variant=rocm_v60200 file=build/linux/x86_64/rocm_v60200/bin/libggml.so.gz
time=2024-09-07T03:15:21.233Z level=DEBUG source=payload.go:182 msg=extracting variant=rocm_v60200 file=build/linux/x86_64/rocm_v60200/bin/libllama.so.gz
time=2024-09-07T03:15:21.233Z level=DEBUG source=payload.go:182 msg=extracting variant=rocm_v60200 file=build/linux/x86_64/rocm_v60200/bin/ollama_llama_server.gz
time=2024-09-07T03:15:22.674Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3272431602/runners/cpu/ollama_llama_server
time=2024-09-07T03:15:22.674Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3272431602/runners/cpu_avx/ollama_llama_server
time=2024-09-07T03:15:22.674Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3272431602/runners/cpu_avx2/ollama_llama_server
time=2024-09-07T03:15:22.674Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3272431602/runners/rocm_v60200/ollama_llama_server
time=2024-09-07T03:15:22.674Z level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu_avx2 rocm_v60200 cpu cpu_avx]"
time=2024-09-07T03:15:22.674Z level=DEBUG source=payload.go:45 msg="Override detection logic by setting OLLAMA_LLM_LIBRARY"
time=2024-09-07T03:15:22.674Z level=DEBUG source=sched.go:105 msg="starting llm scheduler"
time=2024-09-07T03:15:22.674Z level=INFO source=gpu.go:200 msg="looking for compatible GPUs"
time=2024-09-07T03:15:22.674Z level=DEBUG source=gpu.go:86 msg="searching for GPU discovery libraries for NVIDIA"
time=2024-09-07T03:15:22.674Z level=DEBUG source=gpu.go:468 msg="Searching for GPU library" name=libcuda.so*
time=2024-09-07T03:15:22.674Z level=DEBUG source=gpu.go:491 msg="gpu library search" globs="[/usr/lib/ollama/libcuda.so* /lib/ollama/libcuda.so* /opt/rocm/lib/libcuda.so* /opt/amdgpu/lib/x86_64-linux-gnu/libcuda.so* /usr/local/cuda*/targets/*/lib/libcuda.so* /usr/lib/*-linux-gnu/nvidia/current/libcuda.so* /usr/lib/*-linux-gnu/libcuda.so* /usr/lib/wsl/lib/libcuda.so* /usr/lib/wsl/drivers/*/libcuda.so* /opt/cuda/lib*/libcuda.so* /usr/local/cuda/lib*/libcuda.so* /usr/lib*/libcuda.so* /usr/local/lib*/libcuda.so*]"
time=2024-09-07T03:15:22.675Z level=DEBUG source=gpu.go:525 msg="discovered GPU libraries" paths=[]
time=2024-09-07T03:15:22.675Z level=DEBUG source=gpu.go:468 msg="Searching for GPU library" name=libcudart.so*
time=2024-09-07T03:15:22.675Z level=DEBUG source=gpu.go:491 msg="gpu library search" globs="[/usr/lib/ollama/libcudart.so* /lib/ollama/libcudart.so* /opt/rocm/lib/libcudart.so* /opt/amdgpu/lib/x86_64-linux-gnu/libcudart.so* /tmp/ollama3272431602/runners/cuda*/libcudart.so* /usr/local/cuda/lib64/libcudart.so* /usr/lib/x86_64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/x86_64-linux-gnu/libcudart.so* /usr/lib/wsl/lib/libcudart.so* /usr/lib/wsl/drivers/*/libcudart.so* /opt/cuda/lib64/libcudart.so* /usr/local/cuda*/targets/aarch64-linux/lib/libcudart.so* /usr/lib/aarch64-linux-gnu/nvidia/current/libcudart.so* /usr/lib/aarch64-linux-gnu/libcudart.so* /usr/local/cuda/lib*/libcudart.so* /usr/lib*/libcudart.so* /usr/local/lib*/libcudart.so*]"
time=2024-09-07T03:15:22.675Z level=DEBUG source=gpu.go:525 msg="discovered GPU libraries" paths=[]
time=2024-09-07T03:15:22.675Z level=DEBUG source=amd_linux.go:102 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/0/properties"
time=2024-09-07T03:15:22.675Z level=DEBUG source=amd_linux.go:127 msg="detected CPU /sys/class/kfd/kfd/topology/nodes/0/properties"
time=2024-09-07T03:15:22.675Z level=DEBUG source=amd_linux.go:102 msg="evaluating amdgpu node /sys/class/kfd/kfd/topology/nodes/1/properties"
time=2024-09-07T03:15:22.675Z level=DEBUG source=amd_linux.go:217 msg="mapping amdgpu to drm sysfs nodes" amdgpu=/sys/class/kfd/kfd/topology/nodes/1/properties vendor=4098 device=29772 unique_id=4579446943467448766
time=2024-09-07T03:15:22.676Z level=DEBUG source=amd_linux.go:251 msg=matched amdgpu=/sys/class/kfd/kfd/topology/nodes/1/properties drm=/sys/class/drm/card1/device
time=2024-09-07T03:15:22.676Z level=DEBUG source=amd_linux.go:283 msg="amdgpu memory" gpu=0 total="24.0 GiB"
time=2024-09-07T03:15:22.676Z level=DEBUG source=amd_linux.go:284 msg="amdgpu memory" gpu=0 available="24.0 GiB"
time=2024-09-07T03:15:22.676Z level=DEBUG source=amd_common.go:18 msg="evaluating potential rocm lib dir /usr/lib/ollama"
time=2024-09-07T03:15:22.676Z level=DEBUG source=amd_common.go:21 msg="Checking glob pattern: /usr/lib/ollama/libhipblas.so.2*"
time=2024-09-07T03:15:22.676Z level=DEBUG source=amd_common.go:21 msg="Checking glob pattern: /usr/lib/ollama/rocblas"
time=2024-09-07T03:15:22.676Z level=DEBUG source=amd_common.go:18 msg="evaluating potential rocm lib dir /lib/ollama"
time=2024-09-07T03:15:22.676Z level=DEBUG source=amd_common.go:21 msg="Checking glob pattern: /lib/ollama/libhipblas.so.2*"
time=2024-09-07T03:15:22.676Z level=DEBUG source=amd_common.go:21 msg="Checking glob pattern: /lib/ollama/rocblas"
time=2024-09-07T03:15:22.676Z level=DEBUG source=amd_common.go:18 msg="evaluating potential rocm lib dir /opt/rocm/lib"
time=2024-09-07T03:15:22.676Z level=DEBUG source=amd_common.go:21 msg="Checking glob pattern: /opt/rocm/lib/libhipblas.so.2*"
time=2024-09-07T03:15:22.676Z level=DEBUG source=amd_common.go:21 msg="Checking glob pattern: /opt/rocm/lib/rocblas"
time=2024-09-07T03:15:22.677Z level=DEBUG source=amd_linux.go:336 msg="rocm supported GPUs" types="[gfx1010 gfx1012 gfx1030 gfx1100 gfx1101 gfx1102 gfx900 gfx906 gfx908 gfx90a gfx940 gfx941 gfx942]"
time=2024-09-07T03:15:22.677Z level=INFO source=amd_linux.go:345 msg="amdgpu is supported" gpu=0 gpu_type=gfx1100
time=2024-09-07T03:15:22.677Z level=INFO source=types.go:107 msg="inference compute" id=0 library=rocm variant="" compute=gfx1100 driver=6.8 name=1002:744c total="24.0 GiB" available="24.0 GiB"

Then after running the llama model, we get this: Error: llama runner process has terminated: error:Could not initialize Tensile host: No devices found

time=2024-09-07T03:15:22.677Z level=INFO source=amd_linux.go:345 msg="amdgpu is supported" gpu=0 gpu_type=gfx1100
time=2024-09-07T03:15:22.677Z level=INFO source=types.go:107 msg="inference compute" id=0 library=rocm variant="" compute=gfx1100 driver=6.8 name=1002:744c total="24.0 GiB" available="24.0 GiB"
[GIN] 2024/09/07 - 03:18:40 | 200 |        23.3µs |       127.0.0.1 | HEAD     "/"
[GIN] 2024/09/07 - 03:18:40 | 200 |   12.174247ms |       127.0.0.1 | POST     "/api/show"
time=2024-09-07T03:18:40.103Z level=DEBUG source=gpu.go:359 msg="updating system memory data" before.total="62.4 GiB" before.free="58.7 GiB" before.free_swap="8.0 GiB" now.total="62.4 GiB" now.free="58.8 GiB" now.free_swap="8.0 GiB"
time=2024-09-07T03:18:40.103Z level=DEBUG source=amd_linux.go:440 msg="updating rocm free memory" gpu=0 name=1002:744c before="24.0 GiB" now="24.0 GiB"
time=2024-09-07T03:18:40.103Z level=DEBUG source=sched.go:181 msg="updating default concurrency" OLLAMA_MAX_LOADED_MODELS=0x8263a0 gpu_count=1
time=2024-09-07T03:18:40.122Z level=DEBUG source=sched.go:224 msg="loading first model" model=/root/.ollama/models/blobs/sha256-8eeb52dfb3bb9aefdf9d1ef24b3bdbcfbe82238798c4b918278320b6fcef18fe
time=2024-09-07T03:18:40.122Z level=DEBUG source=memory.go:103 msg=evaluating library=rocm gpu_count=1 available="[24.0 GiB]"
time=2024-09-07T03:18:40.123Z level=INFO source=sched.go:715 msg="new model will fit in available VRAM in single GPU, loading" model=/root/.ollama/models/blobs/sha256-8eeb52dfb3bb9aefdf9d1ef24b3bdbcfbe82238798c4b918278320b6fcef18fe gpu=0 parallel=4 available=25716850688 required="6.2 GiB"
time=2024-09-07T03:18:40.123Z level=INFO source=server.go:101 msg="system memory" total="62.4 GiB" free="58.8 GiB" free_swap="8.0 GiB"
time=2024-09-07T03:18:40.123Z level=DEBUG source=memory.go:103 msg=evaluating library=rocm gpu_count=1 available="[24.0 GiB]"
time=2024-09-07T03:18:40.123Z level=INFO source=memory.go:326 msg="offload to rocm" layers.requested=-1 layers.model=33 layers.offload=33 layers.split="" memory.available="[24.0 GiB]" memory.gpu_overhead="0 B" memory.required.full="6.2 GiB" memory.required.partial="6.2 GiB" memory.required.kv="1.0 GiB" memory.required.allocations="[6.2 GiB]" memory.weights.total="4.7 GiB" memory.weights.repeating="4.3 GiB" memory.weights.nonrepeating="411.0 MiB" memory.graph.full="560.0 MiB" memory.graph.partial="677.5 MiB"
time=2024-09-07T03:18:40.123Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3272431602/runners/cpu/ollama_llama_server
time=2024-09-07T03:18:40.123Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3272431602/runners/cpu_avx/ollama_llama_server
time=2024-09-07T03:18:40.123Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3272431602/runners/cpu_avx2/ollama_llama_server
time=2024-09-07T03:18:40.123Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3272431602/runners/rocm_v60200/ollama_llama_server
time=2024-09-07T03:18:40.123Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3272431602/runners/cpu/ollama_llama_server
time=2024-09-07T03:18:40.123Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3272431602/runners/cpu_avx/ollama_llama_server
time=2024-09-07T03:18:40.123Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3272431602/runners/cpu_avx2/ollama_llama_server
time=2024-09-07T03:18:40.123Z level=DEBUG source=payload.go:71 msg="availableServers : found" file=/tmp/ollama3272431602/runners/rocm_v60200/ollama_llama_server
time=2024-09-07T03:18:40.125Z level=INFO source=server.go:391 msg="starting llama server" cmd="/tmp/ollama3272431602/runners/rocm_v60200/ollama_llama_server --model /root/.ollama/models/blobs/sha256-8eeb52dfb3bb9aefdf9d1ef24b3bdbcfbe82238798c4b918278320b6fcef18fe --ctx-size 8192 --batch-size 512 --embedding --log-disable --n-gpu-layers 33 --verbose --parallel 4 --port 42295"
time=2024-09-07T03:18:40.125Z level=DEBUG source=server.go:408 msg=subprocess environment="[PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin LD_LIBRARY_PATH=/opt/rocm/lib:/tmp/ollama3272431602/runners/rocm_v60200:/lib/ollama:/opt/rocm/lib:/opt/amdgpu/lib/x86_64-linux-gnu HIP_VISIBLE_DEVICES=0]"
time=2024-09-07T03:18:40.125Z level=INFO source=sched.go:450 msg="loaded runners" count=1
time=2024-09-07T03:18:40.125Z level=INFO source=server.go:590 msg="waiting for llama runner to start responding"
time=2024-09-07T03:18:40.125Z level=INFO source=server.go:624 msg="waiting for server to become available" status="llm server error"
INFO [main] build info | build=1 commit="8962422b" tid="134934456185664" timestamp=1725679120
INFO [main] system info | n_threads=16 n_threads_batch=16 system_info="AVX = 1 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 0 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | " tid="134934456185664" timestamp=1725679120 total_threads=32
INFO [main] HTTP server listening | hostname="127.0.0.1" n_threads_http="31" port="42295" tid="134934456185664" timestamp=1725679120
llama_model_loader: loaded meta data with 29 key-value pairs and 292 tensors from /root/.ollama/models/blobs/sha256-8eeb52dfb3bb9aefdf9d1ef24b3bdbcfbe82238798c4b918278320b6fcef18fe (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = Meta Llama 3.1 8B Instruct
llama_model_loader: - kv   3:                           general.finetune str              = Instruct
llama_model_loader: - kv   4:                           general.basename str              = Meta-Llama-3.1
llama_model_loader: - kv   5:                         general.size_label str              = 8B
llama_model_loader: - kv   6:                            general.license str              = llama3.1
llama_model_loader: - kv   7:                               general.tags arr[str,6]       = ["facebook", "meta", "pytorch", "llam...
llama_model_loader: - kv   8:                          general.languages arr[str,8]       = ["en", "de", "fr", "it", "pt", "hi", ...
llama_model_loader: - kv   9:                          llama.block_count u32              = 32
llama_model_loader: - kv  10:                       llama.context_length u32              = 131072
llama_model_loader: - kv  11:                     llama.embedding_length u32              = 4096
llama_model_loader: - kv  12:                  llama.feed_forward_length u32              = 14336
llama_model_loader: - kv  13:                 llama.attention.head_count u32              = 32
llama_model_loader: - kv  14:              llama.attention.head_count_kv u32              = 8
llama_model_loader: - kv  15:                       llama.rope.freq_base f32              = 500000.000000
llama_model_loader: - kv  16:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  17:                          general.file_type u32              = 2
llama_model_loader: - kv  18:                           llama.vocab_size u32              = 128256
llama_model_loader: - kv  19:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv  20:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  21:                         tokenizer.ggml.pre str              = llama-bpe
llama_model_loader: - kv  22:                      tokenizer.ggml.tokens arr[str,128256]  = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  23:                  tokenizer.ggml.token_type arr[i32,128256]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  24:                      tokenizer.ggml.merges arr[str,280147]  = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv  25:                tokenizer.ggml.bos_token_id u32              = 128000
llama_model_loader: - kv  26:                tokenizer.ggml.eos_token_id u32              = 128009
llama_model_loader: - kv  27:                    tokenizer.chat_template str              = {{- bos_token }}\n{%- if custom_tools ...
llama_model_loader: - kv  28:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:   66 tensors
llama_model_loader: - type q4_0:  225 tensors
llama_model_loader: - type q6_K:    1 tensors
llm_load_vocab: special tokens cache size = 256
llm_load_vocab: token to piece cache size = 0.7999 MB
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = llama
llm_load_print_meta: vocab type       = BPE
llm_load_print_meta: n_vocab          = 128256
llm_load_print_meta: n_merges         = 280147
llm_load_print_meta: vocab_only       = 0
llm_load_print_meta: n_ctx_train      = 131072
llm_load_print_meta: n_embd           = 4096
llm_load_print_meta: n_layer          = 32
llm_load_print_meta: n_head           = 32
llm_load_print_meta: n_head_kv        = 8
llm_load_print_meta: n_rot            = 128
llm_load_print_meta: n_swa            = 0
llm_load_print_meta: n_embd_head_k    = 128
llm_load_print_meta: n_embd_head_v    = 128
llm_load_print_meta: n_gqa            = 4
llm_load_print_meta: n_embd_k_gqa     = 1024
llm_load_print_meta: n_embd_v_gqa     = 1024
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale    = 0.0e+00
llm_load_print_meta: n_ff             = 14336
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: causal attn      = 1
llm_load_print_meta: pooling type     = 0
llm_load_print_meta: rope type        = 0
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 500000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_ctx_orig_yarn  = 131072
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: ssm_d_conv       = 0
llm_load_print_meta: ssm_d_inner      = 0
llm_load_print_meta: ssm_d_state      = 0
llm_load_print_meta: ssm_dt_rank      = 0
llm_load_print_meta: ssm_dt_b_c_rms   = 0
llm_load_print_meta: model type       = 8B
llm_load_print_meta: model ftype      = Q4_0
llm_load_print_meta: model params     = 8.03 B
llm_load_print_meta: model size       = 4.33 GiB (4.64 BPW) 
llm_load_print_meta: general.name     = Meta Llama 3.1 8B Instruct
llm_load_print_meta: BOS token        = 128000 '<|begin_of_text|>'
llm_load_print_meta: EOS token        = 128009 '<|eot_id|>'
llm_load_print_meta: LF token         = 128 'Ä'
llm_load_print_meta: EOT token        = 128009 '<|eot_id|>'
llm_load_print_meta: max token length = 256

rocBLAS error: Could not initialize Tensile host: No devices found
time=2024-09-07T03:18:40.431Z level=DEBUG source=server.go:431 msg="llama runner terminated" error="signal: aborted (core dumped)"
time=2024-09-07T03:18:40.681Z level=ERROR source=sched.go:456 msg="error loading llama server" error="llama runner process has terminated: error:Could not initialize Tensile host: No devices found"
time=2024-09-07T03:18:40.681Z level=DEBUG source=sched.go:459 msg="triggering expiration for failed load" model=/root/.ollama/models/blobs/sha256-8eeb52dfb3bb9aefdf9d1ef24b3bdbcfbe82238798c4b918278320b6fcef18fe
time=2024-09-07T03:18:40.681Z level=DEBUG source=sched.go:360 msg="runner expired event received" modelPath=/root/.ollama/models/blobs/sha256-8eeb52dfb3bb9aefdf9d1ef24b3bdbcfbe82238798c4b918278320b6fcef18fe
[GIN] 2024/09/07 - 03:18:40 | 500 |  589.158933ms |       127.0.0.1 | POST     "/api/chat"
time=2024-09-07T03:18:40.681Z level=DEBUG source=sched.go:376 msg="got lock to unload" modelPath=/root/.ollama/models/blobs/sha256-8eeb52dfb3bb9aefdf9d1ef24b3bdbcfbe82238798c4b918278320b6fcef18fe
time=2024-09-07T03:18:40.681Z level=DEBUG source=gpu.go:359 msg="updating system memory data" before.total="62.4 GiB" before.free="58.8 GiB" before.free_swap="8.0 GiB" now.total="62.4 GiB" now.free="58.7 GiB" now.free_swap="8.0 GiB"
time=2024-09-07T03:18:40.682Z level=DEBUG source=amd_linux.go:440 msg="updating rocm free memory" gpu=0 name=1002:744c before="24.0 GiB" now="24.0 GiB"
time=2024-09-07T03:18:40.682Z level=DEBUG source=server.go:1047 msg="stopping llama server"
time=2024-09-07T03:18:40.682Z level=DEBUG source=sched.go:381 msg="runner released" modelPath=/root/.ollama/models/blobs/sha256-8eeb52dfb3bb9aefdf9d1ef24b3bdbcfbe82238798c4b918278320b6fcef18fe
time=2024-09-07T03:18:40.933Z level=DEBUG source=gpu.go:359 msg="updating system memory data" before.total="62.4 GiB" before.free="58.7 GiB" before.free_swap="8.0 GiB" now.total="62.4 GiB" now.free="58.7 GiB" now.free_swap="8.0 GiB"
time=2024-09-07T03:18:40.933Z level=DEBUG source=amd_linux.go:440 msg="updating rocm free memory" gpu=0 name=1002:744c before="24.0 GiB" now="24.0 GiB"
time=2024-09-07T03:18:41.182Z level=DEBUG source=gpu.go:359 msg="updating system memory data" before.total="62.4 GiB" before.free="58.7 GiB" before.free_swap="8.0 GiB" now.total="62.4 GiB" now.free="58.7 GiB" now.free_swap="8.0 GiB"
time=2024-09-07T03:18:41.182Z level=DEBUG source=amd_linux.go:440 msg="updating rocm free memory" gpu=0 name=1002:744c before="24.0 GiB" now="24.0 GiB"
time=2024-09-07T03:18:41.432Z level=DEBUG source=gpu.go:359 msg="updating system memory data" before.total="62.4 GiB" before.free="58.7 GiB" before.free_swap="8.0 GiB" now.total="62.4 GiB" now.free="58.7 GiB" now.free_swap="8.0 GiB"
time=2024-09-07T03:18:41.432Z level=DEBUG source=amd_linux.go:440 msg="updating rocm free memory" gpu=0 name=1002:744c before="24.0 GiB" now="24.0 GiB"
time=2024-09-07T03:18:41.683Z level=DEBUG source=gpu.go:359 msg="updating system memory data" before.total="62.4 GiB" before.free="58.7 GiB" before.free_swap="8.0 GiB" now.total="62.4 GiB" now.free="58.7 GiB" now.free_swap="8.0 GiB"
time=2024-09-07T03:18:41.683Z level=DEBUG source=amd_linux.go:440 msg="updating rocm free memory" gpu=0 name=1002:744c before="24.0 GiB" now="24.0 GiB"
time=2024-09-07T03:18:41.932Z level=DEBUG source=gpu.go:359 msg="updating system memory data" before.total="62.4 GiB" before.free="58.7 GiB" before.free_swap="8.0 GiB" now.total="62.4 GiB" now.free="58.7 GiB" now.free_swap="8.0 GiB"
time=2024-09-07T03:18:41.932Z level=DEBUG source=amd_linux.go:440 msg="updating rocm free memory" gpu=0 name=1002:744c before="24.0 GiB" now="24.0 GiB"
time=2024-09-07T03:18:42.182Z level=DEBUG source=gpu.go:359 msg="updating system memory data" before.total="62.4 GiB" before.free="58.7 GiB" before.free_swap="8.0 GiB" now.total="62.4 GiB" now.free="58.7 GiB" now.free_swap="8.0 GiB"
time=2024-09-07T03:18:42.182Z level=DEBUG source=amd_linux.go:440 msg="updating rocm free memory" gpu=0 name=1002:744c before="24.0 GiB" now="24.0 GiB"
time=2024-09-07T03:18:42.433Z level=DEBUG source=gpu.go:359 msg="updating system memory data" before.total="62.4 GiB" before.free="58.7 GiB" before.free_swap="8.0 GiB" now.total="62.4 GiB" now.free="58.7 GiB" now.free_swap="8.0 GiB"
time=2024-09-07T03:18:42.433Z level=DEBUG source=amd_linux.go:440 msg="updating rocm free memory" gpu=0 name=1002:744c before="24.0 GiB" now="24.0 GiB"
time=2024-09-07T03:18:42.682Z level=DEBUG source=gpu.go:359 msg="updating system memory data" before.total="62.4 GiB" before.free="58.7 GiB" before.free_swap="8.0 GiB" now.total="62.4 GiB" now.free="58.7 GiB" now.free_swap="8.0 GiB"
time=2024-09-07T03:18:42.682Z level=DEBUG source=amd_linux.go:440 msg="updating rocm free memory" gpu=0 name=1002:744c before="24.0 GiB" now="24.0 GiB"
time=2024-09-07T03:18:42.932Z level=DEBUG source=gpu.go:359 msg="updating system memory data" before.total="62.4 GiB" before.free="58.7 GiB" before.free_swap="8.0 GiB" now.total="62.4 GiB" now.free="58.7 GiB" now.free_swap="8.0 GiB"
time=2024-09-07T03:18:42.932Z level=DEBUG source=amd_linux.go:440 msg="updating rocm free memory" gpu=0 name=1002:744c before="24.0 GiB" now="24.0 GiB"
time=2024-09-07T03:18:43.183Z level=DEBUG source=gpu.go:359 msg="updating system memory data" before.total="62.4 GiB" before.free="58.7 GiB" before.free_swap="8.0 GiB" now.total="62.4 GiB" now.free="58.7 GiB" now.free_swap="8.0 GiB"
time=2024-09-07T03:18:43.183Z level=DEBUG source=amd_linux.go:440 msg="updating rocm free memory" gpu=0 name=1002:744c before="24.0 GiB" now="24.0 GiB"
time=2024-09-07T03:18:43.432Z level=DEBUG source=gpu.go:359 msg="updating system memory data" before.total="62.4 GiB" before.free="58.7 GiB" before.free_swap="8.0 GiB" now.total="62.4 GiB" now.free="58.7 GiB" now.free_swap="8.0 GiB"
time=2024-09-07T03:18:43.432Z level=DEBUG source=amd_linux.go:440 msg="updating rocm free memory" gpu=0 name=1002:744c before="24.0 GiB" now="24.0 GiB"
time=2024-09-07T03:18:43.682Z level=DEBUG source=gpu.go:359 msg="updating system memory data" before.total="62.4 GiB" before.free="58.7 GiB" before.free_swap="8.0 GiB" now.total="62.4 GiB" now.free="58.7 GiB" now.free_swap="8.0 GiB"
time=2024-09-07T03:18:43.682Z level=DEBUG source=amd_linux.go:440 msg="updating rocm free memory" gpu=0 name=1002:744c before="24.0 GiB" now="24.0 GiB"
time=2024-09-07T03:18:43.932Z level=DEBUG source=gpu.go:359 msg="updating system memory data" before.total="62.4 GiB" before.free="58.7 GiB" before.free_swap="8.0 GiB" now.total="62.4 GiB" now.free="58.7 GiB" now.free_swap="8.0 GiB"
time=2024-09-07T03:18:43.933Z level=DEBUG source=amd_linux.go:440 msg="updating rocm free memory" gpu=0 name=1002:744c before="24.0 GiB" now="24.0 GiB"
time=2024-09-07T03:18:44.182Z level=DEBUG source=gpu.go:359 msg="updating system memory data" before.total="62.4 GiB" before.free="58.7 GiB" before.free_swap="8.0 GiB" now.total="62.4 GiB" now.free="58.7 GiB" now.free_swap="8.0 GiB"
time=2024-09-07T03:18:44.182Z level=DEBUG source=amd_linux.go:440 msg="updating rocm free memory" gpu=0 name=1002:744c before="24.0 GiB" now="24.0 GiB"
time=2024-09-07T03:18:44.432Z level=DEBUG source=gpu.go:359 msg="updating system memory data" before.total="62.4 GiB" before.free="58.7 GiB" before.free_swap="8.0 GiB" now.total="62.4 GiB" now.free="58.7 GiB" now.free_swap="8.0 GiB"
time=2024-09-07T03:18:44.432Z level=DEBUG source=amd_linux.go:440 msg="updating rocm free memory" gpu=0 name=1002:744c before="24.0 GiB" now="24.0 GiB"
time=2024-09-07T03:18:44.683Z level=DEBUG source=gpu.go:359 msg="updating system memory data" before.total="62.4 GiB" before.free="58.7 GiB" before.free_swap="8.0 GiB" now.total="62.4 GiB" now.free="58.7 GiB" now.free_swap="8.0 GiB"
time=2024-09-07T03:18:44.683Z level=DEBUG source=amd_linux.go:440 msg="updating rocm free memory" gpu=0 name=1002:744c before="24.0 GiB" now="24.0 GiB"
time=2024-09-07T03:18:44.932Z level=DEBUG source=gpu.go:359 msg="updating system memory data" before.total="62.4 GiB" before.free="58.7 GiB" before.free_swap="8.0 GiB" now.total="62.4 GiB" now.free="58.7 GiB" now.free_swap="8.0 GiB"
time=2024-09-07T03:18:44.932Z level=DEBUG source=amd_linux.go:440 msg="updating rocm free memory" gpu=0 name=1002:744c before="24.0 GiB" now="24.0 GiB"
time=2024-09-07T03:18:45.182Z level=DEBUG source=gpu.go:359 msg="updating system memory data" before.total="62.4 GiB" before.free="58.7 GiB" before.free_swap="8.0 GiB" now.total="62.4 GiB" now.free="58.7 GiB" now.free_swap="8.0 GiB"
time=2024-09-07T03:18:45.182Z level=DEBUG source=amd_linux.go:440 msg="updating rocm free memory" gpu=0 name=1002:744c before="24.0 GiB" now="24.0 GiB"
time=2024-09-07T03:18:45.432Z level=DEBUG source=gpu.go:359 msg="updating system memory data" before.total="62.4 GiB" before.free="58.7 GiB" before.free_swap="8.0 GiB" now.total="62.4 GiB" now.free="58.7 GiB" now.free_swap="8.0 GiB"
time=2024-09-07T03:18:45.433Z level=DEBUG source=amd_linux.go:440 msg="updating rocm free memory" gpu=0 name=1002:744c before="24.0 GiB" now="24.0 GiB"
time=2024-09-07T03:18:45.682Z level=WARN source=sched.go:647 msg="gpu VRAM usage didn't recover within timeout" seconds=5.000190324 model=/root/.ollama/models/blobs/sha256-8eeb52dfb3bb9aefdf9d1ef24b3bdbcfbe82238798c4b918278320b6fcef18fe
time=2024-09-07T03:18:45.682Z level=DEBUG source=sched.go:385 msg="sending an unloaded event" modelPath=/root/.ollama/models/blobs/sha256-8eeb52dfb3bb9aefdf9d1ef24b3bdbcfbe82238798c4b918278320b6fcef18fe
time=2024-09-07T03:18:45.682Z level=DEBUG source=sched.go:308 msg="ignoring unload event with no pending requests"
time=2024-09-07T03:18:45.682Z level=DEBUG source=gpu.go:359 msg="updating system memory data" before.total="62.4 GiB" before.free="58.7 GiB" before.free_swap="8.0 GiB" now.total="62.4 GiB" now.free="58.7 GiB" now.free_swap="8.0 GiB"
time=2024-09-07T03:18:45.682Z level=DEBUG source=amd_linux.go:440 msg="updating rocm free memory" gpu=0 name=1002:744c before="24.0 GiB" now="24.0 GiB"
time=2024-09-07T03:18:45.932Z level=WARN source=sched.go:647 msg="gpu VRAM usage didn't recover within timeout" seconds=5.250581314 model=/root/.ollama/models/blobs/sha256-8eeb52dfb3bb9aefdf9d1ef24b3bdbcfbe82238798c4b918278320b6fcef18fe
time=2024-09-07T03:18:45.932Z level=DEBUG source=gpu.go:359 msg="updating system memory data" before.total="62.4 GiB" before.free="58.7 GiB" before.free_swap="8.0 GiB" now.total="62.4 GiB" now.free="58.7 GiB" now.free_swap="8.0 GiB"
time=2024-09-07T03:18:45.932Z level=DEBUG source=amd_linux.go:440 msg="updating rocm free memory" gpu=0 name=1002:744c before="24.0 GiB" now="24.0 GiB"
time=2024-09-07T03:18:46.182Z level=WARN source=sched.go:647 msg="gpu VRAM usage didn't recover within timeout" seconds=5.500877044 model=/root/.ollama/models/blobs/sha256-8eeb52dfb3bb9aefdf9d1ef24b3bdbcfbe82238798c4b918278320b6fcef18fe

rocminfo:

ROCk module version 6.8.5 is loaded
=====================    
HSA System Attributes    
=====================    
Runtime Version:         1.14
Runtime Ext Version:     1.6
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE                              
System Endianness:       LITTLE                             
Mwaitx:                  DISABLED
DMAbuf Support:          YES

==========               
HSA Agents               
==========               
*******                  
Agent 1                  
*******                  
  Name:                    AMD Ryzen 9 7950X3D 16-Core Processor
  Uuid:                    CPU-XX                             
  Marketing Name:          AMD Ryzen 9 7950X3D 16-Core Processor
  Vendor Name:             CPU                                
  Feature:                 None specified                     
  Profile:                 FULL_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        0(0x0)                             
  Queue Min Size:          0(0x0)                             
  Queue Max Size:          0(0x0)                             
  Queue Type:              MULTI                              
  Node:                    0                                  
  Device Type:             CPU                                
  Cache Info:              
    L1:                      32768(0x8000) KB                   
  Chip ID:                 0(0x0)                             
  ASIC Revision:           0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   5759                               
  BDFID:                   0                                  
  Internal Node ID:        0                                  
  Compute Unit:            32                                 
  SIMDs per CU:            0                                  
  Shader Engines:          0                                  
  Shader Arrs. per Eng.:   0                                  
  WatchPts on Addr. Ranges:1                                  
  Memory Properties:       
  Features:                None
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: FINE GRAINED        
      Size:                    65426568(0x3e65488) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    65426568(0x3e65488) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 3                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    65426568(0x3e65488) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
  ISA Info:                
*******                  
Agent 2                  
*******                  
  Name:                    gfx1100                            
  Uuid:                    GPU-3f8d76bb6d2029be               
  Marketing Name:          Radeon RX 7900 XTX                 
  Vendor Name:             AMD                                
  Feature:                 KERNEL_DISPATCH                    
  Profile:                 BASE_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        128(0x80)                          
  Queue Min Size:          64(0x40)                           
  Queue Max Size:          131072(0x20000)                    
  Queue Type:              MULTI                              
  Node:                    1                                  
  Device Type:             GPU                                
  Cache Info:              
    L1:                      32(0x20) KB                        
    L2:                      6144(0x1800) KB                    
    L3:                      98304(0x18000) KB                  
  Chip ID:                 29772(0x744c)                      
  ASIC Revision:           0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   2371                               
  BDFID:                   5888                               
  Internal Node ID:        1                                  
  Compute Unit:            96                                 
  SIMDs per CU:            2                                  
  Shader Engines:          6                                  
  Shader Arrs. per Eng.:   2                                  
  WatchPts on Addr. Ranges:4                                  
  Coherent Host Access:    FALSE                              
  Memory Properties:       
  Features:                KERNEL_DISPATCH 
  Fast F16 Operation:      TRUE                               
  Wavefront Size:          32(0x20)                           
  Workgroup Max Size:      1024(0x400)                        
  Workgroup Max Size per Dimension:
    x                        1024(0x400)                        
    y                        1024(0x400)                        
    z                        1024(0x400)                        
  Max Waves Per CU:        32(0x20)                           
  Max Work-item Per CU:    1024(0x400)                        
  Grid Max Size:           4294967295(0xffffffff)             
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)             
    y                        4294967295(0xffffffff)             
    z                        4294967295(0xffffffff)             
  Max fbarriers/Workgrp:   32                                 
  Packet Processor uCode:: 232                                
  SDMA engine uCode::      21                                 
  IOMMU Support::          None                               
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    25149440(0x17fc000) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:2048KB                             
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: EXTENDED FINE GRAINED
      Size:                    25149440(0x17fc000) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:2048KB                             
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 3                   
      Segment:                 GROUP                              
      Size:                    64(0x40) KB                        
      Allocatable:             FALSE                              
      Alloc Granule:           0KB                                
      Alloc Recommended Granule:0KB                                
      Alloc Alignment:         0KB                                
      Accessible by all:       FALSE                              
  ISA Info:                
    ISA 1                    
      Name:                    amdgcn-amd-amdhsa--gfx1100         
      Machine Models:          HSA_MACHINE_MODEL_LARGE            
      Profiles:                HSA_PROFILE_BASE                   
      Default Rounding Mode:   NEAR                               
      Default Rounding Modes:  NEAR                               
      Fast f16:                TRUE                               
      Workgroup Max Size:      1024(0x400)                        
      Workgroup Max Size per Dimension:
        x                        1024(0x400)                        
        y                        1024(0x400)                        
        z                        1024(0x400)                        
      Grid Max Size:           4294967295(0xffffffff)             
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)             
        y                        4294967295(0xffffffff)             
        z                        4294967295(0xffffffff)             
      FBarrier Max Size:       32                                 
*** Done ***  

@rtaic-coder commented on GitHub (Sep 8, 2024):

BTW, ollama runs fine on the GPU if I run it directly on the host. Ubuntu 24.04 with ROCm 6.2.0 installed.
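
The host-works/container-fails split is typically a device-node access problem: ollama's discovery reads /sys/class/kfd and can report the GPU even when the runner process cannot open /dev/kfd or the /dev/dri render node, which is when rocBLAS reports "No devices found". For reference, a minimal run command in the spirit of the Docker instructions for the ROCm image (the volume and container name are illustrative):

```bash
# Pass the AMD KFD compute interface and the DRM render nodes into the
# container; if the runner cannot open these, rocBLAS aborts with
# "Could not initialize Tensile host: No devices found".
docker run -d \
  --device /dev/kfd \
  --device /dev/dri \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama \
  ollama/ollama:rocm
```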

@dhiltgen commented on GitHub (Sep 10, 2024):

@rtaic-coder from your logs, it sounds like you're hitting the same failure mode as #6685. Let's track it under that issue.
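
For anyone triaging a similar setup before following up there, a quick sanity check is whether the device nodes are actually visible and openable from inside the container; a small diagnostic sketch (the container name `ollama` is illustrative):

```bash
# Host side: note the numeric group owning the render node (often "render");
# the user inside the container must be able to open it.
ls -ln /dev/kfd /dev/dri/renderD*

# Container side: the same nodes must exist and be readable/writable,
# and the container user's groups must grant access to them.
docker exec -it ollama ls -l /dev/kfd /dev/dri
docker exec -it ollama id
```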

Reference: github-starred/ollama#66076