[GH-ISSUE #10883] Updated Ollama, now it won't use my GPU #69210

Closed
opened 2026-05-04 17:28:52 -05:00 by GiteaMirror · 6 comments

Originally created by @JonJust on GitHub (May 28, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/10883

What is the issue?

Today, Ollama prompted me to update when I tried to run Qwen3:14b. I ran the one-line install command (curl -fsSL https://ollama.com/install.sh | sh), and during the install it said "Nvidia GPU installed." even though I run an AMD card. Now, whenever I try to run a model, it won't use my gfx card.

Right before updating, I observed my GPU being utilized with radeontop. I watched the VRAM fill up, etc. Now I can see that my graphics card isn't being utilized at all in radeontop, but my cores all go to 100% in htop - even small models run painfully slow. I am certain that this behavior began immediately after running the install script. (And yes, I have ROCm installed.)
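
If it helps anyone reproduce, re-running the installer with the output captured should show the same misdetection (the log path here is arbitrary):

curl -fsSL https://ollama.com/install.sh | sh 2>&1 | tee /tmp/ollama-install.log
grep -i 'gpu' /tmp/ollama-install.log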

I tried to force the backend, which didn't seem to work:

OLLAMA_GPU_BACKEND=rocm ollama run mymodel

I can also verify that my GPU is visible to ROCm, and I have no NVIDIA card in my machine.
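
For completeness: ollama run is just a client, and the server is what loads models, so I assume a backend override has to be set where ollama.service can see it. A sketch of what I mean, assuming systemd and that OLLAMA_LLM_LIBRARY from the server config dump below is the right knob (the value here is a guess):

sudo systemctl edit ollama.service
# add to the override:
#   [Service]
#   Environment="OLLAMA_LLM_LIBRARY=rocm"
sudo systemctl restart ollama.service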

Here's what I get when I run journalctl -u ollama --no-pager --follow --pager-end

... msg="looking for compatible GPUs"
... msg="no nvidia devices detected by library /usr/lib/x86_64-linux-gnu/libcuda.so.570.133.07"
... msg="amdgpu is supported" gpu=0 gpu_type=gfx1102
... msg="inference compute" id=0 library=rocm variant="" compute=gfx1102 driver=6.12 name=1002:7480 total="16.0 GiB" available="15.8 GiB"
... msg="new model will fit in available VRAM in single GPU, loading" model=/usr/share/ollama/.ollama/models/blobs/sha256-a8cc1361f3145dc01f6d77c6c82c9116b9ffe3c97b34716fe20418455876c40e gpu=0 parallel=2 available=16404119552 required="11.2 GiB"
... msg="system memory" total="78.5 GiB" free="75.3 GiB" free_swap="8.0 GiB"
... msg=offload library=rocm layers.requested=-1 layers.model=41 layers.offload=41 layers.split="" memory.available="[15.3 GiB]" memory.gpu_overhead="0 B" memory.required.full="11.2 GiB" memory.required.partial="11.2 GiB" memory.required.kv="1.2 GiB" memory.required.allocations="[11.2 GiB]" memory.weights.total="8.2 GiB" memory.weights.repeating="7.6 GiB" memory.weights.nonrepeating="608.6 MiB" memory.graph.full="1.0 GiB" memory.graph.partial="1.0 GiB"
.... llama_model_loader: loaded meta data with 27 key-value pairs and 443 tensors from /usr/share/ollama/.ollama/models/blobs/sha256-a8cc1361f3145dc01f6d77c6c82c9116b9ffe3c97b34716fe20418455876c40e (version GGUF V3 (latest))

So, from what I can tell, Ollama sees my GPU, identifies that there is enough VRAM to hold the model, identifies that ROCm is installed, yet makes the conscious choice to try and run the model on my CPU instead. Unless there's something obvious I'm missing, I am pretty sure this is a bug?
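
The mismatch is easiest to see by filtering the scheduler and runner lines together; this uses only strings that actually appear in the log below:

journalctl -u ollama --no-pager | grep -E 'inference compute|n-gpu-layers|load_backend'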

Neofetch:

OS: Ubuntu 24.04.2 LTS x86_64
Host: B460MDS3HV2 -CF
Kernel: 6.11.0-26-generic
Uptime: 5 mins
Packages: 3559 (dpkg), 5 (flatpak), 16 (snap)
Shell: bash 5.2.21
Resolution: 2560x1440
DE: GNOME 46.0
WM: Mutter
WM Theme: Adwaita
Theme: Yaru-blue-dark [GTK2/3]
Icons: Yaru-blue [GTK2/3]
Terminal: gnome-terminal
CPU: Intel i7-10700K (16) @ 5.100GHz
GPU: AMD ATI Radeon RX 7600/7600 XT/7600M XT/7600S/7700S / PRO W7600
Memory: 4550MiB / 80346MiB

ollama --version:
ollama version is 0.7.1

Relevant log output

May 27 21:06:38 tank systemd[1]: Started ollama.service - Ollama Service.
May 27 21:06:38 tank ollama[3099]: time=2025-05-27T21:06:38.668-05:00 level=INFO source=routes.go:1205 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:INFO OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/usr/share/ollama/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
May 27 21:06:38 tank ollama[3099]: time=2025-05-27T21:06:38.677-05:00 level=INFO source=images.go:463 msg="total blobs: 37"
May 27 21:06:38 tank ollama[3099]: time=2025-05-27T21:06:38.677-05:00 level=INFO source=images.go:470 msg="total unused blobs removed: 0"
May 27 21:06:38 tank ollama[3099]: time=2025-05-27T21:06:38.677-05:00 level=INFO source=routes.go:1258 msg="Listening on 127.0.0.1:11434 (version 0.7.1)"
May 27 21:06:38 tank ollama[3099]: time=2025-05-27T21:06:38.678-05:00 level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
May 27 21:06:38 tank ollama[3099]: time=2025-05-27T21:06:38.701-05:00 level=INFO source=gpu.go:602 msg="no nvidia devices detected by library /usr/lib/x86_64-linux-gnu/libcuda.so.570.133.07"
May 27 21:06:38 tank ollama[3099]: time=2025-05-27T21:06:38.727-05:00 level=INFO source=amd_linux.go:386 msg="amdgpu is supported" gpu=0 gpu_type=gfx1102
May 27 21:06:38 tank ollama[3099]: time=2025-05-27T21:06:38.730-05:00 level=INFO source=types.go:130 msg="inference compute" id=0 library=rocm variant="" compute=gfx1102 driver=6.12 name=1002:7480 total="16.0 GiB" available="15.8 GiB"
May 27 21:07:12 tank ollama[3099]: [GIN] 2025/05/27 - 21:07:12 | 204 |     554.384µs |       127.0.0.1 | OPTIONS  "/api/tags"
May 27 21:07:12 tank ollama[3099]: [GIN] 2025/05/27 - 21:07:12 | 200 |    2.145944ms |       127.0.0.1 | GET      "/api/tags"
May 27 21:07:26 tank ollama[3099]: [GIN] 2025/05/27 - 21:07:26 | 204 |      15.502µs |       127.0.0.1 | OPTIONS  "/api/chat"
May 27 21:07:26 tank ollama[3099]: time=2025-05-27T21:07:26.392-05:00 level=INFO source=sched.go:788 msg="new model will fit in available VRAM in single GPU, loading" model=/usr/share/ollama/.ollama/models/blobs/sha256-a8cc1361f3145dc01f6d77c6c82c9116b9ffe3c97b34716fe20418455876c40e gpu=0 parallel=2 available=16404119552 required="11.2 GiB"
May 27 21:07:26 tank ollama[3099]: time=2025-05-27T21:07:26.392-05:00 level=INFO source=server.go:135 msg="system memory" total="78.5 GiB" free="75.3 GiB" free_swap="8.0 GiB"
May 27 21:07:26 tank ollama[3099]: time=2025-05-27T21:07:26.393-05:00 level=INFO source=server.go:168 msg=offload library=rocm layers.requested=-1 layers.model=41 layers.offload=41 layers.split="" memory.available="[15.3 GiB]" memory.gpu_overhead="0 B" memory.required.full="11.2 GiB" memory.required.partial="11.2 GiB" memory.required.kv="1.2 GiB" memory.required.allocations="[11.2 GiB]" memory.weights.total="8.2 GiB" memory.weights.repeating="7.6 GiB" memory.weights.nonrepeating="608.6 MiB" memory.graph.full="1.0 GiB" memory.graph.partial="1.0 GiB"
May 27 21:07:26 tank ollama[3099]: llama_model_loader: loaded meta data with 27 key-value pairs and 443 tensors from /usr/share/ollama/.ollama/models/blobs/sha256-a8cc1361f3145dc01f6d77c6c82c9116b9ffe3c97b34716fe20418455876c40e (version GGUF V3 (latest))
May 27 21:07:26 tank ollama[3099]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv   0:                       general.architecture str              = qwen3
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv   1:                               general.type str              = model
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv   2:                               general.name str              = Qwen3 14B
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv   3:                           general.basename str              = Qwen3
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv   4:                         general.size_label str              = 14B
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv   5:                          qwen3.block_count u32              = 40
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv   6:                       qwen3.context_length u32              = 40960
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv   7:                     qwen3.embedding_length u32              = 5120
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv   8:                  qwen3.feed_forward_length u32              = 17408
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv   9:                 qwen3.attention.head_count u32              = 40
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv  10:              qwen3.attention.head_count_kv u32              = 8
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv  11:                       qwen3.rope.freq_base f32              = 1000000.000000
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv  12:     qwen3.attention.layer_norm_rms_epsilon f32              = 0.000001
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv  13:                 qwen3.attention.key_length u32              = 128
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv  14:               qwen3.attention.value_length u32              = 128
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv  15:                       tokenizer.ggml.model str              = gpt2
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv  16:                         tokenizer.ggml.pre str              = qwen2
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv  17:                      tokenizer.ggml.tokens arr[str,151936]  = ["!", "\"", "#", "$", "%", "&", "'", ...
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv  18:                  tokenizer.ggml.token_type arr[i32,151936]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv  19:                      tokenizer.ggml.merges arr[str,151387]  = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv  20:                tokenizer.ggml.eos_token_id u32              = 151645
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv  21:            tokenizer.ggml.padding_token_id u32              = 151643
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv  22:                tokenizer.ggml.bos_token_id u32              = 151643
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv  23:               tokenizer.ggml.add_bos_token bool             = false
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv  24:                    tokenizer.chat_template str              = {%- if tools %}\n    {{- '<|im_start|>...
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv  25:               general.quantization_version u32              = 2
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv  26:                          general.file_type u32              = 15
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - type  f32:  161 tensors
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - type  f16:   40 tensors
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - type q4_K:  221 tensors
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - type q6_K:   21 tensors
May 27 21:07:26 tank ollama[3099]: print_info: file format = GGUF V3 (latest)
May 27 21:07:26 tank ollama[3099]: print_info: file type   = Q4_K - Medium
May 27 21:07:26 tank ollama[3099]: print_info: file size   = 8.63 GiB (5.02 BPW)
May 27 21:07:26 tank ollama[3099]: load: special tokens cache size = 26
May 27 21:07:26 tank ollama[3099]: load: token to piece cache size = 0.9311 MB
May 27 21:07:26 tank ollama[3099]: print_info: arch             = qwen3
May 27 21:07:26 tank ollama[3099]: print_info: vocab_only       = 1
May 27 21:07:26 tank ollama[3099]: print_info: model type       = ?B
May 27 21:07:26 tank ollama[3099]: print_info: model params     = 14.77 B
May 27 21:07:26 tank ollama[3099]: print_info: general.name     = Qwen3 14B
May 27 21:07:26 tank ollama[3099]: print_info: vocab type       = BPE
May 27 21:07:26 tank ollama[3099]: print_info: n_vocab          = 151936
May 27 21:07:26 tank ollama[3099]: print_info: n_merges         = 151387
May 27 21:07:26 tank ollama[3099]: print_info: BOS token        = 151643 '<|endoftext|>'
May 27 21:07:26 tank ollama[3099]: print_info: EOS token        = 151645 '<|im_end|>'
May 27 21:07:26 tank ollama[3099]: print_info: EOT token        = 151645 '<|im_end|>'
May 27 21:07:26 tank ollama[3099]: print_info: PAD token        = 151643 '<|endoftext|>'
May 27 21:07:26 tank ollama[3099]: print_info: LF token         = 198 'Ċ'
May 27 21:07:26 tank ollama[3099]: print_info: FIM PRE token    = 151659 '<|fim_prefix|>'
May 27 21:07:26 tank ollama[3099]: print_info: FIM SUF token    = 151661 '<|fim_suffix|>'
May 27 21:07:26 tank ollama[3099]: print_info: FIM MID token    = 151660 '<|fim_middle|>'
May 27 21:07:26 tank ollama[3099]: print_info: FIM PAD token    = 151662 '<|fim_pad|>'
May 27 21:07:26 tank ollama[3099]: print_info: FIM REP token    = 151663 '<|repo_name|>'
May 27 21:07:26 tank ollama[3099]: print_info: FIM SEP token    = 151664 '<|file_sep|>'
May 27 21:07:26 tank ollama[3099]: print_info: EOG token        = 151643 '<|endoftext|>'
May 27 21:07:26 tank ollama[3099]: print_info: EOG token        = 151645 '<|im_end|>'
May 27 21:07:26 tank ollama[3099]: print_info: EOG token        = 151662 '<|fim_pad|>'
May 27 21:07:26 tank ollama[3099]: print_info: EOG token        = 151663 '<|repo_name|>'
May 27 21:07:26 tank ollama[3099]: print_info: EOG token        = 151664 '<|file_sep|>'
May 27 21:07:26 tank ollama[3099]: print_info: max token length = 256
May 27 21:07:26 tank ollama[3099]: llama_model_load: vocab only - skipping tensors
May 27 21:07:26 tank ollama[3099]: time=2025-05-27T21:07:26.584-05:00 level=INFO source=server.go:431 msg="starting llama server" cmd="/usr/local/bin/ollama runner --model /usr/share/ollama/.ollama/models/blobs/sha256-a8cc1361f3145dc01f6d77c6c82c9116b9ffe3c97b34716fe20418455876c40e --ctx-size 8192 --batch-size 512 --n-gpu-layers 41 --threads 8 --parallel 2 --port 43613"
May 27 21:07:26 tank ollama[3099]: time=2025-05-27T21:07:26.584-05:00 level=INFO source=sched.go:483 msg="loaded runners" count=1
May 27 21:07:26 tank ollama[3099]: time=2025-05-27T21:07:26.584-05:00 level=INFO source=server.go:591 msg="waiting for llama runner to start responding"
May 27 21:07:26 tank ollama[3099]: time=2025-05-27T21:07:26.585-05:00 level=INFO source=server.go:625 msg="waiting for server to become available" status="llm server not responding"
May 27 21:07:26 tank ollama[3099]: time=2025-05-27T21:07:26.591-05:00 level=INFO source=runner.go:815 msg="starting go runner"
May 27 21:07:26 tank ollama[3099]: load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-haswell.so
May 27 21:07:26 tank ollama[3099]: time=2025-05-27T21:07:26.601-05:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
May 27 21:07:26 tank ollama[3099]: time=2025-05-27T21:07:26.602-05:00 level=INFO source=runner.go:874 msg="Server listening on 127.0.0.1:43613"
May 27 21:07:26 tank ollama[3099]: llama_model_loader: loaded meta data with 27 key-value pairs and 443 tensors from /usr/share/ollama/.ollama/models/blobs/sha256-a8cc1361f3145dc01f6d77c6c82c9116b9ffe3c97b34716fe20418455876c40e (version GGUF V3 (latest))
May 27 21:07:26 tank ollama[3099]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv   0:                       general.architecture str              = qwen3
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv   1:                               general.type str              = model
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv   2:                               general.name str              = Qwen3 14B
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv   3:                           general.basename str              = Qwen3
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv   4:                         general.size_label str              = 14B
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv   5:                          qwen3.block_count u32              = 40
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv   6:                       qwen3.context_length u32              = 40960
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv   7:                     qwen3.embedding_length u32              = 5120
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv   8:                  qwen3.feed_forward_length u32              = 17408
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv   9:                 qwen3.attention.head_count u32              = 40
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv  10:              qwen3.attention.head_count_kv u32              = 8
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv  11:                       qwen3.rope.freq_base f32              = 1000000.000000
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv  12:     qwen3.attention.layer_norm_rms_epsilon f32              = 0.000001
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv  13:                 qwen3.attention.key_length u32              = 128
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv  14:               qwen3.attention.value_length u32              = 128
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv  15:                       tokenizer.ggml.model str              = gpt2
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv  16:                         tokenizer.ggml.pre str              = qwen2
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv  17:                      tokenizer.ggml.tokens arr[str,151936]  = ["!", "\"", "#", "$", "%", "&", "'", ...
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv  18:                  tokenizer.ggml.token_type arr[i32,151936]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv  19:                      tokenizer.ggml.merges arr[str,151387]  = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv  20:                tokenizer.ggml.eos_token_id u32              = 151645
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv  21:            tokenizer.ggml.padding_token_id u32              = 151643
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv  22:                tokenizer.ggml.bos_token_id u32              = 151643
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv  23:               tokenizer.ggml.add_bos_token bool             = false
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv  24:                    tokenizer.chat_template str              = {%- if tools %}\n    {{- '<|im_start|>...
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv  25:               general.quantization_version u32              = 2
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv  26:                          general.file_type u32              = 15
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - type  f32:  161 tensors
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - type  f16:   40 tensors
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - type q4_K:  221 tensors
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - type q6_K:   21 tensors
May 27 21:07:26 tank ollama[3099]: print_info: file format = GGUF V3 (latest)
May 27 21:07:26 tank ollama[3099]: print_info: file type   = Q4_K - Medium
May 27 21:07:26 tank ollama[3099]: print_info: file size   = 8.63 GiB (5.02 BPW)
May 27 21:07:26 tank ollama[3099]: load: special tokens cache size = 26
May 27 21:07:26 tank ollama[3099]: load: token to piece cache size = 0.9311 MB
May 27 21:07:26 tank ollama[3099]: print_info: arch             = qwen3
May 27 21:07:26 tank ollama[3099]: print_info: vocab_only       = 0
May 27 21:07:26 tank ollama[3099]: print_info: n_ctx_train      = 40960
May 27 21:07:26 tank ollama[3099]: print_info: n_embd           = 5120
May 27 21:07:26 tank ollama[3099]: print_info: n_layer          = 40
May 27 21:07:26 tank ollama[3099]: print_info: n_head           = 40
May 27 21:07:26 tank ollama[3099]: print_info: n_head_kv        = 8
May 27 21:07:26 tank ollama[3099]: print_info: n_rot            = 128
May 27 21:07:26 tank ollama[3099]: print_info: n_swa            = 0
May 27 21:07:26 tank ollama[3099]: print_info: n_swa_pattern    = 1
May 27 21:07:26 tank ollama[3099]: print_info: n_embd_head_k    = 128
May 27 21:07:26 tank ollama[3099]: print_info: n_embd_head_v    = 128
May 27 21:07:26 tank ollama[3099]: print_info: n_gqa            = 5
May 27 21:07:26 tank ollama[3099]: print_info: n_embd_k_gqa     = 1024
May 27 21:07:26 tank ollama[3099]: print_info: n_embd_v_gqa     = 1024
May 27 21:07:26 tank ollama[3099]: print_info: f_norm_eps       = 0.0e+00
May 27 21:07:26 tank ollama[3099]: print_info: f_norm_rms_eps   = 1.0e-06
May 27 21:07:26 tank ollama[3099]: print_info: f_clamp_kqv      = 0.0e+00
May 27 21:07:26 tank ollama[3099]: print_info: f_max_alibi_bias = 0.0e+00
May 27 21:07:26 tank ollama[3099]: print_info: f_logit_scale    = 0.0e+00
May 27 21:07:26 tank ollama[3099]: print_info: f_attn_scale     = 0.0e+00
May 27 21:07:26 tank ollama[3099]: print_info: n_ff             = 17408
May 27 21:07:26 tank ollama[3099]: print_info: n_expert         = 0
May 27 21:07:26 tank ollama[3099]: print_info: n_expert_used    = 0
May 27 21:07:26 tank ollama[3099]: print_info: causal attn      = 1
May 27 21:07:26 tank ollama[3099]: print_info: pooling type     = 0
May 27 21:07:26 tank ollama[3099]: print_info: rope type        = 2
May 27 21:07:26 tank ollama[3099]: print_info: rope scaling     = linear
May 27 21:07:26 tank ollama[3099]: print_info: freq_base_train  = 1000000.0
May 27 21:07:26 tank ollama[3099]: print_info: freq_scale_train = 1
May 27 21:07:26 tank ollama[3099]: print_info: n_ctx_orig_yarn  = 40960
May 27 21:07:26 tank ollama[3099]: print_info: rope_finetuned   = unknown
May 27 21:07:26 tank ollama[3099]: print_info: ssm_d_conv       = 0
May 27 21:07:26 tank ollama[3099]: print_info: ssm_d_inner      = 0
May 27 21:07:26 tank ollama[3099]: print_info: ssm_d_state      = 0
May 27 21:07:26 tank ollama[3099]: print_info: ssm_dt_rank      = 0
May 27 21:07:26 tank ollama[3099]: print_info: ssm_dt_b_c_rms   = 0
May 27 21:07:26 tank ollama[3099]: print_info: model type       = 14B
May 27 21:07:26 tank ollama[3099]: print_info: model params     = 14.77 B
May 27 21:07:26 tank ollama[3099]: print_info: general.name     = Qwen3 14B
May 27 21:07:26 tank ollama[3099]: print_info: vocab type       = BPE
May 27 21:07:26 tank ollama[3099]: print_info: n_vocab          = 151936
May 27 21:07:26 tank ollama[3099]: print_info: n_merges         = 151387
May 27 21:07:26 tank ollama[3099]: print_info: BOS token        = 151643 '<|endoftext|>'
May 27 21:07:26 tank ollama[3099]: print_info: EOS token        = 151645 '<|im_end|>'
May 27 21:07:26 tank ollama[3099]: print_info: EOT token        = 151645 '<|im_end|>'
May 27 21:07:26 tank ollama[3099]: print_info: PAD token        = 151643 '<|endoftext|>'
May 27 21:07:26 tank ollama[3099]: print_info: LF token         = 198 'Ċ'
May 27 21:07:26 tank ollama[3099]: print_info: FIM PRE token    = 151659 '<|fim_prefix|>'
May 27 21:07:26 tank ollama[3099]: print_info: FIM SUF token    = 151661 '<|fim_suffix|>'
May 27 21:07:26 tank ollama[3099]: print_info: FIM MID token    = 151660 '<|fim_middle|>'
May 27 21:07:26 tank ollama[3099]: print_info: FIM PAD token    = 151662 '<|fim_pad|>'
May 27 21:07:26 tank ollama[3099]: print_info: FIM REP token    = 151663 '<|repo_name|>'
May 27 21:07:26 tank ollama[3099]: print_info: FIM SEP token    = 151664 '<|file_sep|>'
May 27 21:07:26 tank ollama[3099]: print_info: EOG token        = 151643 '<|endoftext|>'
May 27 21:07:26 tank ollama[3099]: print_info: EOG token        = 151645 '<|im_end|>'
May 27 21:07:26 tank ollama[3099]: print_info: EOG token        = 151662 '<|fim_pad|>'
May 27 21:07:26 tank ollama[3099]: print_info: EOG token        = 151663 '<|repo_name|>'
May 27 21:07:26 tank ollama[3099]: print_info: EOG token        = 151664 '<|file_sep|>'
May 27 21:07:26 tank ollama[3099]: print_info: max token length = 256
May 27 21:07:26 tank ollama[3099]: load_tensors: loading model tensors, this can take a while... (mmap = true)
May 27 21:07:26 tank ollama[3099]: time=2025-05-27T21:07:26.836-05:00 level=INFO source=server.go:625 msg="waiting for server to become available" status="llm server loading model"
May 27 21:07:30 tank ollama[3099]: load_tensors:   CPU_Mapped model buffer size =  8840.78 MiB
May 27 21:07:30 tank ollama[3099]: llama_context: constructing llama_context
May 27 21:07:30 tank ollama[3099]: llama_context: n_seq_max     = 2
May 27 21:07:30 tank ollama[3099]: llama_context: n_ctx         = 8192
May 27 21:07:30 tank ollama[3099]: llama_context: n_ctx_per_seq = 4096
May 27 21:07:30 tank ollama[3099]: llama_context: n_batch       = 1024
May 27 21:07:30 tank ollama[3099]: llama_context: n_ubatch      = 512
May 27 21:07:30 tank ollama[3099]: llama_context: causal_attn   = 1
May 27 21:07:30 tank ollama[3099]: llama_context: flash_attn    = 0
May 27 21:07:30 tank ollama[3099]: llama_context: freq_base     = 1000000.0
May 27 21:07:30 tank ollama[3099]: llama_context: freq_scale    = 1
May 27 21:07:30 tank ollama[3099]: llama_context: n_ctx_per_seq (4096) < n_ctx_train (40960) -- the full capacity of the model will not be utilized
May 27 21:07:30 tank ollama[3099]: llama_context:        CPU  output buffer size =     1.20 MiB
May 27 21:07:30 tank ollama[3099]: llama_kv_cache_unified: kv_size = 8192, type_k = 'f16', type_v = 'f16', n_layer = 40, can_shift = 1, padding = 32
May 27 21:07:30 tank ollama[3099]: llama_kv_cache_unified:        CPU KV buffer size =  1280.00 MiB
May 27 21:07:30 tank ollama[3099]: llama_kv_cache_unified: KV self size  = 1280.00 MiB, K (f16):  640.00 MiB, V (f16):  640.00 MiB
May 27 21:07:30 tank ollama[3099]: llama_context:        CPU compute buffer size =   696.01 MiB
May 27 21:07:30 tank ollama[3099]: llama_context: graph nodes  = 1526
May 27 21:07:30 tank ollama[3099]: llama_context: graph splits = 1
May 27 21:07:30 tank ollama[3099]: time=2025-05-27T21:07:30.597-05:00 level=INFO source=server.go:630 msg="llama runner started in 4.01 seconds"
May 27 21:08:45 tank ollama[3099]: [GIN] 2025/05/27 - 21:08:45 | 200 |         1m19s |       127.0.0.1 | POST     "/api/chat"
May 27 21:13:50 tank ollama[3099]: time=2025-05-27T21:13:50.796-05:00 level=WARN source=sched.go:687 msg="gpu VRAM usage didn't recover within timeout" seconds=5.001224659 runner.size="11.2 GiB" runner.vram="11.2 GiB" runner.parallel=2 runner.pid=5382 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-a8cc1361f3145dc01f6d77c6c82c9116b9ffe3c97b34716fe20418455876c40e
May 27 21:13:51 tank ollama[3099]: time=2025-05-27T21:13:51.045-05:00 level=WARN source=sched.go:687 msg="gpu VRAM usage didn't recover within timeout" seconds=5.250790349 runner.size="11.2 GiB" runner.vram="11.2 GiB" runner.parallel=2 runner.pid=5382 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-a8cc1361f3145dc01f6d77c6c82c9116b9ffe3c97b34716fe20418455876c40e
May 27 21:13:51 tank ollama[3099]: time=2025-05-27T21:13:51.295-05:00 level=WARN source=sched.go:687 msg="gpu VRAM usage didn't recover within timeout" seconds=5.500354884 runner.size="11.2 GiB" runner.vram="11.2 GiB" runner.parallel=2 runner.pid=5382 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-a8cc1361f3145dc01f6d77c6c82c9116b9ffe3c97b34716fe20418455876c40e

OS

Linux

GPU

AMD

CPU

AMD, Intel

Ollama version

0.7.1

GiteaMirror added the bug label 2026-05-04 17:28:52 -05:00

@rick-github commented on GitHub (May 28, 2025):

May 27 21:07:26 tank ollama[3099]: load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-haswell.so
May 27 21:07:26 tank ollama[3099]: time=2025-05-27T21:07:26.601-05:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)

No GPU backend found. It seems like the ROCm backends aren't installed. What do the following return:

lspci -d '1002:' | grep 'AMD'
sudo lshw -c display -numeric -disable network | grep 'vendor: .* \[1002\]'
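
Also worth checking whether any ROCm backend libraries are present next to the CPU one that did load (directory taken from the load_backend line above; the filename pattern is a guess, since exact names vary by version):

ls /usr/local/lib/ollama/ | grep -iE 'rocm|hip'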

@JonJust commented on GitHub (May 28, 2025):

lspci -d '1002:' | grep 'AMD'
sudo lshw -c display -numeric -disable network | grep 'vendor: .* \[1002\]'
04:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Upstream Port of PCI Express Switch (rev 12)
05:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Downstream Port of PCI Express Switch (rev 12)
06:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 33 [Radeon RX 7600/7600 XT/7600M XT/7600S/7700S / PRO W7600] (rev c0)
06:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Navi 31 HDMI/DP Audio
vendor: Advanced Micro Devices, Inc. [AMD/ATI] [1002]

I went through the steps on the AMD website to install/set up ROCm a few days ago, and Ollama was using the GPU just fine. After updating, it stopped working. I tried reinstalling, but it didn't seem to fix anything.

https://rocm.docs.amd.com/projects/install-on-linux/en/latest/install/quick-start.html
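
If it's useful, this is how I'd check that the ROCm userspace packages survived the update (the grep pattern is just a guess at the package names):

dpkg -l | grep -i rocm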

<!-- gh-comment-id:2916139510 --> @JonJust commented on GitHub (May 28, 2025): lspci -d '1002:' | grep 'AMD' sudo lshw -c display -numeric -disable network | grep 'vendor: .* \[1002\]' 04:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Upstream Port of PCI Express Switch (rev 12) 05:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Downstream Port of PCI Express Switch (rev 12) 06:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 33 [Radeon RX 7600/7600 XT/7600M XT/7600S/7700S / PRO W7600] (rev c0) 06:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Navi 31 HDMI/DP Audio vendor: Advanced Micro Devices, Inc. [AMD/ATI] [1002] I went through the steps on the AMD website to install/setup rocm a few days ago, and ollama was using the GPU just fine. After updating, it stopped working. I tried reinstalling, but it didn't seem to fix anything. https://rocm.docs.amd.com/projects/install-on-linux/en/latest/install/quick-start.html

@rick-github commented on GitHub (May 28, 2025):

The formatting makes it hard to read, but it looks like the lspci returned nothing.


@JonJust commented on GitHub (May 28, 2025):

I can see how the formatting may be confusing; I copy/pasted from the terminal. lspci returns this block:

lspci -d '1002:' | grep 'AMD'
04:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Upstream Port of PCI Express Switch (rev 12)
05:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Downstream Port of PCI Express Switch (rev 12)
06:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 33 [Radeon RX 7600/7600 XT/7600M XT/7600S/7700S / PRO W7600] (rev c0)
06:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Navi 31 HDMI/DP Audio

sudo lshw -c display -numeric -disable network | grep 'vendor: .* [1002]' returns:

vendor: Advanced Micro Devices, Inc. [AMD/ATI] [1002]

If I run rocminfo, I can see the card show up...


*******
Agent 2
*******
Name: gfx1102
Uuid: GPU-XX
Marketing Name: AMD Radeon™ RX 7600 XT
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 64(0x40)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 1
Device Type: GPU
Cache Info:
L1: 32(0x20) KB
L2: 2048(0x800) KB
Chip ID: 29824(0x7480)
ASIC Revision: 0(0x0)
Cacheline Size: 128(0x80)
Max Clock Freq. (MHz): 2539
BDFID: 1536
Internal Node ID: 1
Compute Unit: 32
SIMDs per CU: 2
Shader Engines: 2
Shader Arrs. per Eng.: 2
WatchPts on Addr. Ranges:4
Coherent Host Access: FALSE
Memory Properties:
Features: KERNEL_DISPATCH
Fast F16 Operation: TRUE
Wavefront Size: 32(0x20)
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Max Waves Per CU: 32(0x20)
Max Work-item Per CU: 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
Max fbarriers/Workgrp: 32
Packet Processor uCode:: 542
SDMA engine uCode:: 21
IOMMU Support:: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 16760832(0xffc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:2048KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 2
Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED
Size: 16760832(0xffc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:2048KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 3
Segment: GROUP
Size: 64(0x40) KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Recommended Granule:0KB
Alloc Alignment: 0KB
Accessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx1102
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32
ISA 2
Name: amdgcn-amd-amdhsa--gfx11-generic
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32
*** Done ***

The card also shows up if I run rocm-smi:

/opt/rocm/bin/rocm-smi

======================================= ROCm System Management Interface =======================================
================================================= Concise Info =================================================
Device  Node  IDs              Temp    Power  Partitions          SCLK  MCLK    Fan  Perf  PwrCap  VRAM%  GPU%  
              (DID,     GUID)  (Edge)  (Avg)  (Mem, Compute, ID)                                                
================================================================================================================
0       1     0x7480,   41560  59.0°C  57.0W  N/A, N/A, 0         0Mhz  456Mhz  0%   auto  165.0W  6%     0%    
================================================================================================================
============================================= End of ROCm SMI Log ==============================================

@JonJust commented on GitHub (May 28, 2025):

Aha! I fixed it.

Previously, I was using an Nvidia card on my machine, so I still had the Nvidia tools installed. When I switched over to my AMD card, I installed the ROCm drivers, and Ollama somehow picked up that I switched over to AMD. This was fine until I went to run the install.sh script again.

curl -fsSL https://ollama.com/install.sh | sh

Unfortunately, I didn't save the output, but there was a line in the installer output that said something about "NVIDIA GPU installed." I ran the script a few times while faffing about with my settings to see if reinstalling could fix it, but I always got the NVIDIA message.

I wound up completely uninstalling all NVIDIA-related stuff from my machine: I did an apt purge and nuked nvidia-smi, CUDA, etc. After doing this, running the install command yields this:

>>> Cleaning up old version at /usr/local/lib/ollama
>>> Installing ollama to /usr/local
>>> Downloading Linux amd64 bundle
######################################################################## 100.0%
>>> Adding ollama user to render group...
>>> Adding ollama user to video group...
>>> Adding current user to ollama group...
>>> Creating ollama systemd service...
>>> Enabling and starting ollama service...
>>> Downloading Linux ROCm amd64 bundle
######################################################################## 100.0%
>>> The Ollama API is now available at 127.0.0.1:11434.
>>> Install complete. Run "ollama" from the command line.
>>> AMD GPU ready.

Lo and behold, I load up a model, and I see my GPU being used, tokens being output much faster, etc.

Looking at the installer script, I think I see the issue:

check_gpu() {
    # Look for devices based on vendor ID for NVIDIA and AMD
    case $1 in
        lspci)
            case $2 in
                nvidia) available lspci && lspci -d '10de:' | grep -q 'NVIDIA' || return 1 ;;
                amdgpu) available lspci && lspci -d '1002:' | grep -q 'AMD' || return 1 ;;
            esac ;;
        lshw)
            case $2 in
                nvidia) available lshw && $SUDO lshw -c display -numeric -disable network | grep -q 'vendor: .* \[10DE\]' || return 1 ;;
                amdgpu) available lshw && $SUDO lshw -c display -numeric -disable network | grep -q 'vendor: .* \[1002\]' || return 1 ;;
            esac ;;
        nvidia-smi) available nvidia-smi || return 1 ;;
    esac
}

if check_gpu nvidia-smi; then
    status "NVIDIA GPU installed."
    exit 0
fi

The first time check_gpu() is called, it only checks whether the nvidia-smi binary is installed. If it is, the script assumes the user is running an NVIDIA card and exits before ever probing for an AMD one. This is a problem for anyone who ran an NVIDIA card at some point but has since switched to AMD, and I'd expect that case to become more common if AMD keeps putting out higher-memory cards than NVIDIA.
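
To make the failure mode concrete, here's a rough sketch of what such a machine looks like (illustrative, not output captured from my box):

command -v nvidia-smi            # succeeds: the leftover binary is still on $PATH
lspci -d '10de:' | grep NVIDIA   # prints nothing: no NVIDIA device on the bus
lspci -d '1002:' | grep AMD      # prints the AMD card

So check_gpu nvidia-smi reports success even though the lspci-based checks would have told the truth.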

Potential fix:

if check_gpu nvidia-smi && check_gpu lspci nvidia; then
    status "NVIDIA GPU installed."
    exit 0
fi
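
An alternative (equally a sketch, untested) would be to make the nvidia-smi branch itself require that a device actually enumerates, e.g. with nvidia-smi -L:

nvidia-smi) available nvidia-smi && nvidia-smi -L 2>/dev/null | grep -q 'GPU' || return 1 ;;

Either way, the presence of the tool alone shouldn't be treated as the presence of the hardware.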

@afrolino02 commented on GitHub (Jun 7, 2025):

You saved my life, dude. I kept wondering "why NVIDIA, if I have an AMD card?" It turned out my Arch setup still had something NVIDIA-related installed. Thanks!!!

Reference: github-starred/ollama#69210