[GH-ISSUE #10883] Updated Ollama, now it won't use my GPU #69210

Closed
opened 2026-05-04 17:28:52 -05:00 by GiteaMirror · 6 comments

Originally created by @JonJust on GitHub (May 28, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/10883

What is the issue?

Today, Ollama prompted me to update when I tried to run Qwen3:14b. I ran the one-line install command (curl -fsSL https://ollama.com/install.sh | sh), and during the install it said "Nvidia GPU installed." even though I run an AMD card. Now, whenever I try to run a model, it won't use my gfx card.

Right before updating, I observed my GPU being utilized with radeontop. I watched the VRAM fill up, etc. Now I can see that my graphics card isn't being utilized at all in radeontop, but my cores all go to 100% in htop - even small models run painfully slow. I am certain that this behavior began immediately after running the install script. (And yes, I have ROCm installed.)
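
If it helps anyone reproduce, re-running the installer with the output captured should show the same misdetection (the log path here is arbitrary):

curl -fsSL https://ollama.com/install.sh | sh 2>&1 | tee /tmp/ollama-install.log
grep -i 'gpu' /tmp/ollama-install.log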

I tried to force the backend, which didn't seem to work:

OLLAMA_GPU_BACKEND=rocm ollama run mymodel

I can also verify that my GPU is visible to ROCm, and I have no NVIDIA card in my machine.
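
For completeness: ollama run is just a client, and the server is what loads models, so I assume a backend override has to be set where ollama.service can see it. A sketch of what I mean, assuming systemd and that OLLAMA_LLM_LIBRARY from the server config dump below is the right knob (the value here is a guess):

sudo systemctl edit ollama.service
# add to the override:
#   [Service]
#   Environment="OLLAMA_LLM_LIBRARY=rocm"
sudo systemctl restart ollama.service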

Here's what I get when I run journalctl -u ollama --no-pager --follow --pager-end

... msg="looking for compatible GPUs"
... msg="no nvidia devices detected by library /usr/lib/x86_64-linux-gnu/libcuda.so.570.133.07"
... msg="amdgpu is supported" gpu=0 gpu_type=gfx1102
... msg="inference compute" id=0 library=rocm variant="" compute=gfx1102 driver=6.12 name=1002:7480 total="16.0 GiB" available="15.8 GiB"
... msg="new model will fit in available VRAM in single GPU, loading" model=/usr/share/ollama/.ollama/models/blobs/sha256-a8cc1361f3145dc01f6d77c6c82c9116b9ffe3c97b34716fe20418455876c40e gpu=0 parallel=2 available=16404119552 required="11.2 GiB"
... msg="system memory" total="78.5 GiB" free="75.3 GiB" free_swap="8.0 GiB"
... msg=offload library=rocm layers.requested=-1 layers.model=41 layers.offload=41 layers.split="" memory.available="[15.3 GiB]" memory.gpu_overhead="0 B" memory.required.full="11.2 GiB" memory.required.partial="11.2 GiB" memory.required.kv="1.2 GiB" memory.required.allocations="[11.2 GiB]" memory.weights.total="8.2 GiB" memory.weights.repeating="7.6 GiB" memory.weights.nonrepeating="608.6 MiB" memory.graph.full="1.0 GiB" memory.graph.partial="1.0 GiB"
.... llama_model_loader: loaded meta data with 27 key-value pairs and 443 tensors from /usr/share/ollama/.ollama/models/blobs/sha256-a8cc1361f3145dc01f6d77c6c82c9116b9ffe3c97b34716fe20418455876c40e (version GGUF V3 (latest))

So, from what I can tell, Ollama sees my GPU, identifies that there is enough VRAM to hold the model, identifies that ROCm is installed, yet makes the conscious choice to try and run the model on my CPU instead. Unless there's something obvious I'm missing, I am pretty sure this is a bug?
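
The mismatch is easiest to see by filtering the scheduler and runner lines together; this uses only strings that actually appear in the log below:

journalctl -u ollama --no-pager | grep -E 'inference compute|n-gpu-layers|load_backend'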

Neofetch:

OS: Ubuntu 24.04.2 LTS x86_64
Host: B460MDS3HV2 -CF
Kernel: 6.11.0-26-generic
Uptime: 5 mins
Packages: 3559 (dpkg), 5 (flatpak), 16 (snap)
Shell: bash 5.2.21
Resolution: 2560x1440
DE: GNOME 46.0
WM: Mutter
WM Theme: Adwaita
Theme: Yaru-blue-dark [GTK2/3]
Icons: Yaru-blue [GTK2/3]
Terminal: gnome-terminal
CPU: Intel i7-10700K (16) @ 5.100GHz
GPU: AMD ATI Radeon RX 7600/7600 XT/7600M XT/7600S/7700S / PRO W7600
Memory: 4550MiB / 80346MiB

ollama --version:
ollama version is 0.7.1

Relevant log output

May 27 21:06:38 tank systemd[1]: Started ollama.service - Ollama Service.
May 27 21:06:38 tank ollama[3099]: time=2025-05-27T21:06:38.668-05:00 level=INFO source=routes.go:1205 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:INFO OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/usr/share/ollama/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
May 27 21:06:38 tank ollama[3099]: time=2025-05-27T21:06:38.677-05:00 level=INFO source=images.go:463 msg="total blobs: 37"
May 27 21:06:38 tank ollama[3099]: time=2025-05-27T21:06:38.677-05:00 level=INFO source=images.go:470 msg="total unused blobs removed: 0"
May 27 21:06:38 tank ollama[3099]: time=2025-05-27T21:06:38.677-05:00 level=INFO source=routes.go:1258 msg="Listening on 127.0.0.1:11434 (version 0.7.1)"
May 27 21:06:38 tank ollama[3099]: time=2025-05-27T21:06:38.678-05:00 level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
May 27 21:06:38 tank ollama[3099]: time=2025-05-27T21:06:38.701-05:00 level=INFO source=gpu.go:602 msg="no nvidia devices detected by library /usr/lib/x86_64-linux-gnu/libcuda.so.570.133.07"
May 27 21:06:38 tank ollama[3099]: time=2025-05-27T21:06:38.727-05:00 level=INFO source=amd_linux.go:386 msg="amdgpu is supported" gpu=0 gpu_type=gfx1102
May 27 21:06:38 tank ollama[3099]: time=2025-05-27T21:06:38.730-05:00 level=INFO source=types.go:130 msg="inference compute" id=0 library=rocm variant="" compute=gfx1102 driver=6.12 name=1002:7480 total="16.0 GiB" available="15.8 GiB"
May 27 21:07:12 tank ollama[3099]: [GIN] 2025/05/27 - 21:07:12 | 204 |     554.384µs |       127.0.0.1 | OPTIONS  "/api/tags"
May 27 21:07:12 tank ollama[3099]: [GIN] 2025/05/27 - 21:07:12 | 200 |    2.145944ms |       127.0.0.1 | GET      "/api/tags"
May 27 21:07:26 tank ollama[3099]: [GIN] 2025/05/27 - 21:07:26 | 204 |      15.502µs |       127.0.0.1 | OPTIONS  "/api/chat"
May 27 21:07:26 tank ollama[3099]: time=2025-05-27T21:07:26.392-05:00 level=INFO source=sched.go:788 msg="new model will fit in available VRAM in single GPU, loading" model=/usr/share/ollama/.ollama/models/blobs/sha256-a8cc1361f3145dc01f6d77c6c82c9116b9ffe3c97b34716fe20418455876c40e gpu=0 parallel=2 available=16404119552 required="11.2 GiB"
May 27 21:07:26 tank ollama[3099]: time=2025-05-27T21:07:26.392-05:00 level=INFO source=server.go:135 msg="system memory" total="78.5 GiB" free="75.3 GiB" free_swap="8.0 GiB"
May 27 21:07:26 tank ollama[3099]: time=2025-05-27T21:07:26.393-05:00 level=INFO source=server.go:168 msg=offload library=rocm layers.requested=-1 layers.model=41 layers.offload=41 layers.split="" memory.available="[15.3 GiB]" memory.gpu_overhead="0 B" memory.required.full="11.2 GiB" memory.required.partial="11.2 GiB" memory.required.kv="1.2 GiB" memory.required.allocations="[11.2 GiB]" memory.weights.total="8.2 GiB" memory.weights.repeating="7.6 GiB" memory.weights.nonrepeating="608.6 MiB" memory.graph.full="1.0 GiB" memory.graph.partial="1.0 GiB"
May 27 21:07:26 tank ollama[3099]: llama_model_loader: loaded meta data with 27 key-value pairs and 443 tensors from /usr/share/ollama/.ollama/models/blobs/sha256-a8cc1361f3145dc01f6d77c6c82c9116b9ffe3c97b34716fe20418455876c40e (version GGUF V3 (latest))
May 27 21:07:26 tank ollama[3099]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv   0:                       general.architecture str              = qwen3
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv   1:                               general.type str              = model
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv   2:                               general.name str              = Qwen3 14B
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv   3:                           general.basename str              = Qwen3
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv   4:                         general.size_label str              = 14B
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv   5:                          qwen3.block_count u32              = 40
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv   6:                       qwen3.context_length u32              = 40960
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv   7:                     qwen3.embedding_length u32              = 5120
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv   8:                  qwen3.feed_forward_length u32              = 17408
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv   9:                 qwen3.attention.head_count u32              = 40
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv  10:              qwen3.attention.head_count_kv u32              = 8
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv  11:                       qwen3.rope.freq_base f32              = 1000000.000000
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv  12:     qwen3.attention.layer_norm_rms_epsilon f32              = 0.000001
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv  13:                 qwen3.attention.key_length u32              = 128
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv  14:               qwen3.attention.value_length u32              = 128
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv  15:                       tokenizer.ggml.model str              = gpt2
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv  16:                         tokenizer.ggml.pre str              = qwen2
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv  17:                      tokenizer.ggml.tokens arr[str,151936]  = ["!", "\"", "#", "$", "%", "&", "'", ...
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv  18:                  tokenizer.ggml.token_type arr[i32,151936]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv  19:                      tokenizer.ggml.merges arr[str,151387]  = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv  20:                tokenizer.ggml.eos_token_id u32              = 151645
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv  21:            tokenizer.ggml.padding_token_id u32              = 151643
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv  22:                tokenizer.ggml.bos_token_id u32              = 151643
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv  23:               tokenizer.ggml.add_bos_token bool             = false
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv  24:                    tokenizer.chat_template str              = {%- if tools %}\n    {{- '<|im_start|>...
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv  25:               general.quantization_version u32              = 2
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv  26:                          general.file_type u32              = 15
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - type  f32:  161 tensors
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - type  f16:   40 tensors
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - type q4_K:  221 tensors
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - type q6_K:   21 tensors
May 27 21:07:26 tank ollama[3099]: print_info: file format = GGUF V3 (latest)
May 27 21:07:26 tank ollama[3099]: print_info: file type   = Q4_K - Medium
May 27 21:07:26 tank ollama[3099]: print_info: file size   = 8.63 GiB (5.02 BPW)
May 27 21:07:26 tank ollama[3099]: load: special tokens cache size = 26
May 27 21:07:26 tank ollama[3099]: load: token to piece cache size = 0.9311 MB
May 27 21:07:26 tank ollama[3099]: print_info: arch             = qwen3
May 27 21:07:26 tank ollama[3099]: print_info: vocab_only       = 1
May 27 21:07:26 tank ollama[3099]: print_info: model type       = ?B
May 27 21:07:26 tank ollama[3099]: print_info: model params     = 14.77 B
May 27 21:07:26 tank ollama[3099]: print_info: general.name     = Qwen3 14B
May 27 21:07:26 tank ollama[3099]: print_info: vocab type       = BPE
May 27 21:07:26 tank ollama[3099]: print_info: n_vocab          = 151936
May 27 21:07:26 tank ollama[3099]: print_info: n_merges         = 151387
May 27 21:07:26 tank ollama[3099]: print_info: BOS token        = 151643 '<|endoftext|>'
May 27 21:07:26 tank ollama[3099]: print_info: EOS token        = 151645 '<|im_end|>'
May 27 21:07:26 tank ollama[3099]: print_info: EOT token        = 151645 '<|im_end|>'
May 27 21:07:26 tank ollama[3099]: print_info: PAD token        = 151643 '<|endoftext|>'
May 27 21:07:26 tank ollama[3099]: print_info: LF token         = 198 'Ċ'
May 27 21:07:26 tank ollama[3099]: print_info: FIM PRE token    = 151659 '<|fim_prefix|>'
May 27 21:07:26 tank ollama[3099]: print_info: FIM SUF token    = 151661 '<|fim_suffix|>'
May 27 21:07:26 tank ollama[3099]: print_info: FIM MID token    = 151660 '<|fim_middle|>'
May 27 21:07:26 tank ollama[3099]: print_info: FIM PAD token    = 151662 '<|fim_pad|>'
May 27 21:07:26 tank ollama[3099]: print_info: FIM REP token    = 151663 '<|repo_name|>'
May 27 21:07:26 tank ollama[3099]: print_info: FIM SEP token    = 151664 '<|file_sep|>'
May 27 21:07:26 tank ollama[3099]: print_info: EOG token        = 151643 '<|endoftext|>'
May 27 21:07:26 tank ollama[3099]: print_info: EOG token        = 151645 '<|im_end|>'
May 27 21:07:26 tank ollama[3099]: print_info: EOG token        = 151662 '<|fim_pad|>'
May 27 21:07:26 tank ollama[3099]: print_info: EOG token        = 151663 '<|repo_name|>'
May 27 21:07:26 tank ollama[3099]: print_info: EOG token        = 151664 '<|file_sep|>'
May 27 21:07:26 tank ollama[3099]: print_info: max token length = 256
May 27 21:07:26 tank ollama[3099]: llama_model_load: vocab only - skipping tensors
May 27 21:07:26 tank ollama[3099]: time=2025-05-27T21:07:26.584-05:00 level=INFO source=server.go:431 msg="starting llama server" cmd="/usr/local/bin/ollama runner --model /usr/share/ollama/.ollama/models/blobs/sha256-a8cc1361f3145dc01f6d77c6c82c9116b9ffe3c97b34716fe20418455876c40e --ctx-size 8192 --batch-size 512 --n-gpu-layers 41 --threads 8 --parallel 2 --port 43613"
May 27 21:07:26 tank ollama[3099]: time=2025-05-27T21:07:26.584-05:00 level=INFO source=sched.go:483 msg="loaded runners" count=1
May 27 21:07:26 tank ollama[3099]: time=2025-05-27T21:07:26.584-05:00 level=INFO source=server.go:591 msg="waiting for llama runner to start responding"
May 27 21:07:26 tank ollama[3099]: time=2025-05-27T21:07:26.585-05:00 level=INFO source=server.go:625 msg="waiting for server to become available" status="llm server not responding"
May 27 21:07:26 tank ollama[3099]: time=2025-05-27T21:07:26.591-05:00 level=INFO source=runner.go:815 msg="starting go runner"
May 27 21:07:26 tank ollama[3099]: load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-haswell.so
May 27 21:07:26 tank ollama[3099]: time=2025-05-27T21:07:26.601-05:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
May 27 21:07:26 tank ollama[3099]: time=2025-05-27T21:07:26.602-05:00 level=INFO source=runner.go:874 msg="Server listening on 127.0.0.1:43613"
May 27 21:07:26 tank ollama[3099]: llama_model_loader: loaded meta data with 27 key-value pairs and 443 tensors from /usr/share/ollama/.ollama/models/blobs/sha256-a8cc1361f3145dc01f6d77c6c82c9116b9ffe3c97b34716fe20418455876c40e (version GGUF V3 (latest))
May 27 21:07:26 tank ollama[3099]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv   0:                       general.architecture str              = qwen3
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv   1:                               general.type str              = model
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv   2:                               general.name str              = Qwen3 14B
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv   3:                           general.basename str              = Qwen3
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv   4:                         general.size_label str              = 14B
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv   5:                          qwen3.block_count u32              = 40
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv   6:                       qwen3.context_length u32              = 40960
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv   7:                     qwen3.embedding_length u32              = 5120
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv   8:                  qwen3.feed_forward_length u32              = 17408
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv   9:                 qwen3.attention.head_count u32              = 40
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv  10:              qwen3.attention.head_count_kv u32              = 8
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv  11:                       qwen3.rope.freq_base f32              = 1000000.000000
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv  12:     qwen3.attention.layer_norm_rms_epsilon f32              = 0.000001
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv  13:                 qwen3.attention.key_length u32              = 128
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv  14:               qwen3.attention.value_length u32              = 128
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv  15:                       tokenizer.ggml.model str              = gpt2
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv  16:                         tokenizer.ggml.pre str              = qwen2
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv  17:                      tokenizer.ggml.tokens arr[str,151936]  = ["!", "\"", "#", "$", "%", "&", "'", ...
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv  18:                  tokenizer.ggml.token_type arr[i32,151936]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv  19:                      tokenizer.ggml.merges arr[str,151387]  = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv  20:                tokenizer.ggml.eos_token_id u32              = 151645
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv  21:            tokenizer.ggml.padding_token_id u32              = 151643
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv  22:                tokenizer.ggml.bos_token_id u32              = 151643
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv  23:               tokenizer.ggml.add_bos_token bool             = false
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv  24:                    tokenizer.chat_template str              = {%- if tools %}\n    {{- '<|im_start|>...
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv  25:               general.quantization_version u32              = 2
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - kv  26:                          general.file_type u32              = 15
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - type  f32:  161 tensors
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - type  f16:   40 tensors
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - type q4_K:  221 tensors
May 27 21:07:26 tank ollama[3099]: llama_model_loader: - type q6_K:   21 tensors
May 27 21:07:26 tank ollama[3099]: print_info: file format = GGUF V3 (latest)
May 27 21:07:26 tank ollama[3099]: print_info: file type   = Q4_K - Medium
May 27 21:07:26 tank ollama[3099]: print_info: file size   = 8.63 GiB (5.02 BPW)
May 27 21:07:26 tank ollama[3099]: load: special tokens cache size = 26
May 27 21:07:26 tank ollama[3099]: load: token to piece cache size = 0.9311 MB
May 27 21:07:26 tank ollama[3099]: print_info: arch             = qwen3
May 27 21:07:26 tank ollama[3099]: print_info: vocab_only       = 0
May 27 21:07:26 tank ollama[3099]: print_info: n_ctx_train      = 40960
May 27 21:07:26 tank ollama[3099]: print_info: n_embd           = 5120
May 27 21:07:26 tank ollama[3099]: print_info: n_layer          = 40
May 27 21:07:26 tank ollama[3099]: print_info: n_head           = 40
May 27 21:07:26 tank ollama[3099]: print_info: n_head_kv        = 8
May 27 21:07:26 tank ollama[3099]: print_info: n_rot            = 128
May 27 21:07:26 tank ollama[3099]: print_info: n_swa            = 0
May 27 21:07:26 tank ollama[3099]: print_info: n_swa_pattern    = 1
May 27 21:07:26 tank ollama[3099]: print_info: n_embd_head_k    = 128
May 27 21:07:26 tank ollama[3099]: print_info: n_embd_head_v    = 128
May 27 21:07:26 tank ollama[3099]: print_info: n_gqa            = 5
May 27 21:07:26 tank ollama[3099]: print_info: n_embd_k_gqa     = 1024
May 27 21:07:26 tank ollama[3099]: print_info: n_embd_v_gqa     = 1024
May 27 21:07:26 tank ollama[3099]: print_info: f_norm_eps       = 0.0e+00
May 27 21:07:26 tank ollama[3099]: print_info: f_norm_rms_eps   = 1.0e-06
May 27 21:07:26 tank ollama[3099]: print_info: f_clamp_kqv      = 0.0e+00
May 27 21:07:26 tank ollama[3099]: print_info: f_max_alibi_bias = 0.0e+00
May 27 21:07:26 tank ollama[3099]: print_info: f_logit_scale    = 0.0e+00
May 27 21:07:26 tank ollama[3099]: print_info: f_attn_scale     = 0.0e+00
May 27 21:07:26 tank ollama[3099]: print_info: n_ff             = 17408
May 27 21:07:26 tank ollama[3099]: print_info: n_expert         = 0
May 27 21:07:26 tank ollama[3099]: print_info: n_expert_used    = 0
May 27 21:07:26 tank ollama[3099]: print_info: causal attn      = 1
May 27 21:07:26 tank ollama[3099]: print_info: pooling type     = 0
May 27 21:07:26 tank ollama[3099]: print_info: rope type        = 2
May 27 21:07:26 tank ollama[3099]: print_info: rope scaling     = linear
May 27 21:07:26 tank ollama[3099]: print_info: freq_base_train  = 1000000.0
May 27 21:07:26 tank ollama[3099]: print_info: freq_scale_train = 1
May 27 21:07:26 tank ollama[3099]: print_info: n_ctx_orig_yarn  = 40960
May 27 21:07:26 tank ollama[3099]: print_info: rope_finetuned   = unknown
May 27 21:07:26 tank ollama[3099]: print_info: ssm_d_conv       = 0
May 27 21:07:26 tank ollama[3099]: print_info: ssm_d_inner      = 0
May 27 21:07:26 tank ollama[3099]: print_info: ssm_d_state      = 0
May 27 21:07:26 tank ollama[3099]: print_info: ssm_dt_rank      = 0
May 27 21:07:26 tank ollama[3099]: print_info: ssm_dt_b_c_rms   = 0
May 27 21:07:26 tank ollama[3099]: print_info: model type       = 14B
May 27 21:07:26 tank ollama[3099]: print_info: model params     = 14.77 B
May 27 21:07:26 tank ollama[3099]: print_info: general.name     = Qwen3 14B
May 27 21:07:26 tank ollama[3099]: print_info: vocab type       = BPE
May 27 21:07:26 tank ollama[3099]: print_info: n_vocab          = 151936
May 27 21:07:26 tank ollama[3099]: print_info: n_merges         = 151387
May 27 21:07:26 tank ollama[3099]: print_info: BOS token        = 151643 '<|endoftext|>'
May 27 21:07:26 tank ollama[3099]: print_info: EOS token        = 151645 '<|im_end|>'
May 27 21:07:26 tank ollama[3099]: print_info: EOT token        = 151645 '<|im_end|>'
May 27 21:07:26 tank ollama[3099]: print_info: PAD token        = 151643 '<|endoftext|>'
May 27 21:07:26 tank ollama[3099]: print_info: LF token         = 198 'Ċ'
May 27 21:07:26 tank ollama[3099]: print_info: FIM PRE token    = 151659 '<|fim_prefix|>'
May 27 21:07:26 tank ollama[3099]: print_info: FIM SUF token    = 151661 '<|fim_suffix|>'
May 27 21:07:26 tank ollama[3099]: print_info: FIM MID token    = 151660 '<|fim_middle|>'
May 27 21:07:26 tank ollama[3099]: print_info: FIM PAD token    = 151662 '<|fim_pad|>'
May 27 21:07:26 tank ollama[3099]: print_info: FIM REP token    = 151663 '<|repo_name|>'
May 27 21:07:26 tank ollama[3099]: print_info: FIM SEP token    = 151664 '<|file_sep|>'
May 27 21:07:26 tank ollama[3099]: print_info: EOG token        = 151643 '<|endoftext|>'
May 27 21:07:26 tank ollama[3099]: print_info: EOG token        = 151645 '<|im_end|>'
May 27 21:07:26 tank ollama[3099]: print_info: EOG token        = 151662 '<|fim_pad|>'
May 27 21:07:26 tank ollama[3099]: print_info: EOG token        = 151663 '<|repo_name|>'
May 27 21:07:26 tank ollama[3099]: print_info: EOG token        = 151664 '<|file_sep|>'
May 27 21:07:26 tank ollama[3099]: print_info: max token length = 256
May 27 21:07:26 tank ollama[3099]: load_tensors: loading model tensors, this can take a while... (mmap = true)
May 27 21:07:26 tank ollama[3099]: time=2025-05-27T21:07:26.836-05:00 level=INFO source=server.go:625 msg="waiting for server to become available" status="llm server loading model"
May 27 21:07:30 tank ollama[3099]: load_tensors:   CPU_Mapped model buffer size =  8840.78 MiB
May 27 21:07:30 tank ollama[3099]: llama_context: constructing llama_context
May 27 21:07:30 tank ollama[3099]: llama_context: n_seq_max     = 2
May 27 21:07:30 tank ollama[3099]: llama_context: n_ctx         = 8192
May 27 21:07:30 tank ollama[3099]: llama_context: n_ctx_per_seq = 4096
May 27 21:07:30 tank ollama[3099]: llama_context: n_batch       = 1024
May 27 21:07:30 tank ollama[3099]: llama_context: n_ubatch      = 512
May 27 21:07:30 tank ollama[3099]: llama_context: causal_attn   = 1
May 27 21:07:30 tank ollama[3099]: llama_context: flash_attn    = 0
May 27 21:07:30 tank ollama[3099]: llama_context: freq_base     = 1000000.0
May 27 21:07:30 tank ollama[3099]: llama_context: freq_scale    = 1
May 27 21:07:30 tank ollama[3099]: llama_context: n_ctx_per_seq (4096) < n_ctx_train (40960) -- the full capacity of the model will not be utilized
May 27 21:07:30 tank ollama[3099]: llama_context:        CPU  output buffer size =     1.20 MiB
May 27 21:07:30 tank ollama[3099]: llama_kv_cache_unified: kv_size = 8192, type_k = 'f16', type_v = 'f16', n_layer = 40, can_shift = 1, padding = 32
May 27 21:07:30 tank ollama[3099]: llama_kv_cache_unified:        CPU KV buffer size =  1280.00 MiB
May 27 21:07:30 tank ollama[3099]: llama_kv_cache_unified: KV self size  = 1280.00 MiB, K (f16):  640.00 MiB, V (f16):  640.00 MiB
May 27 21:07:30 tank ollama[3099]: llama_context:        CPU compute buffer size =   696.01 MiB
May 27 21:07:30 tank ollama[3099]: llama_context: graph nodes  = 1526
May 27 21:07:30 tank ollama[3099]: llama_context: graph splits = 1
May 27 21:07:30 tank ollama[3099]: time=2025-05-27T21:07:30.597-05:00 level=INFO source=server.go:630 msg="llama runner started in 4.01 seconds"
May 27 21:08:45 tank ollama[3099]: [GIN] 2025/05/27 - 21:08:45 | 200 |         1m19s |       127.0.0.1 | POST     "/api/chat"
May 27 21:13:50 tank ollama[3099]: time=2025-05-27T21:13:50.796-05:00 level=WARN source=sched.go:687 msg="gpu VRAM usage didn't recover within timeout" seconds=5.001224659 runner.size="11.2 GiB" runner.vram="11.2 GiB" runner.parallel=2 runner.pid=5382 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-a8cc1361f3145dc01f6d77c6c82c9116b9ffe3c97b34716fe20418455876c40e
May 27 21:13:51 tank ollama[3099]: time=2025-05-27T21:13:51.045-05:00 level=WARN source=sched.go:687 msg="gpu VRAM usage didn't recover within timeout" seconds=5.250790349 runner.size="11.2 GiB" runner.vram="11.2 GiB" runner.parallel=2 runner.pid=5382 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-a8cc1361f3145dc01f6d77c6c82c9116b9ffe3c97b34716fe20418455876c40e
May 27 21:13:51 tank ollama[3099]: time=2025-05-27T21:13:51.295-05:00 level=WARN source=sched.go:687 msg="gpu VRAM usage didn't recover within timeout" seconds=5.500354884 runner.size="11.2 GiB" runner.vram="11.2 GiB" runner.parallel=2 runner.pid=5382 runner.model=/usr/share/ollama/.ollama/models/blobs/sha256-a8cc1361f3145dc01f6d77c6c82c9116b9ffe3c97b34716fe20418455876c40e

OS

Linux

GPU

AMD

CPU

AMD, Intel

Ollama version

0.7.1

GiteaMirror added the bug label 2026-05-04 17:28:52 -05:00

@rick-github commented on GitHub (May 28, 2025):

May 27 21:07:26 tank ollama[3099]: load_backend: loaded CPU backend from /usr/local/lib/ollama/libggml-cpu-haswell.so
May 27 21:07:26 tank ollama[3099]: time=2025-05-27T21:07:26.601-05:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)

No GPU backend found. It seems like the ROCm backends aren't installed. What do the following return:

lspci -d '1002:' | grep 'AMD'
sudo lshw -c display -numeric -disable network | grep 'vendor: .* \[1002\]'
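
Also worth checking whether any ROCm backend libraries are present next to the CPU one that did load (directory taken from the load_backend line above; the filename pattern is a guess, since exact names vary by version):

ls /usr/local/lib/ollama/ | grep -iE 'rocm|hip'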

@JonJust commented on GitHub (May 28, 2025):

lspci -d '1002:' | grep 'AMD'
sudo lshw -c display -numeric -disable network | grep 'vendor: .* \[1002\]'
04:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Upstream Port of PCI Express Switch (rev 12)
05:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Downstream Port of PCI Express Switch (rev 12)
06:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 33 [Radeon RX 7600/7600 XT/7600M XT/7600S/7700S / PRO W7600] (rev c0)
06:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Navi 31 HDMI/DP Audio
vendor: Advanced Micro Devices, Inc. [AMD/ATI] [1002]

I went through the steps on the AMD website to install/set up ROCm a few days ago, and Ollama was using the GPU just fine. After updating, it stopped working. I tried reinstalling, but it didn't seem to fix anything.

https://rocm.docs.amd.com/projects/install-on-linux/en/latest/install/quick-start.html
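
If it's useful, this is how I'd check that the ROCm userspace packages survived the update (the grep pattern is just a guess at the package names):

dpkg -l | grep -i rocm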

<!-- gh-comment-id:2916139510 --> @JonJust commented on GitHub (May 28, 2025): lspci -d '1002:' | grep 'AMD' sudo lshw -c display -numeric -disable network | grep 'vendor: .* \[1002\]' 04:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Upstream Port of PCI Express Switch (rev 12) 05:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Downstream Port of PCI Express Switch (rev 12) 06:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 33 [Radeon RX 7600/7600 XT/7600M XT/7600S/7700S / PRO W7600] (rev c0) 06:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Navi 31 HDMI/DP Audio vendor: Advanced Micro Devices, Inc. [AMD/ATI] [1002] I went through the steps on the AMD website to install/setup rocm a few days ago, and ollama was using the GPU just fine. After updating, it stopped working. I tried reinstalling, but it didn't seem to fix anything. https://rocm.docs.amd.com/projects/install-on-linux/en/latest/install/quick-start.html

@rick-github commented on GitHub (May 28, 2025):

The formatting makes it hard to read, but it looks like the lspci returned nothing.


@JonJust commented on GitHub (May 28, 2025):

I can see how the formatting may be confusing; I copy/pasted from the terminal. lspci returns this block:

lspci -d '1002:' | grep 'AMD'
04:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Upstream Port of PCI Express Switch (rev 12)
05:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Downstream Port of PCI Express Switch (rev 12)
06:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 33 [Radeon RX 7600/7600 XT/7600M XT/7600S/7700S / PRO W7600] (rev c0)
06:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Navi 31 HDMI/DP Audio

sudo lshw -c display -numeric -disable network | grep 'vendor: .* [1002]' returns:

vendor: Advanced Micro Devices, Inc. [AMD/ATI] [1002]

If I run rocminfo, I can see the card show up...


*******
Agent 2
*******
Name: gfx1102
Uuid: GPU-XX
Marketing Name: AMD Radeon™ RX 7600 XT
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 64(0x40)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 1
Device Type: GPU
Cache Info:
L1: 32(0x20) KB
L2: 2048(0x800) KB
Chip ID: 29824(0x7480)
ASIC Revision: 0(0x0)
Cacheline Size: 128(0x80)
Max Clock Freq. (MHz): 2539
BDFID: 1536
Internal Node ID: 1
Compute Unit: 32
SIMDs per CU: 2
Shader Engines: 2
Shader Arrs. per Eng.: 2
WatchPts on Addr. Ranges:4
Coherent Host Access: FALSE
Memory Properties:
Features: KERNEL_DISPATCH
Fast F16 Operation: TRUE
Wavefront Size: 32(0x20)
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Max Waves Per CU: 32(0x20)
Max Work-item Per CU: 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
Max fbarriers/Workgrp: 32
Packet Processor uCode:: 542
SDMA engine uCode:: 21
IOMMU Support:: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 16760832(0xffc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:2048KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 2
Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED
Size: 16760832(0xffc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:2048KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 3
Segment: GROUP
Size: 64(0x40) KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Recommended Granule:0KB
Alloc Alignment: 0KB
Accessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx1102
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32
ISA 2
Name: amdgcn-amd-amdhsa--gfx11-generic
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32
*** Done ***

The card also shows up if I run rocm-smi:

/opt/rocm/bin/rocm-smi

======================================= ROCm System Management Interface =======================================
================================================= Concise Info =================================================
Device  Node  IDs              Temp    Power  Partitions          SCLK  MCLK    Fan  Perf  PwrCap  VRAM%  GPU%  
              (DID,     GUID)  (Edge)  (Avg)  (Mem, Compute, ID)                                                
================================================================================================================
0       1     0x7480,   41560  59.0°C  57.0W  N/A, N/A, 0         0Mhz  456Mhz  0%   auto  165.0W  6%     0%    
================================================================================================================
============================================= End of ROCm SMI Log ==============================================

@JonJust commented on GitHub (May 28, 2025):

Aha! I fixed it.

Previously, I was using an Nvidia card on my machine, so I still had the Nvidia tools installed. When I switched over to my AMD card, I installed the ROCm drivers, and Ollama somehow picked up that I switched over to AMD. This was fine until I went to run the install.sh script again.

curl -fsSL https://ollama.com/install.sh | sh

Unfortunately, I didn't save the output, but there was a line in the installer output that said something about "NVIDIA GPU installed." I ran the script a few times while faffing about with my settings to see if reinstalling could fix it, but I always got the NVIDIA message.

I wound up completely uninstalling all NVIDIA-related stuff from my machine: I did an apt purge and nuked nvidia-smi, CUDA, etc. After doing this, running the install command yields this:

>>> Cleaning up old version at /usr/local/lib/ollama
>>> Installing ollama to /usr/local
>>> Downloading Linux amd64 bundle
######################################################################## 100.0%
>>> Adding ollama user to render group...
>>> Adding ollama user to video group...
>>> Adding current user to ollama group...
>>> Creating ollama systemd service...
>>> Enabling and starting ollama service...
>>> Downloading Linux ROCm amd64 bundle
######################################################################## 100.0%
>>> The Ollama API is now available at 127.0.0.1:11434.
>>> Install complete. Run "ollama" from the command line.
>>> AMD GPU ready.

Lo and behold, I load up a model, and I see my GPU being used, tokens being output much faster, etc.

Looking at the installer script, I think I see the issue:

check_gpu() {
    # Look for devices based on vendor ID for NVIDIA and AMD
    case $1 in
        lspci)
            case $2 in
                nvidia) available lspci && lspci -d '10de:' | grep -q 'NVIDIA' || return 1 ;;
                amdgpu) available lspci && lspci -d '1002:' | grep -q 'AMD' || return 1 ;;
            esac ;;
        lshw)
            case $2 in
                nvidia) available lshw && $SUDO lshw -c display -numeric -disable network | grep -q 'vendor: .* \[10DE\]' || return 1 ;;
                amdgpu) available lshw && $SUDO lshw -c display -numeric -disable network | grep -q 'vendor: .* \[1002\]' || return 1 ;;
            esac ;;
        nvidia-smi) available nvidia-smi || return 1 ;;
    esac
}

if check_gpu nvidia-smi; then
    status "NVIDIA GPU installed."
    exit 0
fi

The first time check_gpu() is called, it only checks whether the nvidia-smi binary is installed. If it is, the script assumes the user is running an NVIDIA card and exits before ever probing for an AMD one. This is a problem for anyone who ran an NVIDIA card at some point but has since switched to AMD, and I'd expect that case to become more common if AMD keeps putting out higher-memory cards than NVIDIA.
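
To make the failure mode concrete, here's a rough sketch of what such a machine looks like (illustrative, not output captured from my box):

command -v nvidia-smi            # succeeds: the leftover binary is still on $PATH
lspci -d '10de:' | grep NVIDIA   # prints nothing: no NVIDIA device on the bus
lspci -d '1002:' | grep AMD      # prints the AMD card

So check_gpu nvidia-smi reports success even though the lspci-based checks would have told the truth.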

Potential fix:

if check_gpu nvidia-smi && check_gpu lspci nvidia; then
    status "NVIDIA GPU installed."
    exit 0
fi
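
An alternative (equally a sketch, untested) would be to make the nvidia-smi branch itself require that a device actually enumerates, e.g. with nvidia-smi -L:

nvidia-smi) available nvidia-smi && nvidia-smi -L 2>/dev/null | grep -q 'GPU' || return 1 ;;

Either way, the presence of the tool alone shouldn't be treated as the presence of the hardware.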

@afrolino02 commented on GitHub (Jun 7, 2025):

You saved my life, dude. I kept wondering "why NVIDIA, if I have an AMD card?" It turned out my Arch setup still had something NVIDIA-related installed. Thanks!!!

Reference: github-starred/ollama#69210