[GH-ISSUE #10075] GPU Not Being Used Despite CUDA Installation and GPU Detection (Ollama 0.6.3 on Arch Linux) #32364

Closed
opened 2026-04-22 13:34:12 -05:00 by GiteaMirror · 5 comments

Originally created by @mi6i on GitHub (Apr 1, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/10075

What is the issue?

Hello,

I have CUDA installed and my GPU is correctly detected by Ollama. However, when I give a prompt to the model, it's not using my GPU but rather the CPU for processing.

I’m running Arch Linux and using Ollama version 0.6.3. My GPU is an RTX 3060, but the model seems to default to CPU usage despite the GPU being detected.

Could you please help me resolve this issue? Thank you!
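A quick way to confirm whether inference is actually hitting the GPU (a minimal sketch, assuming the NVIDIA driver tools are installed):

```shell
# Watch GPU memory and utilization while a prompt is being processed
watch -n 1 nvidia-smi

# Ask Ollama which processor the loaded model is using; the PROCESSOR
# column reports e.g. "100% GPU" or "100% CPU"
ollama ps
```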

Relevant log output

❯ ollama serve
2025/04/01 16:45:44 routes.go:1230: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:2048 OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/home/mega/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2025-04-01T16:45:44.831+03:30 level=INFO source=images.go:432 msg="total blobs: 10"
time=2025-04-01T16:45:44.831+03:30 level=INFO source=images.go:439 msg="total unused blobs removed: 0"
time=2025-04-01T16:45:44.831+03:30 level=INFO source=routes.go:1297 msg="Listening on 127.0.0.1:11434 (version 0.6.3)"
time=2025-04-01T16:45:44.831+03:30 level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
time=2025-04-01T16:45:45.205+03:30 level=INFO source=types.go:130 msg="inference compute" id=GPU-b711a4e3-abdb-c690-583b-3cabb56218c2 library=cuda variant=v12 compute=8.6 driver=12.8 name="NVIDIA GeForce RTX 3060" total="11.6 GiB" available="11.1 GiB"
[GIN] 2025/04/01 - 16:46:15 | 200 |      41.556µs |       127.0.0.1 | HEAD     "/"
[GIN] 2025/04/01 - 16:46:15 | 200 |   44.976211ms |       127.0.0.1 | POST     "/api/show"
time=2025-04-01T16:46:15.948+03:30 level=INFO source=sched.go:715 msg="new model will fit in available VRAM in single GPU, loading" model=/home/mega/.ollama/models/blobs/sha256-e8ad13eff07a78d89926e9e8b882317d082ef5bf9768ad7b50fcdbbcd63748de gpu=GPU-b711a4e3-abdb-c690-583b-3cabb56218c2 parallel=1 available=11866931200 required="10.3 GiB"
time=2025-04-01T16:46:16.104+03:30 level=INFO source=server.go:105 msg="system memory" total="46.8 GiB" free="42.6 GiB" free_swap="4.0 GiB"
time=2025-04-01T16:46:16.106+03:30 level=INFO source=server.go:138 msg=offload library=cuda layers.requested=-1 layers.model=49 layers.offload=49 layers.split="" memory.available="[11.1 GiB]" memory.gpu_overhead="0 B" memory.required.full="10.3 GiB" memory.required.partial="10.3 GiB" memory.required.kv="608.0 MiB" memory.required.allocations="[10.3 GiB]" memory.weights.total="6.8 GiB" memory.weights.repeating="6.0 GiB" memory.weights.nonrepeating="787.5 MiB" memory.graph.full="519.5 MiB" memory.graph.partial="1.3 GiB" projector.weights="795.9 MiB" projector.graph="1.0 GiB"
time=2025-04-01T16:46:16.206+03:30 level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.pretokenizer default="(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\\r\\n\\p{L}\\p{N}]?\\p{L}+|\\p{N}{1,3}| ?[^\\s\\p{L}\\p{N}]+[\\r\\n]*|\\s*[\\r\\n]+|\\s+(?!\\S)|\\s+"
time=2025-04-01T16:46:16.209+03:30 level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.add_eot_token default=false
time=2025-04-01T16:46:16.212+03:30 level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.pretokenizer default="(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\\r\\n\\p{L}\\p{N}]?\\p{L}+|\\p{N}{1,3}| ?[^\\s\\p{L}\\p{N}]+[\\r\\n]*|\\s*[\\r\\n]+|\\s+(?!\\S)|\\s+"
time=2025-04-01T16:46:16.217+03:30 level=WARN source=ggml.go:149 msg="key not found" key=gemma3.attention.layer_norm_rms_epsilon default=9.999999974752427e-07
time=2025-04-01T16:46:16.217+03:30 level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.local.freq_base default=10000
time=2025-04-01T16:46:16.217+03:30 level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06
time=2025-04-01T16:46:16.217+03:30 level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.freq_scale default=1
time=2025-04-01T16:46:16.217+03:30 level=WARN source=ggml.go:149 msg="key not found" key=gemma3.mm_tokens_per_image default=256
time=2025-04-01T16:46:16.218+03:30 level=INFO source=server.go:405 msg="starting llama server" cmd="/usr/bin/ollama runner --ollama-engine --model /home/mega/.ollama/models/blobs/sha256-e8ad13eff07a78d89926e9e8b882317d082ef5bf9768ad7b50fcdbbcd63748de --ctx-size 2048 --batch-size 512 --n-gpu-layers 49 --threads 6 --parallel 1 --port 45043"
time=2025-04-01T16:46:16.218+03:30 level=INFO source=sched.go:450 msg="loaded runners" count=1
time=2025-04-01T16:46:16.218+03:30 level=INFO source=server.go:580 msg="waiting for llama runner to start responding"
time=2025-04-01T16:46:16.219+03:30 level=INFO source=server.go:614 msg="waiting for server to become available" status="llm server error"
time=2025-04-01T16:46:16.229+03:30 level=INFO source=runner.go:765 msg="starting ollama engine"
time=2025-04-01T16:46:16.229+03:30 level=INFO source=runner.go:828 msg="Server listening on 127.0.0.1:45043"
time=2025-04-01T16:46:16.331+03:30 level=WARN source=ggml.go:149 msg="key not found" key=general.name default=""
time=2025-04-01T16:46:16.331+03:30 level=WARN source=ggml.go:149 msg="key not found" key=general.description default=""
time=2025-04-01T16:46:16.331+03:30 level=INFO source=ggml.go:69 msg="" architecture=gemma3 file_type=Q4_K_M name="" description="" num_tensors=1065 num_key_values=37
load_backend: loaded CPU backend from /usr/lib/ollama/libggml-cpu-haswell.so
time=2025-04-01T16:46:16.335+03:30 level=INFO source=ggml.go:109 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc)
time=2025-04-01T16:46:16.339+03:30 level=INFO source=ggml.go:291 msg="model weights" buffer=CPU size="8.3 GiB"
time=2025-04-01T16:46:16.472+03:30 level=INFO source=server.go:614 msg="waiting for server to become available" status="llm server loading model"
time=2025-04-01T16:46:18.301+03:30 level=INFO source=ggml.go:383 msg="compute graph" backend=CPU buffer_type=CPU
time=2025-04-01T16:46:18.301+03:30 level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.pretokenizer default="(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\\r\\n\\p{L}\\p{N}]?\\p{L}+|\\p{N}{1,3}| ?[^\\s\\p{L}\\p{N}]+[\\r\\n]*|\\s*[\\r\\n]+|\\s+(?!\\S)|\\s+"
time=2025-04-01T16:46:18.303+03:30 level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.add_eot_token default=false
time=2025-04-01T16:46:18.306+03:30 level=WARN source=ggml.go:149 msg="key not found" key=tokenizer.ggml.pretokenizer default="(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\\r\\n\\p{L}\\p{N}]?\\p{L}+|\\p{N}{1,3}| ?[^\\s\\p{L}\\p{N}]+[\\r\\n]*|\\s*[\\r\\n]+|\\s+(?!\\S)|\\s+"
time=2025-04-01T16:46:18.311+03:30 level=WARN source=ggml.go:149 msg="key not found" key=gemma3.attention.layer_norm_rms_epsilon default=9.999999974752427e-07
time=2025-04-01T16:46:18.311+03:30 level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.local.freq_base default=10000
time=2025-04-01T16:46:18.311+03:30 level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.global.freq_base default=1e+06
time=2025-04-01T16:46:18.311+03:30 level=WARN source=ggml.go:149 msg="key not found" key=gemma3.rope.freq_scale default=1
time=2025-04-01T16:46:18.311+03:30 level=WARN source=ggml.go:149 msg="key not found" key=gemma3.mm_tokens_per_image default=256
time=2025-04-01T16:46:18.500+03:30 level=INFO source=server.go:619 msg="llama runner started in 2.28 seconds"
[GIN] 2025/04/01 - 16:46:18 | 200 |  2.867606175s |       127.0.0.1 | POST     "/api/generate"
[GIN] 2025/04/01 - 16:47:52 | 200 |   6.32049411s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/04/01 - 16:48:10 | 200 |     420.654µs |       127.0.0.1 | GET      "/api/tags"
[GIN] 2025/04/01 - 16:48:10 | 200 |      61.949µs |       127.0.0.1 | GET      "/api/version"
[GIN] 2025/04/01 - 16:48:25 | 200 |  7.412573647s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/04/01 - 16:48:46 | 200 | 20.127895174s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/04/01 - 16:49:17 | 200 | 41.185738074s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/04/01 - 16:51:02 | 200 |     893.056µs |       127.0.0.1 | GET      "/api/tags"
[GIN] 2025/04/01 - 16:51:07 | 200 |     125.365µs |       127.0.0.1 | GET      "/api/version"
[GIN] 2025/04/01 - 16:53:12 | 200 |      98.056µs |       127.0.0.1 | GET      "/api/version"
[GIN] 2025/04/01 - 16:54:38 | 200 |         5m52s |       127.0.0.1 | POST     "/api/chat"
[GIN] 2025/04/01 - 16:54:57 | 200 |         6m11s |       127.0.0.1 | POST     "/api/chat"

OS

Arch

GPU

RTX 3060

CPU

Ryzen 5 Pro

Ollama version

0.6.3

GiteaMirror added the bug label 2026-04-22 13:34:12 -05:00

@rick-github commented on GitHub (Apr 1, 2025):

load_backend: loaded CPU backend from /usr/lib/ollama/libggml-cpu-haswell.so

Found the CPU backend but not the CUDA backend. Did you install the Arch ollama-cuda package (https://archlinux.org/packages/extra/x86_64/ollama-cuda/)?
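One way to check what is actually installed (a sketch; the /usr/lib/ollama path matches the log above, and pacman is assumed since this is Arch):

```shell
# List the ggml backend libraries available to Ollama; with only the base
# package installed this typically contains just the CPU variants
ls /usr/lib/ollama/

# Query whether the CUDA backend package is present
pacman -Q ollama-cuda
```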


@mi6i commented on GitHub (Apr 1, 2025):

> load_backend: loaded CPU backend from /usr/lib/ollama/libggml-cpu-haswell.so
>
> Found the CPU backend but not the CUDA backend. Did you install the Arch ollama-cuda package?

I just installed the Ollama package. Should I install ollama-cuda instead?


@rick-github commented on GitHub (Apr 1, 2025):

I believe (not an Arch user) that you install both - ollama for the base, ollama-cuda for CUDA support.
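On Arch that would look roughly like this (a sketch, using the package names from the link above):

```shell
sudo pacman -S ollama ollama-cuda
```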


@sieveLau commented on GitHub (Apr 2, 2025):

> > load_backend: loaded CPU backend from /usr/lib/ollama/libggml-cpu-haswell.so
> >
> > Found the CPU backend but not the CUDA backend. Did you install the Arch ollama-cuda package?
>
> I just installed the Ollama package. Should I install ollama-cuda instead?

The ollama package on Arch is just the ollama binary. If you want CUDA acceleration, you ALSO need to install ollama-cuda. It's not a replacement, but an optional dependency.
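After installing ollama-cuda, restarting the server lets it pick up the CUDA backend (a sketch; ollama.service assumes Ollama is run via the systemd unit shipped with the Arch package):

```shell
# Restart the service, then follow its log: on the next model load it
# should report a CUDA backend being loaded from /usr/lib/ollama/
# rather than only the CPU backend
sudo systemctl restart ollama
journalctl -u ollama -f
```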


@mi6i commented on GitHub (Apr 7, 2025):

Ah, thanks! Installing ollama-cuda did the trick—everything’s running smooth now.


Reference: github-starred/ollama#32364