[GH-ISSUE #2587] Running on GPU #48034

Closed
opened 2026-04-28 06:29:52 -05:00 by GiteaMirror · 30 comments

Originally created by @shersoni610 on GitHub (Feb 19, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/2587

Originally assigned to: @dhiltgen on GitHub.

Hello,
It seems the response time of llama2:7b is slow on my Linux machine. I am not sure if the code is running on the Nvidia card.

In Python code, how can I ensure that Ollama models run on the GPU?
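There is no Python-side switch for this: Ollama decides CPU vs GPU placement inside the server. Placement can still be verified from Python, though. A minimal sketch, assuming a newer Ollama build than the one in this thread — one that exposes the `GET /api/ps` endpoint, which reports `size_vram` per loaded model (zero means the model is resident on the CPU):

```python
# Sketch: ask the local Ollama server which loaded models are in VRAM.
# Assumes a newer Ollama build that serves GET /api/ps (not in 0.1.27).
import json
from urllib.request import urlopen

with urlopen("http://localhost:11434/api/ps") as resp:
    models = json.load(resp)["models"]

for m in models:
    placement = "GPU" if m.get("size_vram", 0) > 0 else "CPU"
    print(f'{m["name"]}: {m.get("size_vram", 0)}/{m["size"]} bytes in VRAM -> {placement}')
```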

@jaifar530 commented on GitHub (Feb 19, 2024):

Hi,

sudo apt install nvtop

While asking the LLM a question, run `nvtop` and check the GPU utilization percentage.

@shersoni610 commented on GitHub (Feb 20, 2024):

Hello,

Thanks for the info. I see that GPU usage is 0% and CPU is 794%. At least this confirms that the code is running on the CPU. How should I utilize the GPU?

@jaifar530 commented on GitHub (Feb 20, 2024):

First, you need to make sure that these two commands both show valid output:

$ nvidia-smi
$ nvcc --version

If one of them gives no output, you will be shown a suggested CLI command to install it ("sudo apt install ... cuda ..." or "sudo apt install ... nvidia ... driver"). DON'T install it yet; follow the steps below:

  1. Go to the BIOS settings and disable Secure Boot.
  2. Then install the missing driver suggested to you above.
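The same check can be scripted; a small sketch that just runs both commands and reports whether each produces output:

```python
# Sketch: confirm that nvidia-smi and nvcc both exist and produce output.
import shutil
import subprocess

for cmd in (["nvidia-smi"], ["nvcc", "--version"]):
    if shutil.which(cmd[0]) is None:
        print(f"{cmd[0]}: not found on PATH")
        continue
    result = subprocess.run(cmd, capture_output=True, text=True)
    ok = result.returncode == 0 and result.stdout.strip()
    print(f"{cmd[0]}: {'valid output' if ok else 'no valid output'}")
```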

@shersoni610 commented on GitHub (Feb 20, 2024):

Hello,

Both commands are working. I still see high CPU usage and zero GPU usage.

@jaifar530 commented on GitHub (Feb 20, 2024):

> Hello,
>
> Both commands are working. I still see high CPU usage and zero GPU usage.

Do one more thing:

  1. Make sure the ollama prompt is closed. While it is closed, run the `nvtop` command and check the GPU RAM utilization.

  2. Then run `ollama run llama2:7b`.

  3. At the same time as (2), check the GPU RAM utilization: is it the same as before running ollama?

If it is the same, then maybe the GPU does not support CUDA.

If it is not the same and it goes up to 3-6 GB, then everything works fine on your end, and it is the ollama issue that many people have raised with the current version, where the GPU is not supported on higher layers.
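The before/after VRAM comparison in steps (1)-(3) can also be done programmatically. A sketch using the `pynvml` NVML bindings (an assumption: install them with `pip install nvidia-ml-py`); run it once before and once during `ollama run`:

```python
# Sketch: print current VRAM usage of the first GPU so you can compare
# the numbers before and during an `ollama run` session.
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # first GPU
mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
print(f"VRAM used: {mem.used / 2**20:.0f} MiB of {mem.total / 2**20:.0f} MiB")
pynvml.nvmlShutdown()
```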

@jaifar530 commented on GitHub (Feb 21, 2024):

Also, try doing a fresh installation (or reinstalling) using the install script; it should show you whether the GPU is detected or not.

![image](https://github.com/ollama/ollama/assets/31308766/59b6709a-3ce9-4b2e-a82a-16120d405635)

@shersoni610 commented on GitHub (Feb 21, 2024):

Thanks. I see the following:

>>> Adding ollama user to render group...
>>> Adding current user to ollama group...
>>> Creating ollama systemd service...
>>> Enabling and starting ollama service...
>>> NVIDIA GPU installed.

I still see high CPU usage and zero GPU utilization.

@oodzchen commented on GitHub (Feb 24, 2024):

Same here. I use an RTX 3080 on Linux; the install script shows "NVIDIA GPU installed.", but neither `nvtop` nor `nvidia-smi` output shows any GPU usage when running the models; even the Intel GPU is at zero percent.

@jaifar530 commented on GitHub (Feb 24, 2024):

> Same here. I use an RTX 3080 on Linux; the install script shows "NVIDIA GPU installed.", but neither `nvtop` nor `nvidia-smi` output shows any GPU usage when running the models; even the Intel GPU is at zero percent.

Which LLM model have you used?

@oodzchen commented on GitHub (Feb 24, 2024):

@jaifar530 I've tried llama2, mistral and gemma, all the same.

@jaifar530 commented on GitHub (Feb 24, 2024):

> @jaifar530 I've tried llama2, mistral and gemma, all the same.

Does `nvcc --version` show output?

@oodzchen commented on GitHub (Feb 24, 2024):

> Does `nvcc --version` show output?

I'm using openSUSE Tumbleweed and successfully installed `cuda` and `cuda-toolkit`, but could not find the `nvcc` command. The `nvidia-smi` output shows the CUDA version is 12.3.

@oodzchen commented on GitHub (Feb 24, 2024):

> Does `nvcc --version` show output?

I just found the nvcc binary; the output is:

```shell
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Wed_Nov_22_10:17:15_PST_2023
Cuda compilation tools, release 12.3, V12.3.107
Build cuda_12.3.r12.3/compiler.33567101_0
```

@dhiltgen commented on GitHub (Feb 26, 2024):

Can you share the server log so we can see why it's not able to detect your GPU?

@oodzchen commented on GitHub (Feb 27, 2024):

> Can you share the server log so we can see why it's not able to detect your GPU?

Here's the output of `journalctl -xeu ollama.service`:

```
Feb 26 22:45:01 pc-opss systemd[1]: Started Ollama Service.
░░ Subject: A start job for unit ollama.service has finished successfully
░░ Defined-By: systemd
░░ Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
░░ 
░░ A start job for unit ollama.service has finished successfully.
░░ 
░░ The job identifier is 302.
Feb 26 22:45:01 pc-opss ollama[1248]: time=2024-02-26T22:45:01.556+08:00 level=INFO source=images.go:710 msg="total blobs: 18"
Feb 26 22:45:01 pc-opss ollama[1248]: time=2024-02-26T22:45:01.558+08:00 level=INFO source=images.go:717 msg="total unused blobs removed: 0"
Feb 26 22:45:01 pc-opss ollama[1248]: time=2024-02-26T22:45:01.559+08:00 level=INFO source=routes.go:1019 msg="Listening on 127.0.0.1:11434 (version 0.1.27)"
Feb 26 22:45:01 pc-opss ollama[1248]: time=2024-02-26T22:45:01.560+08:00 level=INFO source=payload_common.go:107 msg="Extracting dynamic libraries..."
Feb 26 22:45:03 pc-opss ollama[1248]: time=2024-02-26T22:45:03.043+08:00 level=INFO source=payload_common.go:146 msg="Dynamic LLM libraries [rocm_v6 cpu_avx2 rocm_v5 cpu cpu_avx cuda_v11]"
Feb 26 22:45:03 pc-opss ollama[1248]: time=2024-02-26T22:45:03.043+08:00 level=INFO source=gpu.go:94 msg="Detecting GPU type"
Feb 26 22:45:03 pc-opss ollama[1248]: time=2024-02-26T22:45:03.043+08:00 level=INFO source=gpu.go:265 msg="Searching for GPU management library libnvidia-ml.so"
Feb 26 22:45:03 pc-opss ollama[1248]: time=2024-02-26T22:45:03.059+08:00 level=INFO source=gpu.go:311 msg="Discovered GPU libraries: [/usr/lib/libnvidia-ml.so.545.29.06 /usr/lib64/libnvidia-ml.so.545.29.06]"
Feb 26 22:45:03 pc-opss ollama[1248]: time=2024-02-26T22:45:03.059+08:00 level=INFO source=gpu.go:323 msg="Unable to load CUDA management library /usr/lib/libnvidia-ml.so.545.29.06: Unable to load /usr/lib/libnvidia-ml.so.545.29.06 library to query for Nvidia GPUs: /usr/lib/libnvidia-ml.so.545.29.06: wrong ELF class: ELFCLASS32"
Feb 26 22:45:03 pc-opss ollama[1248]: time=2024-02-26T22:45:03.065+08:00 level=INFO source=gpu.go:323 msg="Unable to load CUDA management library /usr/lib64/libnvidia-ml.so.545.29.06: nvml vram init failure: 4"
Feb 26 22:45:03 pc-opss ollama[1248]: time=2024-02-26T22:45:03.065+08:00 level=INFO source=gpu.go:265 msg="Searching for GPU management library librocm_smi64.so"
Feb 26 22:45:03 pc-opss ollama[1248]: time=2024-02-26T22:45:03.065+08:00 level=INFO source=gpu.go:311 msg="Discovered GPU libraries: []"
Feb 26 22:45:03 pc-opss ollama[1248]: time=2024-02-26T22:45:03.065+08:00 level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
Feb 26 22:45:03 pc-opss ollama[1248]: time=2024-02-26T22:45:03.065+08:00 level=INFO source=routes.go:1042 msg="no GPU detected"
Feb 27 10:14:44 pc-opss ollama[1248]: [GIN] 2024/02/27 - 10:14:44 | 200 |      28.694µs |       127.0.0.1 | HEAD     "/"
Feb 27 10:14:44 pc-opss ollama[1248]: [GIN] 2024/02/27 - 10:14:44 | 200 |    1.607655ms |       127.0.0.1 | POST     "/api/show"
Feb 27 10:14:44 pc-opss ollama[1248]: [GIN] 2024/02/27 - 10:14:44 | 200 |    1.136526ms |       127.0.0.1 | POST     "/api/show"
Feb 27 10:14:44 pc-opss ollama[1248]: time=2024-02-27T10:14:44.974+08:00 level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
Feb 27 10:14:44 pc-opss ollama[1248]: time=2024-02-27T10:14:44.974+08:00 level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
Feb 27 10:14:44 pc-opss ollama[1248]: time=2024-02-27T10:14:44.974+08:00 level=INFO source=llm.go:77 msg="GPU not available, falling back to CPU"
Feb 27 10:14:44 pc-opss ollama[1248]: time=2024-02-27T10:14:44.974+08:00 level=INFO source=dyn_ext_server.go:90 msg="Loading Dynamic llm server: /tmp/ollama1034691504/cpu_avx2/libext_server.so"
Feb 27 10:14:44 pc-opss ollama[1248]: time=2024-02-27T10:14:44.974+08:00 level=INFO source=dyn_ext_server.go:150 msg="Initializing llama server"
Feb 27 10:14:44 pc-opss ollama[1248]: llama_model_loader: loaded meta data with 23 key-value pairs and 291 tensors from /usr/share/ollama/.ollama/models/blobs/sha256:8934d96d3f08982e95922b2b7a2c626a1fe873d7c3b06e8e56d7bc0a1fef9246 (version GGUF V3 (latest))
Feb 27 10:14:44 pc-opss ollama[1248]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
Feb 27 10:14:44 pc-opss ollama[1248]: llama_model_loader: - kv   0:                       general.architecture str              = llama
Feb 27 10:14:44 pc-opss ollama[1248]: llama_model_loader: - kv   1:                               general.name str              = LLaMA v2
Feb 27 10:14:44 pc-opss ollama[1248]: llama_model_loader: - kv   2:                       llama.context_length u32              = 4096
Feb 27 10:14:44 pc-opss ollama[1248]: llama_model_loader: - kv   3:                     llama.embedding_length u32              = 4096
Feb 27 10:14:44 pc-opss ollama[1248]: llama_model_loader: - kv   4:                          llama.block_count u32              = 32
Feb 27 10:14:44 pc-opss ollama[1248]: llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 11008
Feb 27 10:14:44 pc-opss ollama[1248]: llama_model_loader: - kv   6:                 llama.rope.dimension_count u32              = 128
Feb 27 10:14:44 pc-opss ollama[1248]: llama_model_loader: - kv   7:                 llama.attention.head_count u32              = 32
Feb 27 10:14:44 pc-opss ollama[1248]: llama_model_loader: - kv   8:              llama.attention.head_count_kv u32              = 32
Feb 27 10:14:44 pc-opss ollama[1248]: llama_model_loader: - kv   9:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
Feb 27 10:14:44 pc-opss ollama[1248]: llama_model_loader: - kv  10:                          general.file_type u32              = 2
Feb 27 10:14:44 pc-opss ollama[1248]: llama_model_loader: - kv  11:                       tokenizer.ggml.model str              = llama
Feb 27 10:14:44 pc-opss ollama[1248]: llama_model_loader: - kv  12:                      tokenizer.ggml.tokens arr[str,32000]   = ["<unk>", "<s>", "</s>", "<0x00>", "<...
Feb 27 10:14:44 pc-opss ollama[1248]: llama_model_loader: - kv  13:                      tokenizer.ggml.scores arr[f32,32000]   = [0.000000, 0.000000, 0.000000, 0.0000...
Feb 27 10:14:44 pc-opss ollama[1248]: llama_model_loader: - kv  14:                  tokenizer.ggml.token_type arr[i32,32000]   = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
Feb 27 10:14:45 pc-opss ollama[1248]: llama_model_loader: - kv  15:                      tokenizer.ggml.merges arr[str,61249]   = ["▁ t", "e r", "i n", "▁ a", "e n...
Feb 27 10:14:45 pc-opss ollama[1248]: llama_model_loader: - kv  16:                tokenizer.ggml.bos_token_id u32              = 1
Feb 27 10:14:45 pc-opss ollama[1248]: llama_model_loader: - kv  17:                tokenizer.ggml.eos_token_id u32              = 2
Feb 27 10:14:45 pc-opss ollama[1248]: llama_model_loader: - kv  18:            tokenizer.ggml.unknown_token_id u32              = 0
Feb 27 10:14:45 pc-opss ollama[1248]: llama_model_loader: - kv  19:               tokenizer.ggml.add_bos_token bool             = true
Feb 27 10:14:45 pc-opss ollama[1248]: llama_model_loader: - kv  20:               tokenizer.ggml.add_eos_token bool             = false
Feb 27 10:14:45 pc-opss ollama[1248]: llama_model_loader: - kv  21:                    tokenizer.chat_template str              = {% if messages[0]['role'] == 'system'...
Feb 27 10:14:45 pc-opss ollama[1248]: llama_model_loader: - kv  22:               general.quantization_version u32              = 2
Feb 27 10:14:45 pc-opss ollama[1248]: llama_model_loader: - type  f32:   65 tensors
Feb 27 10:14:45 pc-opss ollama[1248]: llama_model_loader: - type q4_0:  225 tensors
Feb 27 10:14:45 pc-opss ollama[1248]: llama_model_loader: - type q6_K:    1 tensors
Feb 27 10:14:45 pc-opss ollama[1248]: llm_load_vocab: special tokens definition check successful ( 259/32000 ).
Feb 27 10:14:45 pc-opss ollama[1248]: llm_load_print_meta: format           = GGUF V3 (latest)
Feb 27 10:14:45 pc-opss ollama[1248]: llm_load_print_meta: arch             = llama
Feb 27 10:14:45 pc-opss ollama[1248]: llm_load_print_meta: vocab type       = SPM
Feb 27 10:14:45 pc-opss ollama[1248]: llm_load_print_meta: n_vocab          = 32000
Feb 27 10:14:45 pc-opss ollama[1248]: llm_load_print_meta: n_merges         = 0
Feb 27 10:14:45 pc-opss ollama[1248]: llm_load_print_meta: n_ctx_train      = 4096
Feb 27 10:14:45 pc-opss ollama[1248]: llm_load_print_meta: n_embd           = 4096
Feb 27 10:14:45 pc-opss ollama[1248]: llm_load_print_meta: n_head           = 32
Feb 27 10:14:45 pc-opss ollama[1248]: llm_load_print_meta: n_head_kv        = 32
Feb 27 10:14:45 pc-opss ollama[1248]: llm_load_print_meta: n_layer          = 32
Feb 27 10:14:45 pc-opss ollama[1248]: llm_load_print_meta: n_rot            = 128
Feb 27 10:14:45 pc-opss ollama[1248]: llm_load_print_meta: n_embd_head_k    = 128
Feb 27 10:14:45 pc-opss ollama[1248]: llm_load_print_meta: n_embd_head_v    = 128
Feb 27 10:14:45 pc-opss ollama[1248]: llm_load_print_meta: n_gqa            = 1
Feb 27 10:14:45 pc-opss ollama[1248]: llm_load_print_meta: n_embd_k_gqa     = 4096
Feb 27 10:14:45 pc-opss ollama[1248]: llm_load_print_meta: n_embd_v_gqa     = 4096
Feb 27 10:14:45 pc-opss ollama[1248]: llm_load_print_meta: f_norm_eps       = 0.0e+00
Feb 27 10:14:45 pc-opss ollama[1248]: llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
Feb 27 10:14:45 pc-opss ollama[1248]: llm_load_print_meta: f_clamp_kqv      = 0.0e+00
Feb 27 10:14:45 pc-opss ollama[1248]: llm_load_print_meta: f_max_alibi_bias = 0.0e+00
Feb 27 10:14:45 pc-opss ollama[1248]: llm_load_print_meta: n_ff             = 11008
Feb 27 10:14:45 pc-opss ollama[1248]: llm_load_print_meta: n_expert         = 0
Feb 27 10:14:45 pc-opss ollama[1248]: llm_load_print_meta: n_expert_used    = 0
Feb 27 10:14:45 pc-opss ollama[1248]: llm_load_print_meta: rope scaling     = linear
Feb 27 10:14:45 pc-opss ollama[1248]: llm_load_print_meta: freq_base_train  = 10000.0
Feb 27 10:14:45 pc-opss ollama[1248]: llm_load_print_meta: freq_scale_train = 1
Feb 27 10:14:45 pc-opss ollama[1248]: llm_load_print_meta: n_yarn_orig_ctx  = 4096
Feb 27 10:14:45 pc-opss ollama[1248]: llm_load_print_meta: rope_finetuned   = unknown
Feb 27 10:14:45 pc-opss ollama[1248]: llm_load_print_meta: model type       = 7B
Feb 27 10:14:45 pc-opss ollama[1248]: llm_load_print_meta: model ftype      = Q4_0
Feb 27 10:14:45 pc-opss ollama[1248]: llm_load_print_meta: model params     = 6.74 B
Feb 27 10:14:45 pc-opss ollama[1248]: llm_load_print_meta: model size       = 3.56 GiB (4.54 BPW)
Feb 27 10:14:45 pc-opss ollama[1248]: llm_load_print_meta: general.name     = LLaMA v2
Feb 27 10:14:45 pc-opss ollama[1248]: llm_load_print_meta: BOS token        = 1 '<s>'
Feb 27 10:14:45 pc-opss ollama[1248]: llm_load_print_meta: EOS token        = 2 '</s>'
Feb 27 10:14:45 pc-opss ollama[1248]: llm_load_print_meta: UNK token        = 0 '<unk>'
Feb 27 10:14:45 pc-opss ollama[1248]: llm_load_print_meta: LF token         = 13 '<0x0A>'
Feb 27 10:14:45 pc-opss ollama[1248]: llm_load_tensors: ggml ctx size =    0.11 MiB
Feb 27 10:14:45 pc-opss ollama[1248]: llm_load_tensors:        CPU buffer size =  3647.87 MiB
Feb 27 10:14:45 pc-opss ollama[1248]: ..................................................................................................
Feb 27 10:14:45 pc-opss ollama[1248]: llama_new_context_with_model: n_ctx      = 2048
Feb 27 10:14:45 pc-opss ollama[1248]: llama_new_context_with_model: freq_base  = 10000.0
Feb 27 10:14:45 pc-opss ollama[1248]: llama_new_context_with_model: freq_scale = 1
Feb 27 10:14:45 pc-opss ollama[1248]: llama_kv_cache_init:        CPU KV buffer size =  1024.00 MiB
Feb 27 10:14:45 pc-opss ollama[1248]: llama_new_context_with_model: KV self size  = 1024.00 MiB, K (f16):  512.00 MiB, V (f16):  512.00 MiB
Feb 27 10:14:45 pc-opss ollama[1248]: llama_new_context_with_model:        CPU input buffer size   =    13.02 MiB
Feb 27 10:14:45 pc-opss ollama[1248]: llama_new_context_with_model:        CPU compute buffer size =   160.00 MiB
Feb 27 10:14:45 pc-opss ollama[1248]: llama_new_context_with_model: graph splits (measure): 1
Feb 27 10:14:46 pc-opss ollama[1248]: time=2024-02-27T10:14:46.008+08:00 level=INFO source=dyn_ext_server.go:161 msg="Starting llama main loop"
Feb 27 10:14:46 pc-opss ollama[1248]: [GIN] 2024/02/27 - 10:14:46 | 200 |  1.156281055s |       127.0.0.1 | POST     "/api/chat"
Feb 27 10:19:42 pc-opss ollama[1248]: [GIN] 2024/02/27 - 10:19:42 | 200 |      12.678µs |       127.0.0.1 | HEAD     "/"
Feb 27 10:19:42 pc-opss ollama[1248]: [GIN] 2024/02/27 - 10:19:42 | 200 |      298.38µs |       127.0.0.1 | POST     "/api/show"
Feb 27 10:19:51 pc-opss ollama[1248]: [GIN] 2024/02/27 - 10:19:51 | 200 |  9.558827934s |       127.0.0.1 | POST     "/api/generate"
Feb 27 10:22:13 pc-opss ollama[1248]: [GIN] 2024/02/27 - 10:22:13 | 200 |      12.532µs |       127.0.0.1 | HEAD     "/"
Feb 27 10:22:13 pc-opss ollama[1248]: [GIN] 2024/02/27 - 10:22:13 | 200 |     202.579µs |       127.0.0.1 | POST     "/api/show"
Feb 27 10:22:13 pc-opss ollama[1248]: [GIN] 2024/02/27 - 10:22:13 | 200 |      232.93µs |       127.0.0.1 | POST     "/api/show"
Feb 27 10:22:13 pc-opss ollama[1248]: [GIN] 2024/02/27 - 10:22:13 | 200 |      96.649µs |       127.0.0.1 | POST     "/api/chat"
Feb 27 10:23:00 pc-opss ollama[1248]: [GIN] 2024/02/27 - 10:23:00 | 200 | 30.669833967s |       127.0.0.1 | POST     "/api/chat"
```

@alienatorZ commented on GitHub (Feb 27, 2024):

I am having the same issue. I'm running Ubuntu Server 22.04 with two Nvidia Tesla P40s. I run llama.cpp on the GPUs no problem. Ollama detected the Nvidia GPU during installation but still runs on CPU.

@alienatorZ commented on GitHub (Feb 27, 2024):

My systemd log looks suspect:
Feb 27 13:21:19 llmsrv ollama[285412]: time=2024-02-27T13:21:19.370Z level=INFO source=routes.go:1019 msg="Listening on 127.0.0.1:11434 (version 0.1.27)"
Feb 27 13:21:19 llmsrv ollama[285412]: time=2024-02-27T13:21:19.371Z level=INFO source=payload_common.go:107 msg="Extracting dynamic libraries..."
Feb 27 13:21:21 llmsrv ollama[285412]: time=2024-02-27T13:21:21.698Z level=INFO source=payload_common.go:146 msg="Dynamic LLM libraries [rocm_v5 cpu_avx cpu cpu_avx2 rocm_v6 cuda_v11>
Feb 27 13:21:21 llmsrv ollama[285412]: time=2024-02-27T13:21:21.698Z level=INFO source=gpu.go:94 msg="Detecting GPU type"
Feb 27 13:21:21 llmsrv ollama[285412]: time=2024-02-27T13:21:21.698Z level=INFO source=gpu.go:265 msg="Searching for GPU management library libnvidia-ml.so"
Feb 27 13:21:21 llmsrv ollama[285412]: time=2024-02-27T13:21:21.701Z level=INFO source=gpu.go:311 msg="Discovered GPU libraries: [/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.535.154.05>
Feb 27 13:21:22 llmsrv ollama[285412]: time=2024-02-27T13:21:22.159Z level=INFO source=gpu.go:99 msg="Nvidia GPU detected"
Feb 27 13:21:22 llmsrv ollama[285412]: time=2024-02-27T13:21:22.159Z level=INFO source=cpu_common.go:18 msg="CPU does not have vector extensions"
Feb 27 13:21:22 llmsrv ollama[285412]: time=2024-02-27T13:21:22.159Z level=WARN source=gpu.go:128 msg="CPU does not have AVX or AVX2, disabling GPU support."
Feb 27 13:21:22 llmsrv ollama[285412]: time=2024-02-27T13:21:22.159Z level=INFO source=routes.go:1042 msg="no GPU detected"

Why does it detect the GPU but then disable GPU support because of missing AVX?
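One way to see what the host (or, with Proxmox, the VM) actually exposes is to read the CPU flags directly; a sketch for Linux:

```python
# Sketch: check whether the CPU visible to this machine/VM advertises AVX/AVX2.
flags = set()
with open("/proc/cpuinfo") as f:
    for line in f:
        if line.startswith("flags"):
            flags.update(line.split(":", 1)[1].split())
            break

print("AVX: ", "avx" in flags)
print("AVX2:", "avx2" in flags)
```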

@jaifar530 commented on GitHub (Feb 27, 2024):

> I am having the same issue. I'm running Ubuntu Server 22.04 with two Nvidia Tesla P40s. I run llama.cpp on the GPUs no problem. Ollama detected the Nvidia GPU during installation but still runs on CPU.

Can you try a small LLM (e.g. a 2B model), and at the same time run `nvtop` to see if the GPU is utilized?

@alienatorZ commented on GitHub (Feb 27, 2024):

Using Phi 2.7B, it is still maxing the CPU and not using the GPU.

@dhiltgen commented on GitHub (Feb 27, 2024):

@oodzchen thanks for the logs.

```
Feb 26 22:45:03 pc-opss ollama[1248]: time=2024-02-26T22:45:03.065+08:00 level=INFO source=gpu.go:323 msg="Unable to load CUDA management library /usr/lib64/libnvidia-ml.so.545.29.06: nvml vram init failure: 4"
```

This is the root cause of Ollama not being able to discover the GPU. I believe this maps to `NVML_ERROR_NO_PERMISSION = 4`: "The current user does not have permission for operation."

As a quick test, you can try shutting down the system service (which runs as user `ollama`) and running it as root to confirm it properly detects the GPU.

What Linux distro are you using?
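The same NVML initialization can be reproduced outside Ollama to compare users. A sketch with the `pynvml` bindings (an assumption: `pip install nvidia-ml-py`); run it as your own user, as root, and as the `ollama` user (e.g. `sudo -u ollama python3 nvml_check.py`):

```python
# Sketch: initialize NVML as the current user; a permission problem surfaces
# here as an NVMLError, mirroring the "nvml vram init failure: 4" log line.
import pynvml

try:
    pynvml.nvmlInit()
    print("NVML init OK; GPUs visible:", pynvml.nvmlDeviceGetCount())
    pynvml.nvmlShutdown()
except pynvml.NVMLError as err:
    print("NVML init failed:", err)
```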

@dhiltgen commented on GitHub (Feb 27, 2024):

@shersoni610 can you provide your server log?

@dhiltgen commented on GitHub (Feb 27, 2024):

@alienatorZ you hit issue #2187 - at present, we require AVX support for GPUs to be activated.

@alienatorZ commented on GitHub (Feb 28, 2024):

@dhiltgen Ahhhh I see now. Yes Proxmox was not enabling AVX! Thank you.

@oodzchen commented on GitHub (Feb 28, 2024):

@dhiltgen I've edited `/etc/systemd/system/ollama.service`, changed `Group=` and `User=` to root, reloaded the daemons, and restarted the ollama service; now the GPU is used. Thanks very much.

The distro I'm using is openSUSE Tumbleweed.

@dhiltgen commented on GitHub (Feb 28, 2024):

@oodzchen that's great to hear. We would like to set things up so Ollama can run deprivileged rather than as root. There's probably a minor fix needed in our install script to wire things up properly. We add the ollama user to the `render` group, but that must not be correct on openSUSE. Are you able to tell what group we should add on your system? Presumably it's some group your user account was added to so you can run tools like `nvidia-smi`.

If you want to experiment, try adding the ollama user to the plausible groups until it works. See https://github.com/ollama/ollama/blob/main/scripts/install.sh#L87-L90 for reference.
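Rather than guessing, the owning group of the NVIDIA device nodes can be read directly; a sketch (the device paths are typical, not guaranteed):

```python
# Sketch: report which group owns the NVIDIA/DRI device nodes; that group is
# the one the ollama user most likely needs to join.
import glob
import grp
import os

for path in sorted(glob.glob("/dev/nvidia*") + glob.glob("/dev/dri/*")):
    st = os.stat(path)
    group = grp.getgrgid(st.st_gid).gr_name
    print(f"{path}: group={group}, mode={oct(st.st_mode & 0o777)}")
```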

@oodzchen commented on GitHub (Feb 29, 2024):

@dhiltgen I've manually reset the ollama service file and tried adding the ollama user to different groups, reloading the service each time; it turns out `video` is the right group.

@xiaotianfotos commented on GitHub (Feb 29, 2024):

I'm using Dify to connect to the Ollama service. When using the Ollama API, the model always loads into CPU memory, but when using the OpenAI-compatible API (via Ollama), it loads onto the GPU.

![111111](https://github.com/ollama/ollama/assets/25025807/e3c6e59d-030f-4e1b-bca0-d448cf2830d4)

@dhiltgen commented on GitHub (Feb 29, 2024):

@shersoni610 since you opened the issue, I'd like to confirm the fix I'm about to merge for @oodzchen will fix your problem as well before I close this issue.

@dhiltgen commented on GitHub (Feb 29, 2024):

@xiaotianfotos can you open a new issue with logs attached showing the two runs?

@raffaelemancuso commented on GitHub (Jun 7, 2025):

`nvtop` seems to be Linux-only. If you're on Windows, use `nvitop`: `uvx nvitop`.

Reference: github-starred/ollama#48034