[GH-ISSUE #4008] Compute Capability Misidentification with PhysX cudart library #48996

Closed
opened 2026-04-28 10:35:08 -05:00 by GiteaMirror · 24 comments
Owner

Originally created by @aaronjrod on GitHub (Apr 28, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/4008

Originally assigned to: @dhiltgen on GitHub.

What is the issue?

Ollama server incorrectly identifies the Compute Capability of my GPU (it detects 1.0 instead of 5.2). It seems to me that this is due to a recent change in [gpu/gpu.go](https://github.com/ollama/ollama/commit/34b9db5afc43b352c5ef04fe6ef52684bfdd57b5#diff-b3bde438f86c17903c484c6a1f48f7c98437f5ed1906742c3075342d748ce7ec). Thanks!

Previously: CUDART CUDA Compute Capability detected: 5.2
Now: CUDA GPU is too old. Compute Capability detected: 1.0

OS

Windows

GPU

Nvidia

CPU

Intel

Ollama version

0.1.33-rc5

Workaround

Remove `c:\Program Files (x86)\NVIDIA Corporation\PhysX\Common\` from your `PATH` environment variable so Ollama does not use this CUDA runtime library.
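
The workaround amounts to filtering the PhysX directory out of the PATH-derived library search list before it is expanded into `cudart64_*.dll` globs. A minimal sketch of that idea (illustration only, not Ollama's actual Go code; the directory name comes from this report):

```python
# Sketch: drop the PhysX directory from a PATH-style search list so a
# stale cudart64_*.dll living there is never considered.
def filter_search_dirs(path_value, sep=";"):
    blocked = r"\NVIDIA Corporation\PhysX\Common"
    return [d for d in path_value.split(sep) if d and blocked.lower() not in d.lower()]

path = r"C:\Windows\system32;C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common\;C:\Program Files\Git\cmd"
print(filter_search_dirs(path))  # PhysX entry is gone
```

Editing the real `PATH` in System Properties achieves the same effect without code.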

GiteaMirror added the bug, nvidia labels 2026-04-28 10:35:08 -05:00
Author
Owner

@dhiltgen commented on GitHub (Apr 28, 2024):

I'll try to find it by code inspection, but could you share a server log with OLLAMA_DEBUG=1 set?

Author
Owner

@aaronjrod commented on GitHub (Apr 28, 2024):

PS C:\Users\Aaron> $env:OLLAMA_DEBUG="1"
PS C:\Users\Aaron> ollama serve
time=2024-04-28T16:56:46.238-04:00 level=INFO source=images.go:821 msg="total blobs: 5"
time=2024-04-28T16:56:46.240-04:00 level=INFO source=images.go:828 msg="total unused blobs removed: 0"
time=2024-04-28T16:56:46.241-04:00 level=INFO source=routes.go:1074 msg="Listening on 127.0.0.1:11434 (version 0.1.33-rc5)"
time=2024-04-28T16:56:46.241-04:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=C:\Users\Aaron\AppData\Local\Programs\Ollama\ollama_runners\cpu
time=2024-04-28T16:56:46.241-04:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=C:\Users\Aaron\AppData\Local\Programs\Ollama\ollama_runners\cpu_avx
time=2024-04-28T16:56:46.241-04:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=C:\Users\Aaron\AppData\Local\Programs\Ollama\ollama_runners\cpu_avx2
time=2024-04-28T16:56:46.241-04:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=C:\Users\Aaron\AppData\Local\Programs\Ollama\ollama_runners\cuda_v11.3
time=2024-04-28T16:56:46.241-04:00 level=DEBUG source=payload.go:71 msg="availableServers : found" file=C:\Users\Aaron\AppData\Local\Programs\Ollama\ollama_runners\rocm_v5.7
time=2024-04-28T16:56:46.241-04:00 level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu_avx cpu_avx2 cuda_v11.3 rocm_v5.7 cpu]"
time=2024-04-28T16:56:46.242-04:00 level=DEBUG source=payload.go:45 msg="Override detection logic by setting OLLAMA_LLM_LIBRARY"
time=2024-04-28T16:56:46.242-04:00 level=DEBUG source=sched.go:105 msg="starting llm scheduler"
time=2024-04-28T16:56:46.242-04:00 level=INFO source=gpu.go:96 msg="Detecting GPUs"
time=2024-04-28T16:56:46.243-04:00 level=DEBUG source=gpu.go:203 msg="Searching for GPU library" name=cudart64_*.dll
time=2024-04-28T16:56:46.243-04:00 level=DEBUG source=gpu.go:221 msg="gpu library search" globs="[C:\\Users\\Aaron\\AppData\\Local\\Programs\\Ollama\\cudart64_*.dll c:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v*\\bin\\cudart64_*.dll C:\\Python38\\Scripts\\cudart64_*.dll* C:\\Python38\\cudart64_*.dll* C:\\Program Files (x86)\\NVIDIA Corporation\\PhysX\\Common\\cudart64_*.dll* C:\\Program Files (x86)\\Intel\\iCLS Client\\cudart64_*.dll* C:\\Program Files\\Intel\\iCLS Client\\cudart64_*.dll* C:\\WINDOWS\\system32\\cudart64_*.dll* C:\\WINDOWS\\cudart64_*.dll* C:\\WINDOWS\\System32\\Wbem\\cudart64_*.dll* C:\\WINDOWS\\System32\\WindowsPowerShell\\v1.0\\cudart64_*.dll* C:\\Program Files (x86)\\Intel\\Intel(R) Management Engine Components\\DAL\\cudart64_*.dll* C:\\Program Files\\Intel\\Intel(R) Management Engine Components\\DAL\\cudart64_*.dll* C:\\Program Files (x86)\\Intel\\Intel(R) Management Engine Components\\IPT\\cudart64_*.dll* C:\\Program Files\\Intel\\Intel(R) Management Engine Components\\IPT\\cudart64_*.dll* C:\\Program Files\\TortoiseGit\\bin\\cudart64_*.dll* C:\\Program Files\\nodejs\\cudart64_*.dll* C:\\UnxTools\\cudart64_*.dll* C:\\WINDOWS\\System32\\OpenSSH\\cudart64_*.dll* C:\\Program Files\\dotnet\\cudart64_*.dll* C:\\Program Files\\Microsoft SQL Server\\130\\Tools\\Binn\\cudart64_*.dll* C:\\Program Files\\Microsoft SQL Server\\Client SDK\\ODBC\\170\\Tools\\Binn\\cudart64_*.dll* C:\\Program Files (x86)\\Yarn\\bin\\cudart64_*.dll* C:\\ProgramData\\chocolatey\\bin\\cudart64_*.dll* c:\\k\\cudart64_*.dll* C:\\tools\\java\\jdk1.8.0_221\\bin\\cudart64_*.dll* C:\\Program Files\\Git\\cmd\\cudart64_*.dll* C:\\Program Files\\NVIDIA Corporation\\Nsight Compute 2024.1.1\\cudart64_*.dll* C:\\Users\\Aaron\\AppData\\Local\\Microsoft\\WindowsApps\\cudart64_*.dll* C:\\Users\\Aaron\\AppData\\Local\\atom\\bin\\cudart64_*.dll* C:\\Users\\Aaron\\AppData\\Local\\Programs\\Git\\cmd\\cudart64_*.dll* C:\\Users\\Aaron\\AppData\\Roaming\\npm\\cudart64_*.dll* 
C:\\Users\\Aaron\\AppData\\Local\\Microsoft\\WindowsApps\\cudart64_*.dll* C:\\Users\\Aaron\\cudart64_*.dll* C:\\Users\\Aaron\\AppData\\Local\\Programs\\Microsoft VS Code\\bin\\cudart64_*.dll* C:\\Users\\Aaron\\AppData\\Local\\Programs\\Ollama\\cudart64_*.dll*]"
time=2024-04-28T16:56:46.262-04:00 level=DEBUG source=gpu.go:249 msg="discovered GPU libraries" paths="[C:\\Users\\Aaron\\AppData\\Local\\Programs\\Ollama\\cudart64_110.dll C:\\Program Files (x86)\\NVIDIA Corporation\\PhysX\\Common\\cudart64_60.dll]"
cudaSetDevice err: 3
time=2024-04-28T16:56:46.462-04:00 level=DEBUG source=gpu.go:261 msg="Unable to load cudart" library=C:\Users\Aaron\AppData\Local\Programs\Ollama\cudart64_110.dll error="cudart init failure: 3"
CUDA driver version: 9-1
time=2024-04-28T16:56:46.469-04:00 level=INFO source=gpu.go:101 msg="detected GPUs" library="C:\\Program Files (x86)\\NVIDIA Corporation\\PhysX\\Common\\cudart64_60.dll" count=1
time=2024-04-28T16:56:46.470-04:00 level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
[GPU-00000000-0100-0000-00c0-000000000000] CUDA totalMem 0
[GPU-00000000-0100-0000-00c0-000000000000] CUDA freeMem 3540602060
[GPU-00000000-0100-0000-00c0-000000000000] Compute Capability 1.0
time=2024-04-28T16:56:46.552-04:00 level=INFO source=gpu.go:148 msg="[0] CUDA GPU is too old. Compute Capability detected: 1.0"
time=2024-04-28T16:56:46.554-04:00 level=DEBUG source=amd_windows.go:32 msg="unable to load amdhip64.dll: The specified module could not be found."
Author
Owner

@dhiltgen commented on GitHub (Apr 30, 2024):

Yikes, yeah, those responses are definitely incorrect. ~~I have a suspicion on what's going wrong, and should be able to get a fix before we finalize 0.1.33.~~

Author
Owner

@dhiltgen commented on GitHub (Apr 30, 2024):

One data point that may help: could you search your system for other instances of `cudart64_*.dll`, put those directories early in your `PATH`, and see if the behavior changes?

In addition, can you share the output of `nvidia-smi` so I can see your driver version and a bit more about your GPU?

Author
Owner

@dhiltgen commented on GitHub (Apr 30, 2024):

One more data point. On my test system running Win 11 Pro, I have driver 546.12 and CUDA v12.3 installed (as well as v11). Our bundled v11 `cudart64_110.dll` works with my driver and GPU, but if I force Ollama to use `C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common\cudart64_65.dll` I get a bogus Compute Capability 1.0 as well. Looking at the other fields in the response from `cudaGetDeviceProperties`, they seem consistently wrong. Switching to other cudart libraries on my system, I see correct results. I don't yet understand why the PhysX cudart library isn't working, but I think if you find another CUDA library on your host and add its directory to the `PATH` before the PhysX directory, it should start working.
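
One plausible defensive check for this failure mode (my sketch, not the fix that actually landed in Ollama; the numbers mirror the log above) is to treat a cudart library as unusable when `cudaGetDeviceProperties` returns internally inconsistent values, such as Compute Capability 1.0 alongside a total memory of 0, and fall through to the next discovered library:

```python
# Sketch: reject device-property results that are internally inconsistent,
# like the PhysX cudart's "Compute Capability 1.0, totalMem 0" above.
def properties_look_valid(major, minor, total_mem, free_mem):
    if total_mem == 0:           # a real GPU always reports nonzero total memory
        return False
    if free_mem > total_mem:     # free memory can never exceed total memory
        return False
    if (major, minor) < (3, 0):  # no pre-CC-3.0 GPU works with a modern cudart
        return False
    return True

def pick_library(candidates):
    """candidates: list of (path, (major, minor, total_mem, free_mem))."""
    for path, props in candidates:
        if properties_look_valid(*props):
            return path
    return None  # nothing trustworthy found; fall back to CPU

# The two libraries from the log: the bogus PhysX result is skipped.
print(pick_library([
    ("cudart64_60.dll (PhysX)", (1, 0, 0, 3540602060)),
    ("cudart64_110.dll (bundled)", (5, 2, 4 << 30, 3 << 30)),
]))
```

The thresholds here are assumptions for illustration; the key idea is simply not to trust a single library's answer when its fields contradict each other.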

Author
Owner

@aaronjrod commented on GitHub (Apr 30, 2024):

I'm no longer able to replicate the Compute Capability issue; updating CUDA and restarting a couple of times may have done it 😅😅

By bundled, are you referring to the `.dll` stored in `Programs/Ollama`?

time=2024-04-30T18:43:51.616-04:00 level=INFO source=images.go:821 msg="total blobs: 5"
time=2024-04-30T18:43:51.622-04:00 level=INFO source=images.go:828 msg="total unused blobs removed: 0"
time=2024-04-30T18:43:51.622-04:00 level=INFO source=routes.go:1074 msg="Listening on 127.0.0.1:11434 (version 0.1.33-rc5)"
time=2024-04-30T18:43:51.622-04:00 level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu_avx cpu_avx2 cuda_v11.3 rocm_v5.7 cpu]"
time=2024-04-30T18:43:51.622-04:00 level=INFO source=gpu.go:96 msg="Detecting GPUs"
time=2024-04-30T18:43:51.665-04:00 level=INFO source=gpu.go:101 msg="detected GPUs" library=C:\Users\Aaron\AppData\Local\Programs\Ollama\cudart64_110.dll count=1
time=2024-04-30T18:43:51.674-04:00 level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
[GIN] 2024/04/30 - 18:44:18 | 200 |            0s |       127.0.0.1 | HEAD     "/"
[GIN] 2024/04/30 - 18:44:18 | 200 |      2.7838ms |       127.0.0.1 | POST     "/api/show"
[GIN] 2024/04/30 - 18:44:18 | 200 |      2.5923ms |       127.0.0.1 | POST     "/api/show"
time=2024-04-30T18:44:18.674-04:00 level=INFO source=gpu.go:96 msg="Detecting GPUs"
time=2024-04-30T18:44:18.690-04:00 level=INFO source=gpu.go:101 msg="detected GPUs" library=C:\Users\Aaron\AppData\Local\Programs\Ollama\cudart64_110.dll count=1
time=2024-04-30T18:44:18.690-04:00 level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-04-30T18:44:22.678-04:00 level=INFO source=memory.go:147 msg="offload to gpu" layers.real=-1 layers.estimate=17 memory.available="3381.9 MiB" memory.required.full="5033.0 MiB" memory.required.partial="3260.0 MiB" memory.required.kv="256.0 MiB" memory.weights.total="4156.0 MiB" memory.weights.repeating="3745.0 MiB" memory.weights.nonrepeating="411.0 MiB" memory.graph.full="164.0 MiB" memory.graph.partial="677.5 MiB"
time=2024-04-30T18:44:22.678-04:00 level=INFO source=memory.go:147 msg="offload to gpu" layers.real=-1 layers.estimate=17 memory.available="3381.9 MiB" memory.required.full="5033.0 MiB" memory.required.partial="3260.0 MiB" memory.required.kv="256.0 MiB" memory.weights.total="4156.0 MiB" memory.weights.repeating="3745.0 MiB" memory.weights.nonrepeating="411.0 MiB" memory.graph.full="164.0 MiB" memory.graph.partial="677.5 MiB"
time=2024-04-30T18:44:22.682-04:00 level=INFO source=memory.go:147 msg="offload to gpu" layers.real=-1 layers.estimate=17 memory.available="3381.9 MiB" memory.required.full="5033.0 MiB" memory.required.partial="3260.0 MiB" memory.required.kv="256.0 MiB" memory.weights.total="4156.0 MiB" memory.weights.repeating="3745.0 MiB" memory.weights.nonrepeating="411.0 MiB" memory.graph.full="164.0 MiB" memory.graph.partial="677.5 MiB"
time=2024-04-30T18:44:22.682-04:00 level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-04-30T18:44:22.704-04:00 level=INFO source=server.go:290 msg="starting llama server" cmd="C:\\Users\\Aaron\\AppData\\Local\\Programs\\Ollama\\ollama_runners\\cuda_v11.3\\ollama_llama_server.exe --model C:\\Users\\Aaron\\.ollama\\models\\blobs\\sha256-00e1317cbf74d901080d7100f57580ba8dd8de57203072dc6f668324ba545f29 --ctx-size 2048 --batch-size 512 --embedding --log-disable --n-gpu-layers 17 --parallel 1 --port 51645"
time=2024-04-30T18:44:22.711-04:00 level=INFO source=sched.go:327 msg="loaded runners" count=1
time=2024-04-30T18:44:22.711-04:00 level=INFO source=server.go:439 msg="waiting for llama runner to start responding"
{"function":"server_params_parse","level":"INFO","line":2603,"msg":"logging to file is disabled.","tid":"15128","timestamp":1714517062}
{"build":2737,"commit":"46e12c4","function":"wmain","level":"INFO","line":2820,"msg":"build info","tid":"15128","timestamp":1714517062}
{"function":"wmain","level":"INFO","line":2827,"msg":"system info","n_threads":4,"n_threads_batch":-1,"system_info":"AVX = 1 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | SSSE3 = 0 | VSX = 0 | MATMUL_INT8 = 0 | LAMMAFILE = 1 | ","tid":"15128","timestamp":1714517062,"total_threads":4}
llama_model_loader: loaded meta data with 21 key-value pairs and 291 tensors from C:\Users\Aaron\.ollama\models\blobs\sha256-00e1317cbf74d901080d7100f57580ba8dd8de57203072dc6f668324ba545f29 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = Meta-Llama-3-8B-Instruct
llama_model_loader: - kv   2:                          llama.block_count u32              = 32
llama_model_loader: - kv   3:                       llama.context_length u32              = 8192
...
llm_load_print_meta: model params     = 8.03 B
llm_load_print_meta: model size       = 4.33 GiB (4.64 BPW)
llm_load_print_meta: general.name     = Meta-Llama-3-8B-Instruct
llm_load_print_meta: BOS token        = 128000 '<|begin_of_text|>'
llm_load_print_meta: EOS token        = 128001 '<|end_of_text|>'
llm_load_print_meta: LF token         = 128 'Ä'
llm_load_print_meta: EOT token        = 128009 '<|eot_id|>'
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:   no
ggml_cuda_init: CUDA_USE_TENSOR_CORES: yes
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce GTX 970, compute capability 5.2, VMM: yes
llm_load_tensors: ggml ctx size =    0.30 MiB
llm_load_tensors: offloading 17 repeating layers to GPU
llm_load_tensors: offloaded 17/33 layers to GPU
llm_load_tensors:        CPU buffer size =  4437.80 MiB
llm_load_tensors:      CUDA0 buffer size =  1989.53 MiB
.......................................................................................
llama_new_context_with_model: n_ctx      = 2048
llama_new_context_with_model: n_batch    = 512
llama_new_context_with_model: n_ubatch   = 512
llama_new_context_with_model: freq_base  = 500000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init:  CUDA_Host KV buffer size =   120.00 MiB
llama_kv_cache_init:      CUDA0 KV buffer size =   136.00 MiB
llama_new_context_with_model: KV self size  =  256.00 MiB, K (f16):  128.00 MiB, V (f16):  128.00 MiB
llama_new_context_with_model:  CUDA_Host  output buffer size =     0.50 MiB
llama_new_context_with_model:      CUDA0 compute buffer size =   677.48 MiB
llama_new_context_with_model:  CUDA_Host compute buffer size =    12.01 MiB
llama_new_context_with_model: graph nodes  = 1030
llama_new_context_with_model: graph splits = 169
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 512.15       Driver Version: 512.15       CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ... WDDM  | 00000000:01:00.0 Off |                  N/A |
| 36%   54C    P8    15W / 151W |   3175MiB /  4096MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A     12592      C   ...3\ollama_llama_server.exe    N/A      |
+-----------------------------------------------------------------------------+
Author
Owner

@aaronjrod commented on GitHub (Apr 30, 2024):

However, it seems that while the model is loaded into VRAM, all the compute is done on the CPU. I'm looking into solutions; any suggestions? I see that not all layers are sent to the GPU. I tried phi3 as a smaller model (in case llama3 was not being fully loaded), but it is definitely being run on the CPU. Is the memory shared with the integrated GPU (the other 20 GB) not usable as VRAM?
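
For reference, the `layers.estimate=17` in the earlier log comes from dividing the card's free dedicated VRAM by a per-layer cost; memory shared with the integrated GPU is generally not counted as usable CUDA VRAM, which is why only part of the model is offloaded. A back-of-the-envelope version (the figures are taken from the log above; the formula is my simplification of Ollama's real estimator in memory.go and only lands in the same ballpark):

```python
# Rough sketch of the layer-offload estimate, using figures from the log:
# 3381.9 MiB available VRAM, 32 repeating layers, 3745.0 MiB of repeating
# weights, 411.0 MiB of non-repeating weights, 256.0 MiB KV cache, and
# 677.5 MiB of partial-offload compute-graph overhead.
def estimate_gpu_layers(available_mib, n_layers, repeating_mib,
                        nonrepeating_mib, kv_mib, graph_partial_mib):
    per_layer = (repeating_mib + kv_mib) / n_layers        # weights + KV per layer
    budget = available_mib - graph_partial_mib - nonrepeating_mib
    return max(0, min(n_layers, int(budget / per_layer)))

print(estimate_gpu_layers(3381.9, 32, 3745.0, 411.0, 256.0, 677.5))
```

With only ~17 of 33 layers on the GPU, the remaining layers run on the CPU every token, so heavy CPU usage during generation is expected here.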

Phi3:

ggml_cuda_init: GGML_CUDA_FORCE_MMQ:   no
ggml_cuda_init: CUDA_USE_TENSOR_CORES: yes
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce GTX 970, compute capability 5.2, VMM: yes
llm_load_tensors: ggml ctx size =    0.30 MiB
llm_load_tensors: offloading 30 repeating layers to GPU
llm_load_tensors: offloaded 30/33 layers to GPU
llm_load_tensors:        CPU buffer size =  2210.78 MiB
llm_load_tensors:      CUDA0 buffer size =  1942.31 MiB
.................................................................................................
llama_new_context_with_model: n_ctx      = 2048
llama_new_context_with_model: n_batch    = 512
llama_new_context_with_model: n_ubatch   = 512
llama_new_context_with_model: freq_base  = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init:  CUDA_Host KV buffer size =    48.00 MiB
llama_kv_cache_init:      CUDA0 KV buffer size =   720.00 MiB
llama_new_context_with_model: KV self size  =  768.00 MiB, K (f16):  384.00 MiB, V (f16):  384.00 MiB
llama_new_context_with_model:  CUDA_Host  output buffer size =     0.13 MiB
llama_new_context_with_model:      CUDA0 compute buffer size =   185.06 MiB
llama_new_context_with_model:  CUDA_Host compute buffer size =    16.01 MiB

CPU maxed out, GPU doing no compute:

image


@aaronjrod commented on GitHub (May 1, 2024):

Ok, I assume the issue is the same as https://github.com/ollama/ollama/issues/3201. However, I'm not sure why phi3 will not fit all of its layers in VRAM (30 of 33 layers), given that it is a 2.3 GB model and I have 4 GB of VRAM. Task manager also shows dedicated GPU memory at 3.0/4.0 GB; any suggestions on how to fit the full model / get Ollama to use a little more VRAM?

I understand that the question is unrelated to the original thread (and apologize for that), thanks for the help!
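A back-of-the-envelope sketch of why a 2.3 GB model can still overflow 4 GB of VRAM: the on-disk size omits the KV cache, compute buffers, and whatever headroom the estimator reserves for the driver and desktop. The buffer sizes below come from the phi3 log earlier in this thread; the reserved-VRAM figure and the per-layer extrapolation are illustrative assumptions, not Ollama's actual estimator.

```python
# Numbers from the phi3 log lines above (MiB); headroom is assumed.
gpu_weights  = 1942.31   # llm_load_tensors: CUDA0 buffer size (30 of 33 layers)
kv_cache_gpu = 720.00    # llama_kv_cache_init: CUDA0 KV buffer size
compute_buf  = 185.06    # llama_new_context_with_model: CUDA0 compute buffer
total_gpu = gpu_weights + kv_cache_gpu + compute_buf   # MiB already on the GPU

vram_total    = 4096.0   # GTX 970 reports 4 GiB
vram_reserved = 1024.0   # assumed headroom left for the driver/desktop

usable = vram_total - vram_reserved
# Extrapolate what the 3 CPU-resident layers would add if offloaded too:
extra = (gpu_weights + kv_cache_gpu) / 30 * 3
print(f"current GPU usage: {total_gpu:.0f} MiB of {usable:.0f} MiB usable")
print(f"full offload would need ~{total_gpu + extra:.0f} MiB")
```

Under these assumed numbers a full 33-layer offload lands just past the usable budget, which is consistent with the estimator stopping at 30 layers.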


@aaronjrod commented on GitHub (May 1, 2024):

> -28T16:56:46.469-04:00 level=INFO source=gpu.go:101 msg="detected GPUs" library="C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common\cuda

Ran into the compute capability issue again this morning (Compute Capability 1.0), and saw that Ollama was reading the cudart from the PhysX Common directory. Then I checked nvidia-smi and saw that the CUDA version was a major version lower than it was last night (??? thanks, Windows)

Updated the GeForce drivers; the CUDA version is now 12.4. Also moved Ollama further up in PATH. That seems to have done the trick: the GPU is in use, and the cudart used is Ollama's. 16 tokens per second on phi3, 30/33 layers allocated in VRAM (3.0/4.0 GB in use). Lots of CPU usage; I guess it is what it is then?


@dhiltgen commented on GitHub (May 1, 2024):

Happy to hear you got it running on GPU. I'm still trying to get to the bottom of why that PhysX cudart library behaves strangely. I'm sort of wondering if it's exposing some sort of "virtual" GPU.

We include a copy of cudart v11 in the distribution to try to make it easier for users to install without having to add the cuda libraries on their host. There's some combination of factors that causes that bundled version to not work for some users which we're still trying to get to the bottom of.

As to the layers question - we're continuing to refine our prediction algorithm to maximize VRAM usage without hitting OOM crashes. Model architecture, context size and other factors can influence the actual VRAM usage at runtime compared to the on-disk size of the model.

I'd like to keep this issue tracking the unexplained PhysX cudart behavior leading to misidentification as CC 1.0.
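For illustration, a misreported value like this could be caught with a plausibility check on whatever major/minor pair the runtime's `cudaGetDeviceProperties` call returns. This is a sketch of the kind of guard discussed here, not Ollama's actual code:

```python
# Sketch: sanity-check the compute capability a cudart library reports.
def plausible_cc(major: int, minor: int) -> bool:
    # No CUDA toolkit still in use supports pre-Kepler hardware, so a
    # runtime reporting CC < 3.0 (like the PhysX cudart's 1.0 here) is
    # almost certainly failing its device query, not describing a real GPU.
    return major >= 3

print(plausible_cc(1, 0))   # PhysX cudart's bogus answer
print(plausible_cc(5, 2))   # the GTX 970's actual CC
```

A guard like this would let detection fall back to the next cudart candidate on PATH instead of rejecting the GPU as "too old".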


@Eisaichen commented on GitHub (May 3, 2024):

I found that 0.1.33 loads significantly more libraries from the system, including nvcuda.dll, nvcuda64.dll and nvapi64.dll, while 0.1.32 did not.
Maybe 0.1.32 uses the bundled runtime as you mentioned, but 0.1.33 doesn't?

0.1.32 DLLs:
0	ollama.exe	0x560000	0x3365000	C:\Users\local\Desktop\Ollama_32\ollama.exe				
1	ntdll.dll	0x7FFB52A30000	0x216000	C:\WINDOWS\SYSTEM32\ntdll.dll	NT Layer DLL	10.0.22621.3374	Microsoft Corporation	
2	KERNEL32.DLL	0x7FFB52200000	0xC4000	C:\WINDOWS\System32\KERNEL32.DLL	Windows NT BASE API Client DLL	10.0.22621.3374	Microsoft Corporation	
3	KERNELBASE.dll	0x7FFB4FEF0000	0x3A7000	C:\WINDOWS\System32\KERNELBASE.dll	Windows NT BASE API Client DLL	10.0.22621.3447	Microsoft Corporation	
4	msvcrt.dll	0x7FFB51CB0000	0xA7000	C:\WINDOWS\System32\msvcrt.dll	Windows NT CRT DLL	7.0.22621.2506	Microsoft Corporation	
5	bcryptprimitives.dll	0x7FFB4FE70000	0x79000	C:\WINDOWS\System32\bcryptprimitives.dll	Windows Cryptographic Primitives Library	10.0.22621.3374	Microsoft Corporation	
6	winmm.dll	0x7FFB42D40000	0x34000	C:\WINDOWS\SYSTEM32\winmm.dll	MCI API DLL	10.0.22621.2506	Microsoft Corporation	
7	ucrtbase.dll	0x7FFB4FD50000	0x111000	C:\WINDOWS\System32\ucrtbase.dll	Microsoft® C Runtime Library	10.0.22621.3374	Microsoft Corporation	
8	ws2_32.dll	0x7FFB527F0000	0x71000	C:\WINDOWS\System32\ws2_32.dll	Windows Socket 2.0 32-Bit DLL	10.0.22621.1	Microsoft Corporation	
9	RPCRT4.dll	0x7FFB528D0000	0x115000	C:\WINDOWS\System32\RPCRT4.dll	Remote Procedure Call Runtime	10.0.22621.3447	Microsoft Corporation	
10	powrprof.dll	0x7FFB4EBB0000	0x4D000	C:\WINDOWS\SYSTEM32\powrprof.dll	Power Profile Helper DLL	10.0.22621.3374	Microsoft Corporation	
11	UMPDC.dll	0x7FFB4EB90000	0x13000	C:\WINDOWS\SYSTEM32\UMPDC.dll	User Mode Power Dependency Coordinator	10.0.22621.1	Microsoft Corporation	
12	mswsock.dll	0x7FFB4F2A0000	0x69000	C:\WINDOWS\system32\mswsock.dll	Microsoft Windows Sockets 2.0 Service Provider	10.0.22621.2506	Microsoft Corporation	
0.1.33 DLLs:
0	ollama.exe	0x530000	0x19BE000	C:\Users\local\Desktop\ollama_33\ollama.exe				
1	ntdll.dll	0x7FFB52A30000	0x216000	C:\WINDOWS\SYSTEM32\ntdll.dll	NT Layer DLL	10.0.22621.3374	Microsoft Corporation	
2	KERNEL32.DLL	0x7FFB52200000	0xC4000	C:\WINDOWS\System32\KERNEL32.DLL	Windows NT BASE API Client DLL	10.0.22621.3374	Microsoft Corporation	
3	KERNELBASE.dll	0x7FFB4FEF0000	0x3A7000	C:\WINDOWS\System32\KERNELBASE.dll	Windows NT BASE API Client DLL	10.0.22621.3447	Microsoft Corporation	
4	msvcrt.dll	0x7FFB51CB0000	0xA7000	C:\WINDOWS\System32\msvcrt.dll	Windows NT CRT DLL	7.0.22621.2506	Microsoft Corporation	
5	bcryptprimitives.dll	0x7FFB4FE70000	0x79000	C:\WINDOWS\System32\bcryptprimitives.dll	Windows Cryptographic Primitives Library	10.0.22621.3374	Microsoft Corporation	
6	winmm.dll	0x7FFB42D40000	0x34000	C:\WINDOWS\SYSTEM32\winmm.dll	MCI API DLL	10.0.22621.2506	Microsoft Corporation	
7	ucrtbase.dll	0x7FFB4FD50000	0x111000	C:\WINDOWS\System32\ucrtbase.dll	Microsoft® C Runtime Library	10.0.22621.3374	Microsoft Corporation	
8	ws2_32.dll	0x7FFB527F0000	0x71000	C:\WINDOWS\System32\ws2_32.dll	Windows Socket 2.0 32-Bit DLL	10.0.22621.1	Microsoft Corporation	
9	RPCRT4.dll	0x7FFB528D0000	0x115000	C:\WINDOWS\System32\RPCRT4.dll	Remote Procedure Call Runtime	10.0.22621.3447	Microsoft Corporation	
10	powrprof.dll	0x7FFB4EBB0000	0x4D000	C:\WINDOWS\SYSTEM32\powrprof.dll	Power Profile Helper DLL	10.0.22621.3374	Microsoft Corporation	
11	UMPDC.dll	0x7FFB4EB90000	0x13000	C:\WINDOWS\SYSTEM32\UMPDC.dll	User Mode Power Dependency Coordinator	10.0.22621.1	Microsoft Corporation	
12	mswsock.dll	0x7FFB4F2A0000	0x69000	C:\WINDOWS\system32\mswsock.dll	Microsoft Windows Sockets 2.0 Service Provider	10.0.22621.2506	Microsoft Corporation	
13	nvcuda.dll	0x7FFACC740000	0x38E000	C:\WINDOWS\system32\nvcuda.dll	NVIDIA CUDA Driver, Version 551.86 	31.0.15.5186	NVIDIA Corporation	
14	ADVAPI32.dll	0x7FFB51BE0000	0xB2000	C:\WINDOWS\System32\ADVAPI32.dll	Advanced Windows 32 Base API	10.0.22621.3296	Microsoft Corporation	
15	sechost.dll	0x7FFB52440000	0xA8000	C:\WINDOWS\System32\sechost.dll	Host for SCM/SDDL/LSA Lookup APIs	10.0.22621.3296	Microsoft Corporation	
16	bcrypt.dll	0x7FFB50730000	0x28000	C:\WINDOWS\System32\bcrypt.dll	Windows Cryptographic Primitives Library	10.0.22621.2506	Microsoft Corporation	
17	gdi32.dll	0x7FFB527C0000	0x29000	C:\WINDOWS\System32\gdi32.dll	GDI Client DLL	10.0.22621.3085	Microsoft Corporation	
18	win32u.dll	0x7FFB50410000	0x26000	C:\WINDOWS\System32\win32u.dll	Win32u	10.0.22621.3447	Microsoft Corporation	
19	gdi32full.dll	0x7FFB50610000	0x119000	C:\WINDOWS\System32\gdi32full.dll	GDI Client DLL	10.0.22621.3374	Microsoft Corporation	
20	msvcp_win.dll	0x7FFB50440000	0x9A000	C:\WINDOWS\System32\msvcp_win.dll	Microsoft® C Runtime Library	10.0.22621.3374	Microsoft Corporation	
21	USER32.dll	0x7FFB51E80000	0x1AE000	C:\WINDOWS\System32\USER32.dll	Multi-User Windows USER API Client DLL	10.0.22621.3374	Microsoft Corporation	
22	IMM32.DLL	0x7FFB52770000	0x31000	C:\WINDOWS\System32\IMM32.DLL	Multi-User Windows IMM32 API Client DLL	10.0.22621.3374	Microsoft Corporation	
23	dxcore.dll	0x7FFB4D4A0000	0x39000	C:\WINDOWS\SYSTEM32\dxcore.dll	DXCore	10.0.22621.3374	Microsoft Corporation	
24	nvcuda64.dll	0x7FFA9C570000	0xA1F000	C:\WINDOWS\system32\DriverStore\FileRepository\nv_dispi.inf_amd64_362f239e9bd019fc\nvcuda64.dll	NVIDIA CUDA Driver, Version 551.86 	31.0.15.5186	NVIDIA Corporation	
25	SHLWAPI.dll	0x7FFB510C0000	0x5E000	C:\WINDOWS\System32\SHLWAPI.dll	Shell Light-weight Utility Library	10.0.22621.2506	Microsoft Corporation	
26	VERSION.dll	0x7FFB47740000	0xA000	C:\WINDOWS\SYSTEM32\VERSION.dll	Version Checking and File Installation Libraries	10.0.22621.1	Microsoft Corporation	
27	msasn1.dll	0x7FFB4F640000	0x12000	C:\WINDOWS\SYSTEM32\msasn1.dll	ASN.1 Runtime APIs	10.0.22621.2506	Microsoft Corporation	
28	cryptnet.dll	0x7FFB46D60000	0x32000	C:\WINDOWS\SYSTEM32\cryptnet.dll	Crypto Network Related API	10.0.22621.1	Microsoft Corporation	
29	CRYPT32.dll	0x7FFB502A0000	0x167000	C:\WINDOWS\System32\CRYPT32.dll	Crypto API32	10.0.22621.3447	Microsoft Corporation	
30	drvstore.dll	0x7FFB46B80000	0x158000	C:\WINDOWS\SYSTEM32\drvstore.dll	Driver Store API	10.0.22621.2506	Microsoft Corporation	
31	devobj.dll	0x7FFB4FA10000	0x2C000	C:\WINDOWS\SYSTEM32\devobj.dll	Device Information Set DLL	10.0.22621.2506	Microsoft Corporation	
32	cfgmgr32.dll	0x7FFB4FA40000	0x4E000	C:\WINDOWS\SYSTEM32\cfgmgr32.dll	Configuration Manager DLL	10.0.22621.2506	Microsoft Corporation	
33	wldp.dll	0x7FFB4F530000	0x48000	C:\WINDOWS\SYSTEM32\wldp.dll	Windows Lockdown Policy	10.0.22621.3447	Microsoft Corporation	
34	combase.dll	0x7FFB51320000	0x388000	C:\WINDOWS\System32\combase.dll	Microsoft COM for Windows	10.0.22621.3235	Microsoft Corporation	
35	OLEAUT32.dll	0x7FFB50FE0000	0xD7000	C:\WINDOWS\System32\OLEAUT32.dll	OLEAUT32.DLL	10.0.22621.2506	Microsoft Corporation	
36	cryptbase.dll	0x7FFB4F4A0000	0xC000	C:\WINDOWS\SYSTEM32\cryptbase.dll	Base cryptographic API DLL	10.0.22621.1	Microsoft Corporation	
37	nvapi64.dll	0x7FFB3C910000	0x6C4000	C:\WINDOWS\SYSTEM32\nvapi64.dll	NVIDIA NVAPI Library, Version 551.86 	31.0.15.5186	NVIDIA Corporation	
38	SETUPAPI.dll	0x7FFB516B0000	0x474000	C:\WINDOWS\System32\SETUPAPI.dll	Windows Setup API	10.0.22621.2506	Microsoft Corporation	
39	SHELL32.dll	0x7FFB50780000	0x85C000	C:\WINDOWS\System32\SHELL32.dll	Windows Shell Common Dll	10.0.22621.3374	Microsoft Corporation	
40	ole32.dll	0x7FFB524F0000	0x1A5000	C:\WINDOWS\System32\ole32.dll	Microsoft OLE for Windows	10.0.22621.3374	Microsoft Corporation	
41	kernel.appcore.dll	0x7FFB4EE40000	0x18000	C:\WINDOWS\SYSTEM32\kernel.appcore.dll	AppModel API Host	10.0.22621.2715	Microsoft Corporation	
42	dwmapi.dll	0x7FFB4D430000	0x2B000	C:\WINDOWS\SYSTEM32\dwmapi.dll	Microsoft Desktop Window Manager API	10.0.22621.3085	Microsoft Corporation	
43	WINTRUST.dll	0x7FFB504E0000	0x6B000	C:\WINDOWS\System32\WINTRUST.dll	Microsoft Trust Verification APIs	10.0.22621.3447	Microsoft Corporation	

@dhiltgen commented on GitHub (May 3, 2024):

We adjusted the behavior in 0.1.33 to try to use CUDA libraries found on the host system, in the hope that would resolve some other issues we've seen where our bundled library doesn't work. We weren't anticipating a cudart library successfully loading and enumerating a GPU but providing incorrect information about its memory and CC version. I'd definitely like to get this fixed ASAP for the next release; we just need to figure out the best approach.


@Freffles commented on GitHub (May 4, 2024):

FWIW, I think I have the same issue.

Edit: Actually, not the same but similar. The GPU is not being used. I have a very similar graphics card to the OP's, a GTX 1050 Ti.

I noticed my Ollama embeddings take a very long time. Server logs seem to point to Ollama's bundled CUDA DLL (cudart64_110.dll) despite the fact that an up-to-date CUDA is installed. I removed Ollama, removed all NVIDIA software, then reinstalled NVIDIA first and Ollama after, but it's still the same.

image

ollama.log


@makeryangcom commented on GitHub (May 5, 2024):

I also have the same issue; although I removed the PhysX directory from my PATH, I still can't use the GPU. #3969


@cr1cr1 commented on GitHub (May 6, 2024):

Managed to make Ollama use the GPU by putting the Ollama dir (or any other dir containing cudart64_110.dll) first in PATH.
ollama version: 0.1.33, Windows 11

set PATH=C:\tools\scoop\apps\ollama\current;%PATH%

ollama serve
time=2024-05-06T21:00:09.798+03:00 level=INFO source=images.go:828 msg="total blobs: 7"
time=2024-05-06T21:00:09.799+03:00 level=INFO source=images.go:835 msg="total unused blobs removed: 0"
time=2024-05-06T21:00:09.800+03:00 level=INFO source=routes.go:1071 msg="Listening on 127.0.0.1:11434 (version 0.1.33)"
time=2024-05-06T21:00:09.800+03:00 level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cuda_v11.3 rocm_v5.7 cpu cpu_avx cpu_avx2]"
time=2024-05-06T21:00:09.800+03:00 level=INFO source=gpu.go:96 msg="Detecting GPUs"
time=2024-05-06T21:00:09.833+03:00 level=INFO source=gpu.go:101 msg="detected GPUs" library=C:\tools\scoop\apps\ollama\current\cudart64_110.dll count=1

......

ggml_cuda_init: CUDA_USE_TENSOR_CORES: yes
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3080, compute capability 8.6, VMM: yes
llm_load_tensors: ggml ctx size =    0.30 MiB
llm_load_tensors: offloading 32 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 33/33 layers to GPU
llm_load_tensors:        CPU buffer size =   281.81 MiB
llm_load_tensors:      CUDA0 buffer size =  4155.99 MiB
nvidia-smi
Mon May  6 20:56:04 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 552.22                 Driver Version: 552.22         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                     TCC/WDDM  | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3080      WDDM  |   00000000:01:00.0  On |                  N/A |
| 47%   38C    P8             35W /  350W |    8023MiB /  12288MiB |      7%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
.....
|    0   N/A  N/A     23524      C   ...\cuda_v11.3\ollama_llama_server.exe      N/A      |
+-----------------------------------------------------------------------------------------+
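The workaround above works because the Windows loader scans the PATH directories in order and loads the first matching DLL it finds, so prepending the Ollama install dir beats the PhysX entry. A toy model of that scan (the directory paths are just examples):

```python
import os

# First directory containing the DLL wins, mirroring PATH-order DLL search.
def find_dll(path_dirs, dll, fs):
    for d in path_dirs:
        if dll in fs.get(d, set()):
            return os.path.join(d, dll)
    return None

# Pretend filesystem: both the PhysX dir and the Ollama dir ship a cudart.
fs = {
    r"C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common": {"cudart64_110.dll"},
    r"C:\tools\scoop\apps\ollama\current": {"cudart64_110.dll"},
}

physx_first = [r"C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common",
               r"C:\tools\scoop\apps\ollama\current"]
ollama_first = list(reversed(physx_first))

print(find_dll(physx_first, "cudart64_110.dll", fs))   # PhysX copy wins
print(find_dll(ollama_first, "cudart64_110.dll", fs))  # Ollama's copy wins
```

This is why `set PATH=C:\tools\scoop\apps\ollama\current;%PATH%` (prepend) works while appending does not.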

@nguyenhuuloc304 commented on GitHub (Jul 10, 2024):

Hi @cr1cr1,
I tried your solution, but it did not work. Is there any other way to force using the GPU? Thank you.

![image](https://github.com/ollama/ollama/assets/21157325/bd9665fe-997a-417e-9c8a-1ab14ce5dab3)

Author
Owner

@nguyenhuuloc304 commented on GitHub (Jul 10, 2024):

I uninstalled version 0.2.1, installed version 0.1.33, and set the variable as below. It also did not work.

![image](https://github.com/ollama/ollama/assets/21157325/4f4fdbec-c9ac-45e3-bc9a-6ce9903e1dfc)

my log:

![image](https://github.com/ollama/ollama/assets/21157325/86f676f2-b4e3-4592-9bf1-3357f44603af)

Author
Owner

@cr1cr1 commented on GitHub (Jul 10, 2024):

The idea was that the ollama dir should be `first` in PATH, not last.

From the last log, you do not have the problem described in this issue: ollama does seem to offload layers to your GPU. But you need enough VRAM (graphics RAM) to preload the entire model, so at least 4GB for llama3.

Your card is too old and does not have enough memory.
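To illustrate the ordering point (a minimal POSIX-shell sketch, not from the thread; the directory and file names are made up stand-ins for the ollama and PhysX dirs, and the Windows cmd analogue is `set PATH=C:\tools\scoop\apps\ollama\current;%PATH%`):

```shell
# Only the FIRST PATH entry containing a matching file wins, which is why
# the dir with the good cudart64_110.dll must come before the PhysX dir.
demo="$(mktemp -d)"
mkdir -p "$demo/ollama_dir" "$demo/physx_dir"
printf '#!/bin/sh\necho good\n'  > "$demo/ollama_dir/cudart"
printf '#!/bin/sh\necho stale\n' > "$demo/physx_dir/cudart"
chmod +x "$demo/ollama_dir/cudart" "$demo/physx_dir/cudart"

# Appended at the end: the stale copy earlier in PATH still resolves first.
PATH="$demo/physx_dir:$PATH:$demo/ollama_dir" command -v cudart

# Prepended: the good copy resolves first.
PATH="$demo/ollama_dir:$demo/physx_dir:$PATH" command -v cudart
```

On Windows, `where cudart64_110.dll` lists every copy found on `PATH` in resolution order, so you can verify which one ollama would pick up.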

Author
Owner

@nguyenhuuloc304 commented on GitHub (Jul 11, 2024):

Hi @cr1cr1, thank you for your advice. I tried another card with 4GB VRAM but no luck.

![image](https://github.com/ollama/ollama/assets/21157325/250ec214-1292-4862-b886-7b24cf8cd6df)

Author
Owner

@dhiltgen commented on GitHub (Jul 15, 2024):

@nguyenhuuloc304 your scenario looks unrelated to the PhysX cudart library.

From the screenshots you've shared, it looks like we are loading on the GPU. However, you're loading a model much larger than the available VRAM, so only ~40% of the model is loaded into the GPU and the remaining ~60% runs on the CPU. This means you're limited to CPU inference speed, and the GPU spends most of its time waiting for the CPU to perform its calculations. If you load a smaller model that fits fully in your GPU, you should see much better performance.

You can run `ollama ps` to see the ratio of CPU vs. GPU.

Author
Owner

@rbz518 commented on GitHub (Aug 1, 2024):

Here is my confusion. Maybe Windows Task Manager doesn't report the correct usage for the GPU? Or maybe I am misreading `ollama ps`? The output from `ollama ps` never seems to update, though...

![image](https://github.com/user-attachments/assets/3dfb128e-c3ef-4e49-8535-c02628d96f94)

![image](https://github.com/user-attachments/assets/049aad8b-55b2-48d6-8222-ab207c93ed72)

![image](https://github.com/user-attachments/assets/bbe2cbd2-bbc5-4891-99d3-43c938fde27d)

Author
Owner

@dhiltgen commented on GitHub (Aug 1, 2024):

@rbz518 56% of 42G is ~23G (from the `ollama ps` output), which is almost exactly what Task Manager says is dedicated on your GPU, so the numbers seem to match pretty closely to me. Can you clarify your question?
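That arithmetic can be checked directly; a quick sketch (not from the original comment, using the numbers quoted above):

```shell
# 56% of the 42 GiB model size reported by `ollama ps` in this thread:
awk 'BEGIN { printf "%.1f GiB\n", 0.56 * 42 }'   # prints "23.5 GiB"
```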

Author
Owner

@Surakiatk commented on GitHub (Aug 7, 2024):

![image](https://github.com/user-attachments/assets/3a6805b0-f604-4c2c-b2ce-f266f320b212)

I want ollama to use my CPU and GPU at 90-100% performance. What do I need to do?

Author
Owner

@Paramjethwa commented on GitHub (Sep 25, 2024):

I put the path of `cudart64_110.dll` at the top of the system `PATH` variable and it did not work for me.

It returns:

```bash
[+] Running 2/0
 ✔ Container local_multimodal_ai-ollama-1 Created 0.0s
 ✔ Container local_multimodal_ai-app-1 Created 0.0s
Attaching to app-1, ollama-1
ollama-1 | 2024/09/25 17:12:44 routes.go:1125: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:15m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://*] OLLAMA_RUNNERS_DIR: OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES:]"
ollama-1 | time=2024-09-25T17:12:44.372Z level=INFO source=images.go:753 msg="total blobs: 28"
ollama-1 | time=2024-09-25T17:12:44.504Z level=INFO source=images.go:760 msg="total unused blobs removed: 0"
ollama-1 | time=2024-09-25T17:12:44.627Z level=INFO source=routes.go:1172 msg="Listening on [::]:11434 (version 0.3.10)"
ollama-1 | time=2024-09-25T17:12:44.629Z level=INFO source=payload.go:30 msg="extracting embedded files" dir=/tmp/ollama2245169848/runners
app-1    |
app-1    | Collecting usage statistics. To deactivate, set browser.gatherUsageStats to false.
app-1    |
app-1    |
app-1    | You can now view your Streamlit app in your browser.
app-1    |
app-1    | Local URL: http://localhost:8501
app-1    | Network URL: http://172.18.0.3:8501
app-1    | External URL: http://103.110.166.152:8501
app-1    |
ollama-1 | time=2024-09-25T17:12:52.794Z level=INFO source=payload.go:44 msg="Dynamic LLM libraries [rocm_v60102 cpu cpu_avx cpu_avx2 cuda_v11 cuda_v12]"
ollama-1 | time=2024-09-25T17:12:52.795Z level=INFO source=gpu.go:200 msg="looking for compatible GPUs"
ollama-1 | time=2024-09-25T17:12:52.976Z level=INFO source=gpu.go:568 msg="unable to load cuda driver library" library=/usr/lib/x86_64-linux-gnu/libcuda.so.1 error="cuda driver library init failure: 500"
ollama-1 | time=2024-09-25T17:12:52.977Z level=INFO source=gpu.go:568 msg="unable to load cuda driver library" library=/usr/lib/wsl/drivers/nvaci.inf_amd64_bcb4d5d133099d13/libcuda.so.1.1 error="cuda driver library init failure: 500"
ollama-1 | time=2024-09-25T17:12:52.997Z level=INFO source=gpu.go:347 msg="no compatible GPUs were discovered"
ollama-1 | time=2024-09-25T17:12:52.997Z level=INFO source=types.go:107 msg="inference compute" id=0 library=cpu variant=avx2 compute="" driver=0.0 name="" total="7.6 GiB" available="6.6 GiB"
```

I am running a Streamlit chat app through WSL2 using Docker.
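Judging from the `no compatible GPUs were discovered` line, the container likely has no GPU access at all, in which case Windows `PATH` changes are irrelevant. A hedged sketch of the usual fix for Docker under WSL2 (assumes the NVIDIA Container Toolkit is installed; the run flags are those documented for the official `ollama/ollama` image, and the compose service name here is hypothetical):

```shell
# Without --gpus, the container sees only the CPU no matter what the host has.
docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

# Compose equivalent: add a device reservation to your ollama service, e.g.
#   services:
#     ollama:
#       deploy:
#         resources:
#           reservations:
#             devices:
#               - driver: nvidia
#                 count: all
#                 capabilities: [gpu]
```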


Reference: github-starred/ollama#48996