[GH-ISSUE #1167] Another CUDA error 100 problem on WSL2 with RTX3090 #591

Closed
opened 2026-04-12 10:17:52 -05:00 by GiteaMirror · 16 comments

Originally created by @samxu29 on GitHub (Nov 17, 2023).
Original GitHub issue: https://github.com/ollama/ollama/issues/1167

Originally assigned to: @dhiltgen on GitHub.

First of all, I want to thank all the developers; this is an amazing project. Being a newbie, I am running into a problem that I hope someone can help me with.

I have a similar strange problem to this issue: https://github.com/jmorganca/ollama/issues/684

At first I thought it was a problem with my CUDA toolkit, so I uninstalled ollama, freshly installed the CUDA toolkit, and then ran the script `curl https://ollama.ai/install.sh | sh` to install ollama again.
But I still get `CUDA error 100 at /go/src/github.com/jmorganca/ollama/llm/llama.cpp/gguf/ggml-cuda.cu:5661: no CUDA-capable device is detected current device: 32544`

I had no problem running llama-cpp-python with `LLAMA_CUBLAS` support in another project I was working on, but for the life of me I can't get ollama running on the GPU here:

```
2023/11/17 01:27:31 llama.go:290: 23013 MB VRAM available, loading up to 150 GPU layers
2023/11/17 01:27:31 llama.go:415: starting llama runner
2023/11/17 01:27:31 llama.go:473: waiting for llama runner to start responding

CUDA error 100 at /go/src/github.com/jmorganca/ollama/llm/llama.cpp/gguf/ggml-cuda.cu:5661: no CUDA-capable device is detected current device: 32544
2023/11/17 01:27:31 llama.go:430: 100 at /go/src/github.com/jmorganca/ollama/llm/llama.cpp/gguf/ggml-cuda.cu:5661: no CUDA-capable device is detected
current device: 32544
2023/11/17 01:27:31 llama.go:438: error starting llama runner: llama runner process has terminated
2023/11/17 01:27:31 llama.go:504: llama runner stopped successfully
2023/11/17 01:27:31 llama.go:415: starting llama runner
2023/11/17 01:27:31 llama.go:473: waiting for llama runner to start responding
{"timestamp":1700202451,"level":"WARNING","function":"server_params_parse","line":871,"message":"Not compiled with GPU offload support, --n-gpu-layers option will be ignored. See main README.md for information on enabling GPU BLAS support","n_gpu_layers":-1}
{"timestamp":1700202451,"level":"INFO","function":"main","line":1323,"message":"build info","build":219,"commit":"9e70cc0"}
{"timestamp":1700202451,"level":"INFO","function":"main","line":1325,"message":"system info","n_threads":10,"n_threads_batch":-1,"total_threads":20,"system_info":"AVX = 1 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | "}
llama_model_loader: loaded meta data with 19 key-value pairs and 291 tensors from 
```

How would I change `num_gpu`? And are there any flags I need to set to turn `CUBLAS` on so that ollama utilizes the GPU?
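(For later readers: `num_gpu` can be set per model via a Modelfile `PARAMETER` line, or per request via the API `options`. A minimal sketch, assuming a `llama2` model is already pulled; the model name and layer count are placeholders:)

```
# Hypothetical Modelfile that offloads up to 50 layers to the GPU:
cat > Modelfile <<'EOF'
FROM llama2
PARAMETER num_gpu 50
EOF
ollama create llama2-gpu -f Modelfile
ollama run llama2-gpu
```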

GiteaMirror added the bug, nvidia, windows labels 2026-04-12 10:17:52 -05:00

@BruceMacD commented on GitHub (Nov 17, 2023):

Thanks for bringing this to our attention. Would you be able to add the output of `nvidia-smi` to this issue? I'm trying to narrow down an NVIDIA library version.
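(For anyone triaging the same symptom, a sketch of the checks that usually narrow down CUDA detection on WSL2; the paths assume a stock WSL2 driver layout:)

```
nvidia-smi                                     # is the driver visible inside WSL at all?
ls -l /usr/lib/wsl/lib/libcuda.so*             # WSL's GPU driver stub lives here
ldconfig -p | grep -i 'libcuda\|libnvidia-ml'  # which copies the dynamic loader sees
```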


@samxu29 commented on GitHub (Nov 18, 2023):

> Thanks for bringing this to our attention. Would you be able to add the output of `nvidia-smi` to this issue? I'm trying to narrow down an NVIDIA library version.

![image](https://github.com/jmorganca/ollama/assets/22229980/10b3078f-b80b-4054-a55f-a811ebcf2022)

NVIDIA-SMI 545.29.01
CUDA Version: 12.3
Driver Version: 546.01

Thank you so much for keeping track of this issue.


@WongChoice commented on GitHub (Nov 18, 2023):

Hey, specify the GPU:

`echo $CUDA_VISIBLE_DEVICES`
(blank)

`export CUDA_VISIBLE_DEVICES=0`

and reinstall.
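(One caveat with this suggestion: the Linux install script runs ollama as a systemd service, so an `export` in an interactive shell never reaches the server process. A sketch of setting the variable service-wide instead:)

```
sudo systemctl edit ollama.service
# In the override file that opens, add:
#   [Service]
#   Environment="CUDA_VISIBLE_DEVICES=0"
sudo systemctl daemon-reload
sudo systemctl restart ollama
```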


@samxu29 commented on GitHub (Nov 18, 2023):

> Hey, specify the GPU:
>
> `echo $CUDA_VISIBLE_DEVICES`
> (blank)
>
> `export CUDA_VISIBLE_DEVICES=0`
>
> and reinstall.

Still the same error...


@samxu29 commented on GitHub (Nov 19, 2023):

So I tried setting `export CUDA_VISIBLE_DEVICES=0` and `CMAKE_ARGS="-DLLAMA_CUBLAS=on"`; neither worked. I am going to install a new WSL and see if something is wrong with my WSL setup.

Update:
I installed a new Ubuntu WSL, and everything is working with CUDA.
I am confused; both WSLs have the same driver version and CUDA version, so I'm not sure what is broken.
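(Worth noting: `CMAKE_ARGS="-DLLAMA_CUBLAS=on"` is consumed by a from-source pip build of llama-cpp-python, not by the prebuilt ollama binary, so it is expected to have no effect here. A sketch of where that flag actually applied, as of late 2023:)

```
# This flag affects llama-cpp-python's build, not ollama:
CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install --force-reinstall --no-cache-dir llama-cpp-python
```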


@Nan-Do commented on GitHub (Nov 20, 2023):

This seems like a problem with llama.cpp. I'm not sure llama.cpp is supposed to work on WSL with CUDA; it is clearly not working on your system, and this might be due to the precompiled llama.cpp provided by the ollama installer. If that is the cause, you could compile llama.cpp on your system and swap out the one ollama provides.

The command `export CUDA_VISIBLE_DEVICES=0` will only work if you're compiling llama.cpp from scratch, not when using the ollama install script.

This will probably let you load the models, but without GPU acceleration:

```
CUDA_VISIBLE_DEVICES=0 ollama run llama2
```
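(For reference, compiling llama.cpp with cuBLAS at the time of this thread looked roughly like the following; the `LLAMA_CUBLAS` make flag is 2023-era and was later renamed, so treat this as a period sketch:)

```
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make LLAMA_CUBLAS=1   # builds the cuBLAS-accelerated backend (2023-era flag)
```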

@tetratorus commented on GitHub (Nov 27, 2023):

building it from source fixed this for me


@taweili commented on GitHub (Nov 29, 2023):

I had the same issue with a Titan RTX. I tried several of the suggestions in this post, but none worked for me. I ended up compiling from source, and it works like a charm. I also discovered that including the Windows path while building the code would mess up the build. I put my experience in #1265.
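(The Windows-path point matters because WSL appends the Windows `PATH` by default, and entries containing spaces such as `Program Files` can break native builds. A sketch of stripping those entries for the build shell, followed by the from-source build as documented at the time:)

```
# Drop the /mnt/c (Windows) entries from PATH for this shell only:
export PATH=$(echo "$PATH" | tr ':' '\n' | grep -v '^/mnt/c' | paste -sd: -)

git clone https://github.com/jmorganca/ollama
cd ollama
go generate ./...   # builds the vendored llama.cpp, including the CUDA bits
go build .
```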


@samxu29 commented on GitHub (Nov 30, 2023):

> I had the same issue with a Titan RTX. I tried several of the suggestions in this post, but none worked for me. I ended up compiling from source, and it works like a charm. I also discovered that including the Windows path while building the code would mess up the build. I put my experience in #1265.

I found a very strange thing: I spun up a new WSL with no CUDA toolkit installed, and ollama worked like a charm. Then the next day, after I installed CUDA toolkit 11.8, Ollama stopped recognizing the GPU again. It seems it doesn't like CUDA toolkit 11.8, I assume?
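(A plausible explanation, offered as an assumption rather than something confirmed in this thread: a full CUDA toolkit install can pull in its own `libcuda.so`, shadowing the WSL driver stub that NVIDIA mounts at `/usr/lib/wsl/lib`; WSL-specific toolkit packages deliberately omit the driver for this reason. A sketch of checking which copy the loader resolves:)

```
ldconfig -p | grep libcuda            # every libcuda.so the loader knows about
ls -l /usr/lib/wsl/lib/libcuda.so*    # the WSL stub that should win
# If a toolkit copy (e.g. under /usr/local/cuda*/lib64 or its stubs dir) is
# resolved first, CUDA apps inside WSL can fail with "no CUDA-capable device".
```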


@samxu29 commented on GitHub (Nov 30, 2023):

> building it from source fixed this for me

I built it from source; it did not fix it for me.
Could it be the cuda-toolkit version?


@Nan-Do commented on GitHub (Dec 1, 2023):

The system should work with CUDA 11.8; it is the version supported by conda and it has been fairly well tested by many people. It could be several things, probably an ABI mismatch between some of the required libraries, but the error ollama shows is not very informative. Do you have acceleration in other applications?


@taweili commented on GitHub (Dec 1, 2023):

> > I had the same issue with a Titan RTX. I tried several of the suggestions in this post, but none worked for me. I ended up compiling from source, and it works like a charm. I also discovered that including the Windows path while building the code would mess up the build. I put my experience in #1265.
>
> I found a very strange thing: I spun up a new WSL with no CUDA toolkit installed, and ollama worked like a charm. Then the next day, after I installed CUDA toolkit 11.8, Ollama stopped recognizing the GPU again. It seems it doesn't like CUDA toolkit 11.8, I assume?

This sounds very strange. So far I have assumed that WSL is straight Linux, but it seems it has tighter integration with Windows. Have you checked the environment? Do you have the Windows path in the newly installed WSL without CUDA? Could ollama be using the toolkit from Windows? I am studying the build process to figure out exactly how the distribution binary was built. I'm new to Go and have to get into the Go build mindset.


@egeres commented on GitHub (Dec 10, 2023):

I'm getting `CUDA error 100 at /go/src/github.com/jmorganca/ollama/llm/llama.cpp/gguf/ggml-cuda.cu:484: no CUDA-capable device is detected current device: 0` after having run `curl https://ollama.ai/install.sh | sh`.

My `nvidia-smi` output:

```
Sun Dec 10 13:52:48 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.29.01              Driver Version: 546.01       CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3060 Ti     On  | 00000000:04:00.0 Off |                  N/A |
| 31%   29C    P8               8W / 200W |      0MiB /  8192MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce RTX 3090        On  | 00000000:09:00.0  On |                  N/A |
| 48%   45C    P8              47W / 370W |   3820MiB / 24576MiB |     36%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A        22      G   /Xwayland                                 N/A      |
|    1   N/A  N/A        22      G   /Xwayland                                 N/A      |
+---------------------------------------------------------------------------------------+
```

However, I think my installation went fine...

```
$ curl https://ollama.ai/install.sh | sh
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  7983    0  7983    0     0  39716      0 --:--:-- --:--:-- --:--:-- 39716
>>> Downloading ollama...
######################################################################## 100.0%#=#=#
######################################################################## 100.0%
>>> Installing ollama to /usr/local/bin...
>>> Adding current user to ollama group...
>>> Creating ollama systemd service...
>>> NVIDIA GPU installed.
>>> The Ollama API is now available at 0.0.0.0:11434.
>>> Install complete. Run "ollama" from the command line.
```

What other information can I provide to address this issue?


@seanmavley commented on GitHub (Dec 13, 2023):

@egeres It just doesn't work for me either.

I know my GPU is enabled and active, because I can run PrivateGPT, I get `BLAS = 1`, and it runs on the GPU fine: no issues, no errors. The GPU gets detected alright.

Yet Ollama is complaining that no GPU is detected.

`nvidia-smi` also indicates the GPU is detected. I'm not sure what the problem is.

I'm going to try and build from source and see.


@dhiltgen commented on GitHub (May 2, 2024):

Our GPU support code has evolved quite a bit since November, and we now have a Windows version available as well, which eliminates the need to use WSL to get Ollama running on Windows.

Please give the latest release a try (in particular, 0.1.33 fixes a number of Windows bugs) and let us know.

https://github.com/ollama/ollama/releases
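(On a Linux or WSL install, the server log requested here can be pulled from systemd's journal; a sketch:)

```
# The install script registers ollama as a systemd service, so its log goes to the journal:
journalctl -u ollama --no-pager | tail -n 100
```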


@dhiltgen commented on GitHub (May 21, 2024):

I'm going to go ahead and close this one. Please try out the latest version, and if you're still seeing CUDA errors, please let us know and share the server log, and we'll re-open the issue.
