[GH-ISSUE #1460] Getting the GPU running in WSL2? #784

Closed
opened 2026-04-12 10:28:01 -05:00 by GiteaMirror · 18 comments

Originally created by @gerroon on GitHub (Dec 11, 2023).
Original GitHub issue: https://github.com/ollama/ollama/issues/1460

Originally assigned to: @BruceMacD on GitHub.

Hi

I am running it under WSL2. It is telling me that it can't find the GPU. Is anyone running it under WSL with a GPU? I have a 3080.

```
>>> The Ollama API is now available at 0.0.0.0:11434.
>>> Install complete. Run "ollama" from the command line.
WARNING: No NVIDIA GPU detected. Ollama will run in CPU-only mode.
>>> The Ollama API is now available at 0.0.0.0:11434.
>>> Install complete. Run "ollama" from the command line.
```

@igorschlum commented on GitHub (Dec 11, 2023):

Hi @gerroon, did you install the driver? These links could help with installing the latest drivers.

https://learn.microsoft.com/en-us/windows/ai/directml/gpu-cuda-in-wsl

https://developer.nvidia.com/cuda/wsl

https://sylabs.io/2022/03/wsl2-gpu/

https://askubuntu.com/questions/1252964/please-help-configuring-nvidia-smi-ubuntu-20-04-on-wsl-2

The command
`nvidia-smi`
should display the GPU status, and it will help determine whether this is a configuration issue or an Ollama issue.
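To make that check concrete, here is a hedged sketch that classifies the `nvidia-smi` situation under WSL2. The `/usr/lib/wsl/lib` path is an assumption based on where WSL usually places the driver stubs (as a later comment in this thread confirms); verify it on your system.

```shell
# Classify GPU driver visibility under WSL2.
# /usr/lib/wsl/lib is the usual WSL stub directory (an assumption; verify locally).
if command -v nvidia-smi >/dev/null 2>&1; then
    GPU_STATUS="on-path"            # nvidia-smi resolvable; Ollama should find it
elif [ -x /usr/lib/wsl/lib/nvidia-smi ]; then
    GPU_STATUS="stubbed-not-on-path" # driver installed, but PATH needs fixing
else
    GPU_STATUS="driver-missing"      # install/update the Windows NVIDIA driver
fi
echo "nvidia-smi status: $GPU_STATUS"
```

If the status is `stubbed-not-on-path`, the symlink or PATH fixes discussed later in this thread apply; if it is `driver-missing`, the Windows-side driver install is the first step.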

@gerroon commented on GitHub (Dec 11, 2023):

Hi

Thanks for your help. I believe I have CUDA running now, but it still complains about it. It sounds like there is no `nvidia-smi` for WSL2. Maybe the code can check for it in another way.

```
2023/12/11 17:43:55 images.go:732: total blobs: 6
2023/12/11 17:43:55 images.go:739: total unused blobs removed: 0
2023/12/11 17:43:55 routes.go:843: Listening on 127.0.0.1:11434 (version 0.1.14)
2023/12/11 17:43:55 routes.go:863: warning: gpu support may not be enabled, check that you have installed GPU drivers: nvidia-smi command failed
```

Checking CUDA in the container.

```
./deviceQuery
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "NVIDIA GeForce RTX 3080 Ti"
  CUDA Driver Version / Runtime Version          12.2 / 12.3
  CUDA Capability Major/Minor version number:    8.6
  Total amount of global memory:                 12288 MBytes (12884377600 bytes)
  (080) Multiprocessors, (128) CUDA Cores/MP:    10240 CUDA Cores
  GPU Max Clock rate:                            1665 MHz (1.66 GHz)
  Memory Clock rate:                             9501 Mhz
  Memory Bus Width:                              384-bit
  L2 Cache Size:                                 6291456 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total shared memory per multiprocessor:        102400 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  1536
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Managed Memory:                Yes
  Device supports Compute Preemption:            Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      No
  Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 12.2, CUDA Runtime Version = 12.3, NumDevs = 1
Result = PASS
```

@BruceMacD commented on GitHub (Dec 12, 2023):

Typically `nvidia-smi` will be available in WSL2 if the NVIDIA drivers have been installed on the host system:
https://www.nvidia.com/Download/index.aspx

I'd recommend checking the NVIDIA drivers in Windows to see whether they are up to date. I'll look into whether we can package in deviceQuery like you used here; it would be nice to have something more reliable.
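As an aside, a driver-independent sanity check is possible under WSL2: GPU paravirtualization exposes the `/dev/dxg` device node when passthrough is active. This is only a sketch of the idea, not what Ollama actually does; the `/usr/lib/wsl/lib` stub path is likewise an assumption about a default WSL setup.

```shell
# Probe for WSL2 GPU paravirtualization without relying on nvidia-smi.
# /dev/dxg exists when the WSL GPU passthrough device is active.
if [ -e /dev/dxg ]; then
    DXG_PRESENT=yes
else
    DXG_PRESENT=no
fi
echo "WSL GPU paravirtualization device (/dev/dxg): $DXG_PRESENT"

# The CUDA stub libraries, if present, live in the WSL lib directory.
ls /usr/lib/wsl/lib/libcuda* 2>/dev/null || echo "no CUDA stub libraries found"
```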

@gerroon commented on GitHub (Dec 12, 2023):

@BruceMacD Thanks for the reply. I was able to install it properly after trying a couple of links. I think this is the one that helped me:

https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&Distribution=WSL-Ubuntu&target_version=2.0

@ghost commented on GitHub (Dec 19, 2023):

Ollama cannot find the GPU no matter what I try:

`/var/log/syslog`:

```
routes.go:891: warning: gpu support may not be enabled, check that you have installed GPU drivers: nvidia-smi command failed
```
```
$ which nvidia-smi
/usr/lib/wsl/lib/nvidia-smi
$ nvidia-smi
Tue Dec 19 06:31:34 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.36                 Driver Version: 546.33       CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 4080        On  | 00000000:01:00.0  On |                  N/A |
|  0%   48C    P5              35W / 320W |   1374MiB / 16376MiB |      2%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
$
```
```
$ cat /proc/version
Linux version 5.15.133.1-microsoft-standard-WSL2 (root@1c602f52c2e4) (gcc (GCC) 11.2.0, GNU ld (GNU Binutils) 2.37) #1 SMP Thu Oct 5 21:02:42 UTC 2023
```

Latest version of the WSL2 kernel (I just ran `wsl --update`). All the latest Windows updates.

I even tried installing `cuda-toolkit-12-3` (WSL-Ubuntu); it didn't change anything.

@mongolu commented on GitHub (Dec 19, 2023):

I am on Win11 with WSL2 and I run Ollama in Docker (built locally from the Dockerfile), and it is using the GPU.

@gerroon commented on GitHub (Dec 19, 2023):

@yurigeinish You might need to symlink `nvidia-smi`. If you installed it, it should be on the system, but it is not on the PATH by default.

@ghost commented on GitHub (Dec 19, 2023):

@gerroon Looks like `sudo ln -s $(which nvidia-smi) /usr/bin/` helped, thanks. At least I'm no longer seeing the related error in the logs. And when Ollama generates an answer, first I get a CPU spike around 41%, and several seconds later I get 100% GPU usage while the answer is being generated. I assume that's how it should be; thank you.
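A guarded version of that one-liner avoids clobbering an existing `nvidia-smi` and only acts when the stub is actually present. This is a sketch; the `/usr/lib/wsl/lib` location is taken from the `which nvidia-smi` output earlier in this thread and may differ on your system.

```shell
# Create the symlink only if the WSL stub exists and nvidia-smi is not
# already resolvable on PATH.
if [ -x /usr/lib/wsl/lib/nvidia-smi ] && ! command -v nvidia-smi >/dev/null 2>&1; then
    sudo ln -s /usr/lib/wsl/lib/nvidia-smi /usr/bin/nvidia-smi
    LINKED=yes
else
    LINKED=no   # nothing to do: stub absent, or nvidia-smi already on PATH
fi
echo "symlink created: $LINKED"
```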

@KDVan commented on GitHub (Dec 19, 2023):

Hi, I'm having trouble getting Ollama (or maybe WSL) to utilize my GPU. When I run the model, only the CPU spikes up to 100%. Am I missing something? I have installed all the necessary drivers for Windows and Ubuntu.

@cadeon commented on GitHub (Dec 21, 2023):

<removed because I just realized I wrote this against the entirely wrong project. Sorry.>

@siikdUde commented on GitHub (Dec 22, 2023):

I got Ollama to start using my RTX 4090 by:

  1. Uninstalling Ubuntu
  2. Uninstalling WSL
  3. Rebooting
  4. Installing WSL
  5. Installing Ubuntu
  6. (Crucial part) This step is optional for you, but it streamlines the process:
  • Installed oobabooga via the one-click WSL installer start_wsl.bat in my root folder.
  • Entered all the values for my system (such as specifying that I have an NVIDIA GPU), and it went ahead and downloaded all the CUDA drivers, the toolkit, PyTorch, and all the other dependencies.
  • Again, this part is optional since it is for installing oobabooga, but as a welcome side effect it installed everything I needed to get Ollama working with my GPU. As a result, my GPU usage is now between 40% and 100% with the CPU around 60% while the model is working. Before, the GPU was at 0% with the CPU at around 70%.

Also, it installs version 12.1 of the toolkit, which I believe is the one that works (at least for me). When I updated to 12.3, my GPU stopped working with Ollama, so be mindful of that.

Hope this helps anyone who comes across this thread.

@mongolu commented on GitHub (Dec 22, 2023):

Wow!
That's a lot of steps you've been through.
Glad you sorted it out.

@pai1234 commented on GitHub (Feb 8, 2024):

What if you are running on WSL2 but only have a built-in GPU (Intel Iris Xe Graphics)? Any idea how to set this up on Ubuntu?

@gerroon commented on GitHub (Feb 8, 2024):

> What if you are running on WSL2 but only have a built-in GPU (Intel Iris Xe Graphics)? Any idea how to set this up on Ubuntu?

I do not think these iGPU models are supported. You need an NVIDIA card.

@HusseinAdeiza commented on GitHub (Oct 18, 2024):

How do I run Ollama on the GPU?

@igorschlum commented on GitHub (Oct 19, 2024):

@HusseinAdeiza Ollama runs on the GPU by default. What type of computer do you have, and how much GPU memory do you have?

@aguywithcode commented on GitHub (May 20, 2025):

It looks like, if you run the Windows native version, the WSL `ollama` command uses that engine.

@CryptoDragonLady commented on GitHub (Mar 18, 2026):

@BruceMacD Sorry to necrobump a closed thread, but this is the thread that led me to the solution.

I just realized why everyone is having issues in WSL.

WSL stubs the CUDA drivers into /usr/lib/wsl/lib/, which is also where it puts nvidia-smi.
After adding that to my PATH (`export PATH=$PATH:/usr/lib/wsl/lib/`), voilà, everything worked.

Maybe add a check for WSL and, if so, attempt to execute nvidia-smi at /usr/lib/wsl/lib/ if you're going to rely on it to locate CUDA devices.

SMH, I spent three hours trying to figure this out, and it's all because an executable that WSL stubs into nearly the same place every time, which isn't on the typical PATH, isn't found. Why this engineering decision was made is completely beyond me.
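For anyone applying that fix, here is a sketch of making the PATH change persistent rather than per-session. It assumes bash with `~/.bashrc` as the startup file (adjust for zsh or fish), and it guards against adding the directory twice.

```shell
# Add the WSL driver stub directory to PATH, once, and persist it.
WSL_LIB=/usr/lib/wsl/lib

# Only extend PATH if the directory is not already on it.
case ":$PATH:" in
    *":$WSL_LIB:"*) ;;                        # already present, do nothing
    *) export PATH="$PATH:$WSL_LIB" ;;
esac

# Persist for future shells unless ~/.bashrc already mentions it.
grep -qs "$WSL_LIB" "$HOME/.bashrc" || \
    echo "export PATH=\"\$PATH:$WSL_LIB\"" >> "$HOME/.bashrc"
```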

Reference: github-starred/ollama#784