[GH-ISSUE #1666] Ollama not using GPU in Windows WSL2 with Docker #936

Closed
opened 2026-04-12 10:38:17 -05:00 by GiteaMirror · 3 comments

Originally created by @dominickp on GitHub (Dec 22, 2023).
Original GitHub issue: https://github.com/ollama/ollama/issues/1666

I'm seeing a lot of CPU usage when the model runs. I do see a tiny bit of GPU usage, but I don't think what I'm seeing is optimal. I also see log messages saying the GPU is not working.

I'm running Docker Desktop on Windows 11 with the WSL2 backend on Ubuntu 22.04.3 LTS. I believe I have the correct drivers installed in Ubuntu.

In the ollama logs:

```
ollama  | 2023/12/22 00:17:24 routes.go:915: warning: gpu support may not be enabled, check that you have installed GPU drivers: nvidia-smi command failed
...
ollama  | 2023/12/22 00:13:33 llama.go:407: skipping accelerated runner because num_gpu=0
...
ollama  | {"timestamp":1703204013,"level":"WARNING","function":"server_params_parse","line":2160,"message":"Not compiled with GPU offload support, --n-gpu-layers option will be ignored. See main README.md for information on enabling GPU BLAS support","n_gpu_layers":-1}
```

Inside my WSL2 instance shell:

```
dom@Dom-14700K:~$ nvidia-smi
Thu Dec 21 19:16:55 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.29.04              Driver Version: 546.17       CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3060        On  | 00000000:01:00.0  On |                  N/A |
|  0%   25C    P5              39W / 170W |    837MiB / 12288MiB |      2%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
```

Inside the ollama container itself:

```
docker-compose exec ollama bash

root@e70cdd37fb90:/# nvidia-smi
bash: nvidia-smi: command not found
root@e70cdd37fb90:/#
```
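
A quick way to separate a broken Docker/WSL2 GPU setup from a broken compose file (a sketch; assuming the CUDA base image tag below is still published) is to request the GPU in a throwaway container:

```
# If this prints the same table as the WSL2 shell, GPU passthrough into
# Docker works and the problem is in the compose file; if it errors out,
# the Docker Desktop / WSL2 GPU setup itself is at fault.
docker run --rm --gpus all nvidia/cuda:12.3.1-base-ubuntu22.04 nvidia-smi
```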

The docker compose I'm using:

```yml
version: '3.6'

services:
  ollama:
    # Uncomment below for GPU support
    deploy: {resources: {reservations: {devices: [{driver: nvidia, capabilities: [gpu, video]}]}}}
    volumes:
      - \\wsl$\Ubuntu-22.04\home\dom\ollama:/root/.ollama
    container_name: ollama
    tty: true
    restart: unless-stopped
    image: ollama/ollama:0.1.17

# ... redacted the ollama-webui config

volumes:
  ollama: {}
```
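
Another sanity check for a compose file like this (using the `ollama` container name from above) is whether the GPU reservation actually made it onto the created container:

```
# A non-null result with Driver "nvidia" means compose applied the GPU
# reservation; "null" means the container was created without one.
docker inspect ollama --format '{{json .HostConfig.DeviceRequests}}'
```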

Am I doing something wrong? I really thought https://github.com/jmorganca/ollama/pull/1644 would solve my issue, but I'm unsure now because of the recent update involving Swarm mode, which I don't think I'm using, since Docker Desktop doesn't support it. I would appreciate any help.

If ollama itself needs to execute `nvidia-smi`, then shouldn't the container have it installed? I don't understand how the container would be able to reach out to the Docker host to run `nvidia-smi` there...
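
As far as I understand it, the container never reaches back to the host: when a GPU reservation is applied, the NVIDIA container runtime bind-mounts the host driver userland, including the `nvidia-smi` binary, into the container. So the `command not found` above is itself a symptom that the reservation never took effect. A minimal check, using the service name from my compose file:

```
# With a working reservation, nvidia-smi is injected by the NVIDIA
# container runtime and should resolve and run inside the container.
docker-compose exec ollama which nvidia-smi
docker-compose exec ollama nvidia-smi
```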

@nuqz1 commented on GitHub (Dec 22, 2023):

I noticed that as well. It seems that the stream itself comes from the GPU, but my guess is that some kind of tokenization/detokenization before that is done on the CPU?

@dominickp commented on GitHub (Dec 22, 2023):

I think I figured it out; it's kind of dumb. It seems that providing the "video" capability was breaking everything.

I changed:

```yml
services:
  ollama:
    deploy: {resources: {reservations: {devices: [{driver: nvidia, capabilities: [gpu, video]}]}}}
    ...
```

to:

```yml
services:
  ollama:
    deploy: {resources: {reservations: {devices: [{driver: nvidia, count: 1, capabilities: [gpu]}]}}}
    ...
```

and all the errors went away.
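
For anyone not using compose, a sketch of the equivalent `docker run` invocation (the volume name and port mapping here are illustrative):

```
# --gpus 1 reserves one GPU with the default (gpu) capability, matching
# the working compose reservation above.
docker run -d --gpus 1 --name ollama \
  -v ollama:/root/.ollama -p 11434:11434 \
  ollama/ollama:0.1.17
```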

llama2 is super fast now. dolphin-mixtral still takes a long time and uses a ton of CPU -- I wonder if that's because I don't have enough VRAM to run it optimally. But since the errors are gone and I see an improvement, I think I resolved the issue.

@PythonGermany commented on GitHub (Jan 26, 2024):

I had the same issue using it like this:

```yaml
services:
  ollama:
    image: ollama/ollama:latest
    ports:
      - 11434:11434
    volumes:
      - ./data:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              capabilities: [ gpu ]
```

What solved it for me was changing the devices section to this:

```yaml
          devices:
            - driver: nvidia
              count: 1
              capabilities: [ gpu ]
```
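
After changing the file, a quick way to confirm the fix took effect (a sketch; the `ollama` service name comes from the compose file above) is to recreate the container and re-check the logs:

```
# Recreate the container so the new reservation applies, then look for
# the earlier GPU warnings -- they should no longer appear.
docker-compose up -d --force-recreate ollama
docker-compose logs ollama | grep -i -E 'gpu|cuda'
```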