[GH-ISSUE #3211] GPU not detected on Kubernetes - works locally #1979

Closed
opened 2026-04-12 12:10:13 -05:00 by GiteaMirror · 2 comments
Owner

Originally created by @didlawowo on GitHub (Mar 18, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/3211

Originally assigned to: @dhiltgen on GitHub.

What is the issue?

I have a Kubernetes cluster with a 4070 Super GPU.

Inside a container on the Kubernetes cluster, Ollama doesn't detect the GPU, but it works when I run Ollama directly on the node that has the GPU.

stream logs failed container "ollama" in pod "ollama-74fbf7d68b-lglf9" is waiting to start: ContainerCreating for ollama/ollama-74fbf7d68b-lglf9 (ollama)
time=2024-03-18T03:00:29.503Z level=INFO source=images.go:806 msg="total blobs: 0"
time=2024-03-18T03:00:29.515Z level=INFO source=images.go:813 msg="total unused blobs removed: 0"
time=2024-03-18T03:00:29.515Z level=INFO source=routes.go:1110 msg="Listening on :11434 (version 0.1.29)"
time=2024-03-18T03:00:29.516Z level=INFO source=payload_common.go:112 msg="Extracting dynamic libraries to /tmp/ollama2476510653/runners ..."
time=2024-03-18T03:00:31.661Z level=INFO source=payload_common.go:139 msg="Dynamic LLM libraries [rocm_v60000 cpu_avx2 cpu cpu_avx cuda_v11]"
time=2024-03-18T03:00:31.661Z level=INFO source=gpu.go:77 msg="Detecting GPU type"
time=2024-03-18T03:00:31.661Z level=INFO source=gpu.go:191 msg="Searching for GPU management library libnvidia-ml.so"
time=2024-03-18T03:00:31.668Z level=INFO source=gpu.go:237 msg="Discovered GPU libraries: ]"
time=2024-03-18T03:00:31.668Z level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-03-18T03:00:31.668Z level=INFO source=routes.go:1133 msg="no GPU detected"
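
One quick way to check whether the container toolkit actually mounted the NVIDIA management library into the pod (namespace and pod name taken from the log above; the search command is just a sketch):

kubectl exec -n ollama ollama-74fbf7d68b-lglf9 -- sh -c 'find / -name "libnvidia-ml.so*" 2>/dev/null'

If nothing is printed, libnvidia-ml.so was never injected and the failure sits at the runtime/toolkit layer rather than inside Ollama.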

What did you expect to see?

I expect my GPU to be discovered by Ollama, since the GPU is correctly exposed in the cluster (a Python Whisper workload uses it).

Steps to reproduce

Deploy Ollama with Helm on Kubernetes using the following parameters (a couple of quick checks on the runtime class and device plugin are sketched after the values):

chart: ollama
repoURL: https://otwld.github.io/ollama-helm/
targetRevision: 0.19.0
helm:
  values: |
    image:
      repository: fizzbuzz2/ollama
      tag: latest
      pullPolicy: Always
    imagePullSecrets:
      - name: registry-credentials
    runtimeClass: nvidia
    extraEnv:
      - name: NVIDIA_VISIBLE_DEVICES
        value: all
      - name: NVARCH
        value: x86_64
      - name: NV_CUDA_CUDART_VERSION
        value: 12.3.2
      - name: NVIDIA_DRIVER_CAPABILITIES
        value: all
    # extraArgs:
    #   - --gpu=all

    autoscaling:
      enabled: true
      minReplicas: 1
      maxReplicas: 2
      targetCPUUtilizationPercentage: 80
      targetMemoryUtilizationPercentage: 80

    ollama:
      gpu:
        enabled: true
        number: 3

      models:
        - mistral
        - codellama
        - llava
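
Before digging into the chart values, it may be worth confirming that the nvidia RuntimeClass exists and that the device plugin is advertising GPUs on the node; a rough sketch, where the node name is a placeholder:

kubectl get runtimeclass nvidia
kubectl describe node <gpu-node-name> | grep -i nvidia.com/gpu

Both the RuntimeClass and a non-zero nvidia.com/gpu capacity need to be present for the pod to be started with the NVIDIA runtime hook.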

Are there any recent changes that introduced the issue?

No, it has never worked before.

nvidia-smi works on the node. As you can see, the local Ollama running on the node shows up, and so does the Whisper process launched from Kubernetes, but not the Ollama from Kubernetes.

cluster@nvidia:~$ nvidia-smi
Mon Mar 18 04:02:36 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.14 Driver Version: 550.54.14 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 4070 ... Off | 00000000:05:00.0 Off | N/A |
| 0% 42C P8 6W / 220W | 5048MiB / 12282MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 1201 C /usr/local/bin/ollama 454MiB |
| 0 N/A N/A 61256 C /usr/bin/python3 4586MiB |
+-----------------------------------------------------------------------------------------+

cluster@nvidia:~$ nvidia-container-toolkit -version
NVIDIA Container Runtime Hook version 1.14.6
commit: 5605d191332dcfeea802c4497360d60a65c7887e
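
Since this is k3s, it can also help to verify that the embedded containerd picked up the nvidia runtime; assuming a default k3s install, the generated config lives at the path below:

sudo grep -A3 nvidia /var/lib/rancher/k3s/agent/etc/containerd/config.toml

If no nvidia runtime entry appears there, k3s did not detect the toolkit and the chart's runtimeClass: nvidia setting has no runtime to point at.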

resource allocation works:

➜ src git:(main) kubectl view-allocations -r gpu
Alias tip: kub view-allocations -r gpu
 Resource                              Requested      Limit  Allocatable  Free
  nvidia.com/gpu                       (50%) 4.0  (50%) 4.0          8.0   4.0
   └─ nvidia                           (50%) 4.0  (50%) 4.0          8.0   4.0
      ├─ ollama-74fbf7d68b-lglf9             3.0        3.0           __    __
      └─ whisper-api-68cc9d4565-s7wr7        1.0        1.0           __    __
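
To isolate whether GPU passthrough works at all, independently of the Ollama chart, a minimal test pod along these lines (image tag is illustrative) should print the same nvidia-smi table from inside the cluster:

apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  runtimeClassName: nvidia
  restartPolicy: Never
  containers:
    - name: cuda
      image: nvidia/cuda:12.3.2-base-ubuntu22.04
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1

If this pod fails in the same way, the problem is in the runtime/toolkit wiring rather than in Ollama.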

OS

Linux

Architecture

amd64

Platform

Docker

Ollama version

latest

GPU

No response

GPU info

rtx 4070 super

CPU

AMD

Other software

Kubernetes with k3s; NVIDIA driver installed manually on Ubuntu 23.10.

GiteaMirror added the docker, bug, nvidia labels 2026-04-12 12:10:13 -05:00
Author
Owner

@dhiltgen commented on GitHub (Mar 20, 2024):

The line with msg="Discovered GPU libraries: ]" seems off. There should be a [ in that line as well. Setting OLLAMA_DEBUG=1 in the environment will increase logging and may help shed some light on what's going wrong.

In general, if the NVIDIA Container Toolkit is working properly, the NVIDIA management library is supposed to be mounted into the container from the host to match the driver version. If we're not able to find it, that implies something isn't getting mapped correctly and the toolkit thinks the GPU shouldn't be exposed to the container. We had a bug in prior versions where environment variables were missing, which led to similar behavior. You might try adjusting to match just these values:

NVIDIA_VISIBLE_DEVICES=all
NVIDIA_DRIVER_CAPABILITIES=compute,utility
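
In the Helm values shown above, that suggestion would amount to trimming extraEnv down to something like the following (OLLAMA_DEBUG added as well, per the first paragraph; a sketch only, not tested against the chart):

extraEnv:
  - name: NVIDIA_VISIBLE_DEVICES
    value: all
  - name: NVIDIA_DRIVER_CAPABILITIES
    value: compute,utility
  - name: OLLAMA_DEBUG
    value: "1"
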
Author
Owner

@dhiltgen commented on GitHub (Apr 12, 2024):

If you're still having troubles getting it running on k8s, please give the suggestions above a try and let us know.

Reference: github-starred/ollama#1979