[GH-ISSUE #14031] CUDA GPU not detected in current versions (0.14.x, 0.15.x) with NVIDIA MIG (falls back to CPU); works in 0.13.1 #55684

Closed
opened 2026-04-29 09:34:42 -05:00 by GiteaMirror · 1 comment

Originally created by @artyomb on GitHub (Feb 2, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/14031

What is the issue?

Bug Description

Starting from ollama/ollama:0.13.2, Ollama fails to detect an NVIDIA GPU and falls back to CPU inference.
The same setup works correctly with ollama/ollama:0.13.1.

This appears to be a regression in CUDA/GPU discovery, likely related to changes in secondary device probing or filtering.
(commit 3f3083673496adcc0429ff213dabb0c4fcbe21a2: https://github.com/ollama/ollama/commit/3f3083673496adcc0429ff213dabb0c4fcbe21a2)
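
For reference, a minimal reproduction sketch based on the setup described under Environment below. The MIG UUID is a placeholder to be replaced with the value from nvidia-smi -L, and this assumes the stock image, which starts ollama serve by default:

# GPU detected (working)
docker run --rm --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=MIG-<uuid> ollama/ollama:0.13.1

# Falls back to CPU (broken)
docker run --rm --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=MIG-<uuid> ollama/ollama:0.13.2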


Expected Behavior

Ollama detects the NVIDIA GPU (MIG device) and reports:

inference compute id=GPU library=CUDA ...

Actual Behavior

Ollama reports only CPU:

inference compute id=cpu library=cpu
entering low vram mode

GPU is present and visible via nvidia-smi, but is filtered out during initialization.


Environment

  • Ollama versions

    • ollama/ollama:0.13.1 → GPU works
    • ollama/ollama:0.13.2 → GPU not detected
  • GPU: NVIDIA H100 NVL (MIG enabled, 1g.24gb)

  • Driver: NVIDIA 550+ (CUDA 13.x runtime)

  • OS: Ubuntu 24.04

  • Runtime: Docker with nvidia-container-runtime

  • Deployment: Docker Swarm (also reproducible in plain Docker)
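
As a sanity check (an addition, not part of the original report), the same visibility settings can be verified independently of Ollama by asking a plain container which devices the runtime injects; the utility capability is what makes nvidia-smi available inside the container:

docker run --rm --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=MIG-<uuid> -e NVIDIA_DRIVER_CAPABILITIES=utility ubuntu:24.04 nvidia-smi -L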


GPU / MIG Status

nvidia-smi -L

Output:

GPU 0: NVIDIA H100 NVL (UUID: GPU-...)
  MIG 1g.24gb Device 0: (UUID: MIG-...)

Environment Variables

NVIDIA_VISIBLE_DEVICES=MIG-<uuid>
OLLAMA_LLM_LIBRARY=cuda_v13
OLLAMA_LIBRARY_PATH=/usr/lib/ollama/cuda_v13
LD_LIBRARY_PATH=/usr/lib/ollama/cuda_v13:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
OLLAMA_DEBUG=1

Notes:

  • CUDA_VISIBLE_DEVICES is not set manually.
  • GPU visibility is managed by nvidia-container-runtime.
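
One diagnostic worth trying (a suggestion, not from the original report; it may or may not change the outcome) is pinning CUDA_VISIBLE_DEVICES explicitly to the MIG UUID rather than leaving it to the runtime, since CUDA accepts MIG UUIDs there:

CUDA_VISIBLE_DEVICES=MIG-<uuid>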

Logs (Key Difference)

0.13.1 (working):

inference compute id=GPU library=CUDA compute=9.0
description="NVIDIA H100 NVL MIG 1g.24gb"

0.13.2 (broken):

filtering device which didn't fully initialize
inference compute id=cpu library=cpu
entering low vram mode

Suspected Cause

Between v0.13.1 and v0.13.2, CUDA device discovery logic changed.
Additional filtering during secondary GPU discovery appears to incorrectly reject valid MIG devices.

Possibly related to:

  • CUDA_VISIBLE_DEVICES handling
  • MIG UUID vs GPU UUID filtering
  • Secondary probe rejecting devices requiring GGML_CUDA_INIT
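
With OLLAMA_DEBUG=1 already set (see Environment Variables above), the discovery decision can be narrowed down by filtering the server log around the rejection message; the container name is a placeholder:

docker logs <ollama-container> 2>&1 | grep -iE 'cuda|mig|filtering|inference compute'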

Additional Notes

  • Issue is reproducible and deterministic.
  • Same host, same container config, same environment — only Ollama version differs.
  • Newer versions (0.14.x and 0.15.x) show the same “filtered / didn’t fully initialize” behavior.
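
To confirm where the regression landed, the same check can be looped over image tags; a sketch, with the tag list and MIG UUID as assumed placeholders:

for tag in 0.13.1 0.13.2 0.14.0 0.15.0; do
  docker run -d --name "ollama-$tag" --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=MIG-<uuid> "ollama/ollama:$tag"
  sleep 10   # give device discovery time to run
  echo "== $tag =="
  docker logs "ollama-$tag" 2>&1 | grep 'inference compute'
  docker rm -f "ollama-$tag" > /dev/null
done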

Relevant log output

No response

OS

Linux

GPU

Nvidia

CPU

AMD

Ollama version

No response

GiteaMirror added the bug label 2026-04-29 09:34:42 -05:00

@rick-github commented on GitHub (Feb 2, 2026):

https://github.com/ollama/ollama/pull/13851


Reference: github-starred/ollama#55684