[GH-ISSUE #7593] GPU Initialization Fails with Low Utilization on NVIDIA RTX 3090 in Docker #51353

Closed
opened 2026-04-28 19:38:53 -05:00 by GiteaMirror · 0 comments

Originally created by @PrepperShepherd on GitHub (Nov 10, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/7593

What is the issue?

Issue Description:

I am encountering issues with GPU initialization and utilization when running the Ollama Docker container on an NVIDIA RTX 3090. Despite following the official setup guide for NVIDIA GPU support, including configuring nvidia-docker, GPU usage remains minimal, even when running larger models like Llama 2. The detailed steps taken and observed results are below:

Steps Taken:

1. Installed NVIDIA Driver and Container Toolkit:

   NVIDIA Driver Version: 535.216.01
   CUDA Version: 12.2
   Confirmed correct GPU setup with nvidia-smi on the host.
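
For reference, the host-side checks were roughly as follows (a minimal sketch; exact output will vary by setup):

```bash
# Driver loads and reports Driver Version 535.216.01 / CUDA Version 12.2
nvidia-smi

# NVIDIA Container Toolkit CLI is installed
nvidia-ctk --version
```
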
2. Configured Docker for NVIDIA GPU Access:

   Used nvidia-ctk runtime configure --runtime=docker to set the NVIDIA runtime as the default for Docker, followed by a Docker service restart.
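
Concretely, this step amounted to something like the following (assuming a systemd-based host):

```bash
# Register the NVIDIA runtime with Docker, then restart the daemon
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
```
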
3. Launched the Ollama Docker Container:

   Command used:

```bash
docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
```

4. Verified GPU Access Inside the Container:

   Running nvidia-smi inside the container shows the GPU is detected. However, when models are running the card shows memory allocated but minimal compute utilization.
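
The in-container check was along these lines:

```bash
# GPU is detected, but GPU-Util stays near 0% while a model is loaded
docker exec -it ollama nvidia-smi
```
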
5. Tested with Llama 2 and Other Models:

   Despite running Llama 2, GPU utilization hovers at or near 0%, with inference apparently falling back to the CPU.
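
A simple reproduction, assuming the container above is running (the prompt text is just an illustrative example):

```bash
# Run a prompt while watching utilization from a second terminal
docker exec -it ollama ollama run llama2 "Why is the sky blue?"

# In a separate terminal:
watch -n 1 nvidia-smi
```
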
Debug Information:

Running with OLLAMA_DEBUG=1 revealed that while libraries such as libcuda.so and libcudart.so are detected, initialization still fails with cuInit err: 999 (CUDA_ERROR_UNKNOWN). The debug logs also report cudart init failure: 999, pointing to a CUDA initialization or driver/library compatibility problem.
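
Debug logging was enabled by recreating the container with the variable set, roughly:

```bash
# Recreate the container with debug logging enabled, then follow the logs
docker rm -f ollama
docker run -d --gpus=all -e OLLAMA_DEBUG=1 \
  -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
docker logs -f ollama
```
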
Expected Outcome:
A smooth initialization and high utilization of the GPU to maximize the performance benefits of the NVIDIA RTX 3090 when running large models.

Questions for Developers:

1. Are there additional configurations or dependencies needed to maximize GPU utilization with Ollama, particularly in Dockerized environments?
2. Does Ollama support or recommend specific CUDA versions for optimal Docker GPU performance?
3. Are there specific parameters or environment variables we should set to ensure the model offloads compute to the GPU? (One check used so far is shown below.)
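
Regarding the third question, one way to see where compute is landing (assuming a reasonably recent Ollama build) is:

```bash
# The PROCESSOR column reports the CPU/GPU split for each loaded model
docker exec -it ollama ollama ps
```
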
Thank you in advance for any guidance on diagnosing and optimizing GPU usage for this setup.

OS

Linux, Docker

GPU

Nvidia

CPU

AMD

Ollama version

No response

GiteaMirror added the bug label 2026-04-28 19:38:53 -05:00
Reference: github-starred/ollama#51353