[GH-ISSUE #6928] error looking up nvidia GPU memory - intermittent "cuda driver library failed to get device context 800" #66432

Closed
opened 2026-05-04 05:08:33 -05:00 by GiteaMirror · 22 comments

Originally created by @championcp on GitHub (Sep 24, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/6928

Originally assigned to: @dhiltgen on GitHub.

What is the issue?

I've been running Ollama using the official Docker image, and everything was working fine initially. However, after a while (sometimes a dozen hours, sometimes a few days), Ollama logs showed the following error. Could you please advise on how to resolve this?

log

cuda driver library failed to get device context 800time=2024-09-24T00:41:06.577Z level=WARN source=gpu.go:400 msg="error looking up nvidia GPU memory"
time=2024-09-24T00:41:06.823Z level=WARN source=sched.go:647 msg="gpu VRAM usage didn't recover within timeout" seconds=5.504949612 model=/root/.ollama/models/blobs/sha256-60b185bbd0004312d5d4e3343d177b9cc049c1422629b9b96878a75f7bcf7fd3

OS

Docker

GPU

Nvidia

CPU

Intel

Ollama version

0.3.10

GiteaMirror added the nvidia, bug, docker, needs more info labels 2026-05-04 05:08:35 -05:00

@rick-github commented on GitHub (Sep 24, 2024):

CUDA error code 800 is CUDA_ERROR_NOT_PERMITTED. Does restarting the container restore operation or do you have to do something else (eg, reboot or run some nvidia command)? Is there anything in the system logs (dmesg, /var/log/syslog, /var/log/kern.log, etc) that indicates anything unusual with the nvidia devices?
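For example, something along these lines usually surfaces driver-level trouble (a rough sketch; exact log locations vary by distro):

    # Kernel ring buffer: NVRM / Xid messages usually indicate a driver or GPU fault
    sudo dmesg -T | grep -iE 'nvrm|xid' | tail -n 50

    # Same via the systemd journal, kernel messages from the current boot only
    sudo journalctl -k -b | grep -iE 'nvrm|xid' | tail -n 50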


@championcp commented on GitHub (Sep 24, 2024):

> CUDA error code 800 is CUDA_ERROR_NOT_PERMITTED. Does restarting the container restore operation or do you have to do something else (eg, reboot or run some nvidia command)? Is there anything in the system logs (dmesg, /var/log/syslog, /var/log/kern.log, etc) that indicates anything unusual with the nvidia devices?

Thanks for your reply.

I found another program also using the GPU. Restarting the Ollama container resolved the problem, but I couldn't find any specific Nvidia-related errors in the system logs.

(screenshot: https://github.com/user-attachments/assets/cfe680b5-0dcb-48da-a5ea-85f50346c0f1)


@coharms commented on GitHub (Oct 6, 2024):

In my experience this happens most often (but not exclusively) after a hard restart of the underlying machine (in my case no virtualization in between).


@dhiltgen commented on GitHub (Nov 7, 2024):

I've posted a new PR documenting a workaround some users are seeing success with for a slightly different failure mode, but it might be helpful in these cases as well. If you are experiencing the sporadic 800, please give it a try and let us know if it resolves the problem.

https://github.com/ollama/ollama/pull/7519


@spacegray-ji commented on GitHub (Jan 13, 2025):

> What is the issue?
>
> I've been running Ollama using the official Docker image, and everything was working fine initially. However, after a while (sometimes a dozen hours, sometimes a few days), Ollama logs showed the following error. Could you please advise on how to resolve this?
>
> [log excerpt and environment details as in the issue description above: OS Docker, GPU Nvidia, CPU Intel, Ollama 0.3.10]

This issue is likely related to Docker configuration rather than an issue with Ollama itself.

If the host system uses systemd to manage the cgroups of Docker containers, unit files referencing the Nvidia GPU may be reloaded. When these unit files are reloaded, Docker containers lose access to the reloaded unit files, which can lead to CUDA errors in the Docker container running Ollama. While restarting the container may temporarily resolve the issue, the container will lose GPU access again every time the unit files are reloaded.

Here are two possible solutions:

  1. Modify the cgroup setting in nvidia-container-runtime/config.toml on the host machine

    # Open /etc/nvidia-container-runtime/config.toml with a text editor (vim or nano)
    sudo vim /etc/nvidia-container-runtime/config.toml
    
    # Add the following configuration
    no-cgroups = false
    
    # Restart Docker
    sudo systemctl restart docker
    
  2. Change the container resource management driver in Docker daemon to cgroupfs

    # Open /etc/docker/daemon.json with a text editor (vim or nano)
    sudo vim /etc/docker/daemon.json
    
    # Add the following configuration
    "exec-opts": ["native.cgroupdriver=cgroupfs"]
    
    # Restart Docker
    sudo systemctl restart docker
    

Note: If your host machine primarily uses systemd, solution 2 may cause conflicts with other programs. Therefore, solution 1 is recommended.
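Whichever option you pick, after restarting Docker it's worth sanity-checking that the container still sees the GPU and which cgroup driver is actually in effect (a quick sketch; "ollama" is a placeholder for your container name):

    # Cgroup driver the Docker daemon reports (systemd or cgroupfs)
    docker info --format '{{.CgroupDriver}}'

    # Confirm the running Ollama container can still reach the GPU
    docker exec -it ollama nvidia-smi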


@stronk7 commented on GitHub (Feb 9, 2025):

Should this be closed now that Option 2 has been documented in #7519?

Or should Option 1, as described by @spacegray-ji, be the one to recommend?

Ciao :-)

PS: I'm going to try Option 1 here, since we are facing this problem every few weeks.


@mrCR100 commented on GitHub (Feb 19, 2025):

@spacegray-ji
Hi! I wonder what "no-cgroups = false" means. Why does this configuration solve the problem? ^-^


@spacegray-ji commented on GitHub (Feb 20, 2025):

> @spacegray-ji Hi! I wonder what "no-cgroups = false" means. Why does this configuration solve the problem? ^-^

The configuration option no-cgroups = false in /etc/nvidia-container-runtime/config.toml controls whether the NVIDIA container runtime should use cgroups for managing GPU resources.

Docker uses cgroups by default to manage container resources, including CPU, memory, and GPU access. When running containers with NVIDIA GPUs, cgroups also play a role in controlling GPU access.

However, if systemd reloads certain unit files, it can reset cgroup configurations, causing Docker containers to lose access to the GPU.
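A quick way to see how your host is currently configured (assuming the standard nvidia-container-toolkit install path):

    # Show the cgroup-related setting of the NVIDIA container runtime;
    # if the line is commented out or set to true, set "no-cgroups = false" and restart Docker
    grep -nE 'no-cgroups' /etc/nvidia-container-runtime/config.toml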


@nanggn commented on GitHub (Mar 5, 2025):

I also hit this.

> [quoting @spacegray-ji's explanation of no-cgroups = false above]

After several hours of normal operation of the Ollama container, I also encountered this issue. I have tried the recommended option 1, but there are still problems. How can I solve them?


@spacegray-ji commented on GitHub (Mar 5, 2025):

> After several hours of normal operation of the Ollama container, I also encountered this issue. I have tried the recommended option 1, but there are still problems. How can I solve them?

If you still experience unstable GPU access in the Ollama container for unknown reasons, you might consider an alternative approach: keeping the LLM model loaded on the GPU continuously. The following command allows the model to remain in GPU memory without the need for reloading:

docker exec -it {OLLAMA_CONTAINER_NAME} ollama run {MODEL} --keepalive=999999h

(Please refer to #6401 for continuously keeping the embedding model loaded on the GPU)
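If you'd rather not keep an interactive ollama run session open, the same effect can, as far as I know, also be achieved with the keep_alive request parameter or the OLLAMA_KEEP_ALIVE environment variable (a sketch; "llama3" is just an example model name):

    # Ask the server to load the model and keep it resident indefinitely (-1)
    curl http://localhost:11434/api/generate -d '{"model": "llama3", "keep_alive": -1}'

    # Or set a default for every model when starting the container
    docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 \
      -e OLLAMA_KEEP_ALIVE=-1 --name ollama ollama/ollama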

Also, checking the configuration in your daemon.json file can help diagnose the issue. Could you please share the specific settings in this file?


@tringler commented on GitHub (Mar 17, 2025):

> [quoting @spacegray-ji's explanation of no-cgroups = false above]

I'm also facing this issue. I set no-cgroups = false in /etc/nvidia-container-runtime/config.toml and restarted CRI-O, but I keep running into the issue again and again. --keepalive=999999h helps, but if the container is restarted I hit the same issue.

I'm using CRI-O on RHEL, which manages cgroups with systemd.

Is there anything else I could try?


@Fade78 commented on GitHub (Mar 28, 2025):

> In my experience this happens most often (but not exclusively) after a hard restart of the underlying machine (in my case no virtualization in between).

It happens to me even on bare metal. I use a Docker container for Ollama.


@Fade78 commented on GitHub (Mar 28, 2025):

I follow this issue closely; it happens for me on bare metal or in a VM, with Docker containers on Ubuntu. The Ollama version I use is 0.6.2 and my NVIDIA drivers are 570.86.15.


@tringler commented on GitHub (Mar 28, 2025):

I'm using NVIDIA GRID on an ESXi RHEL VM with CRI-O as the container engine in a vanilla Kubernetes environment, so I guess it's somewhere between the NVIDIA container runtime and the NVIDIA driver.


@rick-github commented on GitHub (Mar 28, 2025):

@Fade78 It's not clear from your comment: does neither of the options in https://github.com/ollama/ollama/issues/6928#issuecomment-2586208913 work for you?


@aquananu commented on GitHub (Apr 6, 2025):

I faced the same issue after updating the toolkit; the drivers got updated as well, creating a mismatch.

A system reboot solved the issue.


@Fade78 commented on GitHub (Apr 11, 2025):

> @Fade78 It's not clear from your comment: does neither of the options in #6928 (comment) work for you?

I stumbled on the FAQ and applied only the change in daemon.json. It corrected the problem.

After that, I also found a "persistence" mode in the NVIDIA driver, and I'm curious whether I could enable that and drop the daemon.json modification, or if it's unrelated. Do you have any idea if it's the root of the problem?
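For reference, persistence mode is toggled with the standard NVIDIA tooling, e.g. (whether it has any bearing on this issue is exactly what I'm unsure about):

    # Legacy per-GPU toggle (does not survive a reboot)
    sudo nvidia-smi -pm 1

    # Preferred on most distros, if the driver package ships the persistence daemon
    sudo systemctl enable --now nvidia-persistenced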


@qhaas commented on GitHub (Jul 24, 2025):

> @Fade78 It's not clear from your comment: does neither of the options in #6928 (comment) work for you?

Solution 1 did not work for me, but Solution 2 seems to be holding up. I don't understand why this workaround is needed for Ollama; I have other GPU-enabled services running in Docker (e.g. TF2-based workflows) that do not require it.


@dhiltgen commented on GitHub (Jul 31, 2025):

> I don't understand why this workaround is needed for Ollama; I have other GPU-enabled services running in Docker (e.g. TF2-based workflows) that do not require it.

I don't know for certain, but our GPU usage pattern is somewhat unique in that we load/unload the GPU libraries periodically and when Ollama is idle, no GPU libraries are loaded. The intent is to let the GPU go back to a low-power state when not in use, and keeping the libraries loaded and sessions active keeps the GPU "powered up". Other apps often keep the GPU bound continuously.
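You can watch that transition from the host while Ollama sits idle, e.g. (a quick way to confirm the GPU is actually dropping to a low-power P-state between requests):

    # Sample the performance state, power draw and memory use every 5 seconds;
    # an idle GPU typically falls back to a low P-state once the libraries are unloaded
    nvidia-smi --query-gpu=timestamp,pstate,power.draw,memory.used --format=csv -l 5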


@zyberwoof commented on GitHub (Sep 8, 2025):

I'm toying around with a method for mitigating this issue, and I'm posting my idea here in case it helps anyone. In a nutshell, I've created a health check for my Docker container that detects when nvidia-smi fails. This marks the container as unhealthy. From there, another container, script, or application can see that status and restart the container.

I apologize in advance for the amateur craftsmanship.

Health Check for Docker

I've added a Health Check to my Docker compose file. See the healthcheck section below:

services:
  p40:
    image: ollama/ollama:latest
    pull_policy: always
    tty: true
    restart: unless-stopped
    ports:
      - 11434:11434
    runtime: nvidia
    env_file:
      - .env
      - ollama-p40.env
    healthcheck:
      test: nvidia-smi && exit 0 || exit 1
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 10s

Normally the nvidia-smi command works fine within the Docker container. But when the issue occurs, it throws an error for me. This healthcheck section checks for the command to fail every 30 seconds. If it fails 3 times in a row, Docker marks the container as unhealthy. Once you add a healthcheck like this, you can see whether the container is healthy or unhealthy by running docker ps.
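If you just want the health status by itself (handy for scripting), something like this also works; "p40" below is the service name from my compose file, adjust it to your actual container name:

    docker inspect --format '{{.State.Health.Status}}' p40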

This in itself isn't meant to directly solve the problem, though running nvidia-smi periodically might help keep the GPU slightly more active. But with the container marked as unhealthy, you now have multiple ways of recovering automatically. One would be to use another container like autoheal (https://hub.docker.com/r/willfarrell/autoheal/) to automatically restart the unhealthy container. Another would be to run a script with cron or as a service that periodically parses docker ps and checks for unhealthy containers.
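If you go the autoheal route, the usual invocation is something like the following (check the autoheal README for the exact options; this is how I understand it):

    # Watch all containers that define a healthcheck and restart the ones that turn unhealthy
    docker run -d --name autoheal --restart=always \
      -e AUTOHEAL_CONTAINER_LABEL=all \
      -v /var/run/docker.sock:/var/run/docker.sock \
      willfarrell/autoheal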

I personally made a script that loops endlessly and a unit file so that I could run it as a daemon. Good or bad, I'm sharing my work below. Keep in mind that I've been using this for less than a day, which is not enough time to verify that it works well.

Background script

Instructions below assume this file is saved as /usr/local/bin/heal_docker_containers.sh

#!/bin/bash
#
# heal_docker_containers.sh
#
# Created 2025 09 07
#
# This script monitors for unhealthy docker containers.  If one is found, this script
# automatically restarts the container.
#
# Containers that are restarted are logged in syslog.  The messages will contain the
# containers' names and IDs, and the messages will be tagged with the name of this
# script.

CHECK_CONTAINER_STATUS_FREQUENCY_SECONDS=60
LOGGER_PRIORITY="local1.warn"
LOGGER_TAG="$(basename "$0")"

function restart_unhealthy_containers () {

        # Get a list of unhealthy containers.  IDs only.
        unhealthy_containers=$(docker ps --quiet --filter health=unhealthy)

        # Loop through the unhealthy containers and restart them
        for container_id in $unhealthy_containers; do

                container_name=$(docker ps --filter "id=$container_id" --format "{{.Names}}")

                message="Restarting unhealthy container $container_name with ID [$container_id]"

                logger --tag "$LOGGER_TAG" --priority "$LOGGER_PRIORITY" "$message"

                docker restart "$container_id" > /dev/null

        done

}  # restart_unhealthy_containers


while "true"; do

        restart_unhealthy_containers
        sleep $CHECK_CONTAINER_STATUS_FREQUENCY_SECONDS

done

Unit File

Instructions below assume this file is saved as /etc/systemd/system/heal-docker-containers.service

[Unit]
Description=Restarts unhealthy Docker containers.
After=docker.service

[Service]
ExecStart=/bin/bash /usr/local/bin/heal_docker_containers.sh
Type=simple
Restart=always

[Install]
WantedBy=docker.service

Installation

The steps below install the service to restart any and all unhealthy containers.

  1. Create the script and unit files above.
  2. Reload systemd so it picks up the new unit file with sudo systemctl daemon-reload
  3. Start the service with sudo systemctl start heal-docker-containers.service
  4. Configure the service to start automatically on boot with sudo systemctl enable heal-docker-containers.service

Uninstall

  1. Configure the service to NOT start automatically on boot with sudo systemctl disable heal-docker-containers.service
  2. Stop the service with sudo systemctl stop heal-docker-containers.service
  3. OPTIONAL: Remove the files heal_docker_containers.sh and heal-docker-containers.service

Check logs for which containers have been restarted

The command below should show the entries for each time this process restarts a container.
journalctl -t "heal_docker_containers.sh"


@tjwebb commented on GitHub (Sep 10, 2025):

Also running into this. After running for 6-8 hours, my logs are filled with this:

cuda driver library failed to get device context 800time=2025-09-10T12:46:15.060Z level=WARN source=gpu.go:436 msg="error looking up nvidia GPU memory"

and then finally

time=2025-09-10T12:46:20.571Z level=WARN source=sched.go:652 msg="gpu VRAM usage didn't recover within timeout" seconds=5.524556222 runner.size="83.7 GiB" runner.vram="83.7 GiB" runner.parallel=1 runner.pid=387 runner.model=/root/.ollama/models/blobs/sha256-90a618fe6ff21b09ca968df959104eb650658b0bef0faef785c18c2795d993e3

The problem is fixed with a restart of Ollama.


@rick-github commented on GitHub (Sep 11, 2025):

@tjwebb It's not clear from your comment, does neither of the options in https://github.com/ollama/ollama/issues/6928#issuecomment-2586208913 work for you?

Reference: github-starred/ollama#66432