[GH-ISSUE #6928] error looking up nvidia GPU memory - intermittent "cuda driver library failed to get device context 800" #66432

Closed
opened 2026-05-04 05:08:33 -05:00 by GiteaMirror · 22 comments

Originally created by @championcp on GitHub (Sep 24, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/6928

Originally assigned to: @dhiltgen on GitHub.

What is the issue?

I've been running Ollama using the official Docker image, and everything was working fine initially. However, after a while (sometimes a dozen hours, sometimes a few days), Ollama logs showed the following error. Could you please advise on how to resolve this?

log

cuda driver library failed to get device context 800time=2024-09-24T00:41:06.577Z level=WARN source=gpu.go:400 msg="error looking up nvidia GPU memory"
time=2024-09-24T00:41:06.823Z level=WARN source=sched.go:647 msg="gpu VRAM usage didn't recover within timeout" seconds=5.504949612 model=/root/.ollama/models/blobs/sha256-60b185bbd0004312d5d4e3343d177b9cc049c1422629b9b96878a75f7bcf7fd3

OS

Docker

GPU

Nvidia

CPU

Intel

Ollama version

0.3.10

GiteaMirror added the nvidia, bug, docker, needs more info labels 2026-05-04 05:08:35 -05:00

@rick-github commented on GitHub (Sep 24, 2024):

CUDA error code 800 is CUDA_ERROR_NOT_PERMITTED. Does restarting the container restore operation or do you have to do something else (eg, reboot or run some nvidia command)? Is there anything in the system logs (dmesg, /var/log/syslog, /var/log/kern.log, etc) that indicates anything unusual with the nvidia devices?
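For example, something along these lines usually surfaces driver-level trouble (a rough sketch; exact log locations vary by distro):

    # Kernel ring buffer: NVRM / Xid messages usually indicate a driver or GPU fault
    sudo dmesg -T | grep -iE 'nvrm|xid' | tail -n 50

    # Same via the systemd journal, kernel messages from the current boot only
    sudo journalctl -k -b | grep -iE 'nvrm|xid' | tail -n 50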


@championcp commented on GitHub (Sep 24, 2024):

> CUDA error code 800 is CUDA_ERROR_NOT_PERMITTED. Does restarting the container restore operation or do you have to do something else (eg, reboot or run some nvidia command)? Is there anything in the system logs (dmesg, /var/log/syslog, /var/log/kern.log, etc) that indicates anything unusual with the nvidia devices?

Thanks for your reply.

I found another program also using the GPU. Restarting the Ollama container resolved the problem, but I couldn't find any specific Nvidia-related errors in the system logs.

(screenshot: https://github.com/user-attachments/assets/cfe680b5-0dcb-48da-a5ea-85f50346c0f1)


@coharms commented on GitHub (Oct 6, 2024):

In my experience this happens most often (but not exclusively) after a hard restart of the underlying machine (in my case no virtualization in between).


@dhiltgen commented on GitHub (Nov 7, 2024):

I've posted a new PR documenting a workaround some users are seeing success with for a slightly different failure mode, but it might be helpful in these cases as well. If you are experiencing the sporadic 800, please give it a try and let us know if it resolves the problem.

https://github.com/ollama/ollama/pull/7519


@spacegray-ji commented on GitHub (Jan 13, 2025):

> What is the issue?
>
> I've been running Ollama using the official Docker image, and everything was working fine initially. However, after a while (sometimes a dozen hours, sometimes a few days), Ollama logs showed the following error. Could you please advise on how to resolve this?
>
> [log excerpt and environment details as in the issue description above: OS Docker, GPU Nvidia, CPU Intel, Ollama 0.3.10]

This issue is likely related to Docker configuration rather than an issue with Ollama itself.

If the host system uses systemd to manage the cgroups of Docker containers, unit files referencing the Nvidia GPU may be reloaded. When these unit files are reloaded, Docker containers lose access to the reloaded unit files, which can lead to CUDA errors in the Docker container running Ollama. While restarting the container may temporarily resolve the issue, the container will lose GPU access again every time the unit files are reloaded.

Here are two possible solutions:

  1. Modify the cgroup setting in nvidia-container-runtime/config.toml on the host machine

    # Open /etc/nvidia-container-runtime/config.toml with a text editor (vim or nano)
    sudo vim /etc/nvidia-container-runtime/config.toml
    
    # Add the following configuration
    no-cgroups = false
    
    # Restart Docker
    sudo systemctl restart docker
    
  2. Change the container resource management driver in Docker daemon to cgroupfs

    # Open /etc/docker/daemon.json with a text editor (vim or nano)
    sudo vim /etc/docker/daemon.json
    
    # Add the following configuration
    "exec-opts": ["native.cgroupdriver=cgroupfs"]
    
    # Restart Docker
    sudo systemctl restart docker
    

Note: If your host machine primarily uses systemd, solution 2 may cause conflicts with other programs. Therefore, solution 1 is recommended.
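Whichever option you pick, after restarting Docker it's worth sanity-checking that the container still sees the GPU and which cgroup driver is actually in effect (a quick sketch; "ollama" is a placeholder for your container name):

    # Cgroup driver the Docker daemon reports (systemd or cgroupfs)
    docker info --format '{{.CgroupDriver}}'

    # Confirm the running Ollama container can still reach the GPU
    docker exec -it ollama nvidia-smi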


@stronk7 commented on GitHub (Feb 9, 2025):

Should this be closed now that Option 2 has been documented in #7519?

Or should Option 1, as described by @spacegray-ji, be the one to recommend?

Ciao :-)

PS: I'm going to try Option 1 here, since we are facing this problem every few weeks.


@mrCR100 commented on GitHub (Feb 19, 2025):

@spacegray-ji
Hi! I wonder what "no-cgroups = false" means. Why does this configuration solve the problem? ^-^


@spacegray-ji commented on GitHub (Feb 20, 2025):

> @spacegray-ji Hi! I wonder what "no-cgroups = false" means. Why does this configuration solve the problem? ^-^

The configuration option no-cgroups = false in /etc/nvidia-container-runtime/config.toml controls whether the NVIDIA container runtime should use cgroups for managing GPU resources.

Docker uses cgroups by default to manage container resources, including CPU, memory, and GPU access. When running containers with NVIDIA GPUs, cgroups also play a role in controlling GPU access.

However, if systemd reloads certain unit files, it can reset cgroup configurations, causing Docker containers to lose access to the GPU.
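A quick way to see how your host is currently configured (assuming the standard nvidia-container-toolkit install path):

    # Show the cgroup-related setting of the NVIDIA container runtime;
    # if the line is commented out or set to true, set "no-cgroups = false" and restart Docker
    grep -nE 'no-cgroups' /etc/nvidia-container-runtime/config.toml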


@nanggn commented on GitHub (Mar 5, 2025):

I also hit this.

> [quoting @spacegray-ji's explanation of no-cgroups = false above]

After several hours of normal operation of the Ollama container, I also encountered this issue. I have tried the recommended option 1, but there are still problems. How can I solve them?


@spacegray-ji commented on GitHub (Mar 5, 2025):

> After several hours of normal operation of the Ollama container, I also encountered this issue. I have tried the recommended option 1, but there are still problems. How can I solve them?

If you still experience unstable GPU access in the Ollama container for unknown reasons, you might consider an alternative approach: keeping the LLM model loaded on the GPU continuously. The following command allows the model to remain in GPU memory without the need for reloading:

docker exec -it {OLLAMA_CONTAINER_NAME} ollama run {MODEL} --keepalive=999999h

(Please refer to #6401 for continuously keeping the embedding model loaded on the GPU)
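If you'd rather not keep an interactive ollama run session open, the same effect can, as far as I know, also be achieved with the keep_alive request parameter or the OLLAMA_KEEP_ALIVE environment variable (a sketch; "llama3" is just an example model name):

    # Ask the server to load the model and keep it resident indefinitely (-1)
    curl http://localhost:11434/api/generate -d '{"model": "llama3", "keep_alive": -1}'

    # Or set a default for every model when starting the container
    docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 \
      -e OLLAMA_KEEP_ALIVE=-1 --name ollama ollama/ollama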

Also, checking the configuration in your daemon.json file can help diagnose the issue. Could you please share the specific settings in this file?


@tringler commented on GitHub (Mar 17, 2025):

> [quoting @spacegray-ji's explanation of no-cgroups = false above]

I'm also facing this issue. I set no-cgroups = false in /etc/nvidia-container-runtime/config.toml and restarted CRI-O, but I keep running into the issue again and again. --keepalive=999999h helps, but if the container is restarted I hit the same issue.

I'm using CRI-O on RHEL, which manages cgroups with systemd.

Is there anything else I could try?


@Fade78 commented on GitHub (Mar 28, 2025):

> In my experience this happens most often (but not exclusively) after a hard restart of the underlying machine (in my case no virtualization in between).

It happens to me even on bare metal. I use a Docker container for Ollama.


@Fade78 commented on GitHub (Mar 28, 2025):

I follow this issue closely; it happens for me on bare metal or in a VM, with Docker containers on Ubuntu. The Ollama version I use is 0.6.2 and my NVIDIA drivers are 570.86.15.


@tringler commented on GitHub (Mar 28, 2025):

I'm using NVIDIA GRID on an ESXi RHEL VM with CRI-O as the container engine in a vanilla Kubernetes environment, so I guess it's somewhere between the NVIDIA container runtime and the NVIDIA driver.


@rick-github commented on GitHub (Mar 28, 2025):

@Fade78 It's not clear from your comment: does neither of the options in https://github.com/ollama/ollama/issues/6928#issuecomment-2586208913 work for you?


@aquananu commented on GitHub (Apr 6, 2025):

I faced the same issue after updating the toolkit; the drivers got updated as well, creating a mismatch.

A system reboot solved the issue.


@Fade78 commented on GitHub (Apr 11, 2025):

> @Fade78 It's not clear from your comment: does neither of the options in #6928 (comment) work for you?

I stumbled on the FAQ and applied only the change in daemon.json. It corrected the problem.

After that, I also found a "persistence" mode in the NVIDIA driver, and I'm curious whether I could enable that and drop the daemon.json modification, or if it's unrelated. Do you have any idea if it's the root of the problem?
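For reference, persistence mode is toggled with the standard NVIDIA tooling, e.g. (whether it has any bearing on this issue is exactly what I'm unsure about):

    # Legacy per-GPU toggle (does not survive a reboot)
    sudo nvidia-smi -pm 1

    # Preferred on most distros, if the driver package ships the persistence daemon
    sudo systemctl enable --now nvidia-persistenced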


@qhaas commented on GitHub (Jul 24, 2025):

> @Fade78 It's not clear from your comment: does neither of the options in #6928 (comment) work for you?

Solution 1 did not work for me, but Solution 2 seems to be holding up. I don't understand why this workaround is needed for Ollama; I have other GPU-enabled services running in Docker (e.g. TF2-based workflows) that do not require it.


@dhiltgen commented on GitHub (Jul 31, 2025):

> I don't understand why this workaround is needed for Ollama; I have other GPU-enabled services running in Docker (e.g. TF2-based workflows) that do not require it.

I don't know for certain, but our GPU usage pattern is somewhat unique in that we load/unload the GPU libraries periodically and when Ollama is idle, no GPU libraries are loaded. The intent is to let the GPU go back to a low-power state when not in use, and keeping the libraries loaded and sessions active keeps the GPU "powered up". Other apps often keep the GPU bound continuously.
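You can watch that transition from the host while Ollama sits idle, e.g. (a quick way to confirm the GPU is actually dropping to a low-power P-state between requests):

    # Sample the performance state, power draw and memory use every 5 seconds;
    # an idle GPU typically falls back to a low P-state once the libraries are unloaded
    nvidia-smi --query-gpu=timestamp,pstate,power.draw,memory.used --format=csv -l 5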


@zyberwoof commented on GitHub (Sep 8, 2025):

I'm toying around with a method for mitigating this issue, and I'm posting my idea here in case it helps anyone. In a nutshell, I've created a health check for my Docker container that detects when nvidia-smi fails. This marks the container as unhealthy. From there, another container, script, or application can see that status and restart the container.

I apologize in advance for the amateur craftsmanship.

Health Check for Docker

I've added a Health Check to my Docker compose file. See the healthcheck section below:

services:
  p40:
    image: ollama/ollama:latest
    pull_policy: always
    tty: true
    restart: unless-stopped
    ports:
      - 11434:11434
    runtime: nvidia
    env_file:
      - .env
      - ollama-p40.env
    healthcheck:
      test: nvidia-smi && exit 0 || exit 1
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 10s

Normally the nvidia-smi command works fine within the Docker container. But when the issue occurs, it throws an error for me. This healthcheck section checks for the command to fail every 30 seconds. If it fails 3 times in a row, Docker marks the container as unhealthy. Once you add a healthcheck like this, you can see whether the container is healthy or unhealthy by running docker ps.
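If you just want the health status by itself (handy for scripting), something like this also works; "p40" below is the service name from my compose file, adjust it to your actual container name:

    docker inspect --format '{{.State.Health.Status}}' p40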

This in itself isn't meant to directly solve the problem, though running nvidia-smi periodically might help keep the GPU slightly more active. But with the container marked as unhealthy, you now have multiple ways of recovering automatically. One would be to use another container like autoheal (https://hub.docker.com/r/willfarrell/autoheal/) to automatically restart the unhealthy container. Another would be to run a script with cron or as a service that periodically parses docker ps and checks for unhealthy containers.
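If you go the autoheal route, the usual invocation is something like the following (check the autoheal README for the exact options; this is how I understand it):

    # Watch all containers that define a healthcheck and restart the ones that turn unhealthy
    docker run -d --name autoheal --restart=always \
      -e AUTOHEAL_CONTAINER_LABEL=all \
      -v /var/run/docker.sock:/var/run/docker.sock \
      willfarrell/autoheal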

I personally made a script that loops endlessly and a unit file so that I could run it as a daemon. Good or bad, I'm sharing my work below. Keep in mind that I've been using this for less than a day, which is not enough time to verify that it works well.

Background script

Instructions below assume this file is saved as /usr/local/bin/heal_docker_containers.sh

#!/bin/bash
#
# heal_docker_containers.sh
#
# Created 2025 09 07
#
# This script monitors for unhealthy docker containers.  If one is found, this script
# automatically restarts the container.
#
# Containers that are restarted are logged in syslog.  The messages will contain the
# containers' names and IDs, and the messages will be tagged with the name of this
# script.

CHECK_CONTAINER_STATUS_FREQUENCY_SECONDS=60
LOGGER_PRIORITY="local1.warn"
LOGGER_TAG="$(basename "$0")"

function restart_unhealthy_containers () {

        # Get a list of unhealthy containers.  IDs only.
        unhealthy_containers=$(docker ps --quiet --filter health=unhealthy)

        # Loop through the unhealthy containers and restart them
        for container_id in $unhealthy_containers; do

                container_name=$(docker ps --filter "id=$container_id" --format "{{.Names}}")

                message="Restarting unhealthy container $container_name with ID [$container_id]"

                logger --tag "$LOGGER_TAG" --priority "$LOGGER_PRIORITY" "$message"

                docker restart "$container_id" > /dev/null

        done

}  # restart_unhealthy_containers


while "true"; do

        restart_unhealthy_containers
        sleep $CHECK_CONTAINER_STATUS_FREQUENCY_SECONDS

done

Unit File

Instructions below assume this file is saved as /etc/systemd/system/heal-docker-containers.service

[Unit]
Description=Restarts unhealthy Docker containers.
After=docker.service

[Service]
ExecStart=/bin/bash /usr/local/bin/heal_docker_containers.sh
Type=simple
Restart=always

[Install]
WantedBy=docker.service

Installation

The steps below install the service to restart any and all unhealthy containers.

  1. Create the script and unit files above.
  2. Reload systemd so it picks up the new unit file with sudo systemctl daemon-reload
  3. Start the service with sudo systemctl start heal-docker-containers.service
  4. Configure the service to start automatically on boot with sudo systemctl enable heal-docker-containers.service

Uninstall

  1. Configure the service to NOT start automatically on boot with sudo systemctl disable heal-docker-containers.service
  2. Stop the service with sudo systemctl stop heal-docker-containers.service
  3. OPTIONAL: Remove the files heal_docker_containers.sh and heal-docker-containers.service

Check logs for which containers have been restarted

The command below should show the entries for each time this process restarts a container.
journalctl -t "heal_docker_containers.sh"


@tjwebb commented on GitHub (Sep 10, 2025):

Also running into this. After running for 6-8 hours, my logs are filled with this:

cuda driver library failed to get device context 800time=2025-09-10T12:46:15.060Z level=WARN source=gpu.go:436 msg="error looking up nvidia GPU memory"

and then finally

time=2025-09-10T12:46:20.571Z level=WARN source=sched.go:652 msg="gpu VRAM usage didn't recover within timeout" seconds=5.524556222 runner.size="83.7 GiB" runner.vram="83.7 GiB" runner.parallel=1 runner.pid=387 runner.model=/root/.ollama/models/blobs/sha256-90a618fe6ff21b09ca968df959104eb650658b0bef0faef785c18c2795d993e3

The problem is fixed with a restart of Ollama.


@rick-github commented on GitHub (Sep 11, 2025):

@tjwebb It's not clear from your comment, does neither of the options in https://github.com/ollama/ollama/issues/6928#issuecomment-2586208913 work for you?

Reference: github-starred/ollama#66432