[GH-ISSUE #8594] Ollama stops accessing GPU and Reverts to CPU after running for extended periods #5557

Closed
opened 2026-04-12 16:48:53 -05:00 by GiteaMirror · 10 comments
Owner

Originally created by @loca5790 on GitHub (Jan 26, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/8594

What is the issue?

I have Ollama set to stay persistent in VRAM because of my Home Assistant usage. I moved to an RTX 3090, and after sometimes 12 hours, other times a day or more, Ollama will stop using the GPU and revert to CPU only. It then gets stuck spinning up the CPU for hours at a time without generating any response.

System is:
Ryzen 5700G
64GB Ram
RTX3090

Ollama is running via a docker compose:

```yaml
services:
  ollama:
    volumes:
      - ollama:/root/.ollama
    container_name: ollama
    pull_policy: if_not_present
    tty: true
    restart: unless-stopped
    image: ollama/ollama:${OLLAMA_DOCKER_TAG-latest}
    ports:
      - ${OLLAMA_WEBAPI_PORT-11434}:11434
    deploy:
      resources:
        reservations:
          devices:
          - driver: nvidia
            capabilities: [gpu, compute, utility] #["gpu"]
            count: all
    environment:
      - OLLAMA_DEBUG=1
      - CUDA_VISIBLE_DEVICES=0  # Force use of the GPU

  open-webui:
    build:
      context: .
      args:
        OLLAMA_BASE_URL: '/ollama'
      dockerfile: Dockerfile
    image: ghcr.io/open-webui/open-webui:${WEBUI_DOCKER_TAG-main}
    container_name: open-webui
    volumes:
      - open-webui:/app/backend/data
    depends_on:
      - ollama
    ports:
      - ${OPEN_WEBUI_PORT-3000}:8080
    environment:
      - 'OLLAMA_BASE_URL=http://ollama:11434'
      - 'WEBUI_SECRET_KEY='
    extra_hosts:
      - host.docker.internal:host-gateway
    restart: unless-stopped

volumes:
  ollama: {}
  open-webui: {}
```

I tried adding CUDA_VISIBLE_DEVICES to force use of the GPU.

A restart of the container brings it back up and loads the model onto the GPU again. I have tried to stress test it by running multiple parallel conversation agents, without it ever dropping the GPU.

There are no instances I can find in the log where the GPU becomes unavailable, and nothing in the debug output. The only sign in the log that it has dropped the GPU is that, on a request, it loads the model and references the CPU.
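Since the logs give no warning, one way to spot the fallback before a request hangs is to poll Ollama's documented `/api/ps` endpoint and compare each loaded model's `size_vram` against its total `size`. A minimal sketch (the sample responses below are illustrative, not captured from the affected machines):

```python
import json

def fully_on_gpu(ps_response: str) -> bool:
    """True only if every loaded model reports all of its weights in VRAM."""
    models = json.loads(ps_response).get("models", [])
    return bool(models) and all(
        m.get("size_vram", 0) >= m.get("size", 0) for m in models
    )

# Abridged examples of the response shape from GET http://localhost:11434/api/ps
# (hypothetical values for illustration):
gpu_sample = '{"models": [{"name": "llava-phi3", "size": 2900000000, "size_vram": 2900000000}]}'
cpu_sample = '{"models": [{"name": "llava-phi3", "size": 2900000000, "size_vram": 0}]}'

print(fully_on_gpu(gpu_sample))  # True
print(fully_on_gpu(cpu_sample))  # False
```

Polled periodically (e.g. from a Home Assistant automation), a `False` result while a model is loaded would be the cue to restart the container before requests start spinning on the CPU.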

It could be a me thing, but I've spent a few days on it now without luck. I've had this happen on two machines now.

Second machine this happened on:
Same Docker Compose setup, running in a VM on Ubuntu Server.
RTX 3060 running llava-phi3 as the model, loaded only on request rather than kept persistent.

OS

Docker, Linux

GPU

Nvidia

CPU

Intel, AMD

Ollama version

0.5.4

GiteaMirror added the bug label 2026-04-12 16:48:53 -05:00
@rick-github commented on GitHub (Jan 26, 2025):

https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md#amd-gpu-discovery:~:text=If%20Ollama%20initially,the%20docker%20configuration.
@loca5790 commented on GitHub (Jan 26, 2025):

> https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md#amd-gpu-discovery:~:text=If%20Ollama%20initially,the%20docker%20configuration.

Will this also fix NVIDIA?

@rick-github commented on GitHub (Jan 26, 2025):

I believe so. Try it, let us know.


@loca5790 commented on GitHub (Jan 26, 2025):

> I believe so. Try it, let us know.

Just made the change and reloaded Docker. I also turned off Nvidia on-demand on one system. We will see! Thank you!


@loca5790 commented on GitHub (Jan 27, 2025):

It appears to also work on Nvidia thank you!


@rick-github commented on GitHub (Jan 27, 2025):

Thanks for the confirmation.


@iamarealperson1 commented on GitHub (Feb 20, 2025):

I have the same issue with a Nvidia RTX card.

Is it normal that the /etc/docker/daemon.json file did not exist and I had to create it in order to add the line recommended in the fix here? https://github.com/ollama/ollama/issues/8594#issuecomment-2614485623

Or could it be that the file is in a different location? (I'm running Ubuntu 24.10.)


@rick-github commented on GitHub (Feb 20, 2025):

The file is usually created when the nvidia runtime is added:

```json
{
    "runtimes": {
        "nvidia": {
            "args": [],
            "path": "nvidia-container-runtime"
        }
    },
    "exec-opts": ["native.cgroupdriver=cgroupfs"]
}
```

If you can access the GPU from inside a container, that would imply that the `runtimes` section above is in a file somewhere other than /etc/docker/daemon.json.
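If editing the file by hand feels error-prone, the merge can be scripted so an existing `runtimes` section is preserved. A minimal sketch, assuming the daemon.json layout shown above; it writes to a local scratch path `daemon.json` for illustration — on a real system you would point it at /etc/docker/daemon.json (as root) and restart the daemon afterwards:

```python
import json
from pathlib import Path

def add_cgroupfs_exec_opt(daemon_json: Path) -> dict:
    """Merge "exec-opts": ["native.cgroupdriver=cgroupfs"] into Docker's
    daemon.json without discarding existing keys such as "runtimes"."""
    cfg = json.loads(daemon_json.read_text()) if daemon_json.exists() else {}
    opts = cfg.setdefault("exec-opts", [])
    if "native.cgroupdriver=cgroupfs" not in opts:
        opts.append("native.cgroupdriver=cgroupfs")
    daemon_json.write_text(json.dumps(cfg, indent=4) + "\n")
    return cfg

# Scratch path for illustration; use Path("/etc/docker/daemon.json") for real.
cfg = add_cgroupfs_exec_opt(Path("daemon.json"))
print(cfg["exec-opts"])  # ['native.cgroupdriver=cgroupfs']

# Then apply the change:
#   sudo systemctl restart docker
```

The function is idempotent, so rerunning it will not duplicate the entry.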


@mexicanhatman commented on GitHub (Oct 8, 2025):

> https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md#amd-gpu-discovery:~:text=If%20Ollama%20initially,the%20docker%20configuration.

For anyone seeing this more recently: the hyperlink will direct you to a specific heading on this page, but the relevant text for this problem has been moved outside of that heading and further up the page (so I missed it). Scroll up to see the highlighted text ("If Ollama initially...") or click here: https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md#linux-docker. @rick-github mentioned this already but I missed it.


@ButterMeWaffle commented on GitHub (Mar 9, 2026):

> https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md#amd-gpu-discovery:~:text=If%20Ollama%20initially,the%20docker%20configuration.
>
> For anyone seeing this more recently: the hyperlink will direct you to a specific heading on this page, but the relevant text for this problem has been moved outside of that heading and further up the page (so I missed it). Scroll up to see the highlighted text ("If Ollama initially...") or click here: https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md#linux-docker. @rick-github mentioned this already but I missed it.

That link is also dead now; here is the new one:

https://github.com/ollama/ollama/blob/main/docs/troubleshooting.mdx#linux-docker


Reference: github-starred/ollama#5557