[GH-ISSUE #3601] Docker 0.1.31: the 2nd Ollama cannot use its designated GPU #48734

Closed
opened 2026-04-28 09:10:09 -05:00 by GiteaMirror · 1 comment

Originally created by @ww2283 on GitHub (Apr 11, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/3601

What is the issue?

I'm on Ubuntu 20.04 with two Ada 6000 cards. I use Docker Compose to host two instances of Ollama, each with its own model, because I want to use them for AutoGen. I want to assign an individual GPU to each Ollama container.
In actual use, I noticed that ollama1 works perfectly fine, while ollama2 runs but only uses the CPU for inference. I would appreciate help getting ollama2 to pick up its designated GPU.

Here is my compose yaml file:

services:
  ollama1:
    image: ollama/ollama:latest
    container_name: ollama_model_1
    ports:
      - '11435:11434'
    environment:
      - CUDA_VISIBLE_DEVICES=GPU-*************************
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    volumes:
      - /ADATAtmp/docker/ollama_model_1:/root/.ollama
    networks:
      - autogen_network
    restart: always

  open-webui1:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui_1
    ports:
      - "11436:8080"
    volumes:
      - /ADATAtmp/docker/open-webui-1:/app/backend/data
    environment:
      - OLLAMA_BASE_URL=http://ollama_model_1:11434
    extra_hosts:
      - "host.docker.internal:host-gateway"
    networks:
      - autogen_network
    restart: always

  ollama2:
    image: ollama/ollama:latest
    container_name: ollama_model_2
    ports:
      - '11437:11434'
    environment:
      - CUDA_VISIBLE_DEVICES=GPU-*************************
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    volumes:
      - /ADATAtmp/docker/ollama_model_2:/root/.ollama
    networks:
      - autogen_network
    restart: always

  open-webui2:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui_2
    ports:
      - "11438:8080"
    volumes:
      - /ADATAtmp/docker/open-webui-2:/app/backend/data
    environment:
      - OLLAMA_BASE_URL=http://ollama_model_2:11434
    extra_hosts:
      - "host.docker.internal:host-gateway"
    networks:
      - autogen_network
    restart: always

networks:
  autogen_network:
    driver: bridge
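
For reference, a quick way to confirm which GPU each container actually sees is to list devices from inside them (a minimal sketch; it assumes the NVIDIA runtime mounts nvidia-smi into the ollama/ollama containers, which the default compute/utility capabilities normally do):

# Each Ollama container should report exactly one GPU, and its UUID should
# match the value passed via CUDA_VISIBLE_DEVICES for that service.
docker exec ollama_model_1 nvidia-smi -L
docker exec ollama_model_2 nvidia-smi -L

# Host-side view, for comparing UUIDs.
nvidia-smi -L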

I also believe this part of the docker logs for ollama2 is relevant for review:

$ docker logs ollama_model_2
time=2024-04-11T19:05:33.375Z level=INFO source=images.go:804 msg="total blobs: 10"
time=2024-04-11T19:05:33.375Z level=INFO source=images.go:811 msg="total unused blobs removed: 0"
time=2024-04-11T19:05:33.375Z level=INFO source=routes.go:1118 msg="Listening on [::]:11434 (version 0.1.31)"
time=2024-04-11T19:05:33.376Z level=INFO source=payload_common.go:113 msg="Extracting dynamic libraries to /tmp/ollama859516905/runners ..."
time=2024-04-11T19:05:35.857Z level=INFO source=payload_common.go:140 msg="Dynamic LLM libraries [rocm_v60000 cpu cpu_avx2 cpu_avx cuda_v11]"
time=2024-04-11T19:05:35.857Z level=INFO source=gpu.go:115 msg="Detecting GPU type"
time=2024-04-11T19:05:35.857Z level=INFO source=gpu.go:265 msg="Searching for GPU management library libcudart.so*"
time=2024-04-11T19:05:35.857Z level=INFO source=gpu.go:311 msg="Discovered GPU libraries: [/tmp/ollama859516905/runners/cuda_v11/libcudart.so.11.0]"
time=2024-04-11T19:05:35.879Z level=INFO source=gpu.go:340 msg="Unable to load cudart CUDA management library /tmp/ollama859516905/runners/cuda_v11/libcudart.so.11.0: cudart init failure: 100"
time=2024-04-11T19:05:35.879Z level=INFO source=gpu.go:265 msg="Searching for GPU management library libnvidia-ml.so"
time=2024-04-11T19:05:35.879Z level=INFO source=gpu.go:311 msg="Discovered GPU libraries: [/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.535.154.05]"
time=2024-04-11T19:05:35.885Z level=INFO source=gpu.go:131 msg="Nvidia GPU detected via nvidia-ml"
time=2024-04-11T19:05:35.885Z level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-04-11T19:05:35.890Z level=INFO source=gpu.go:169 msg="[nvidia-ml] NVML CUDA Compute Capability detected: 8.9"
[GIN] 2024/04/11 - 19:08:54 | 200 |     630.936µs |      172.20.0.5 | GET      "/api/tags"
[GIN] 2024/04/11 - 19:08:54 | 200 |     568.095µs |      172.20.0.5 | GET      "/api/tags"
[GIN] 2024/04/11 - 19:08:54 | 200 |     572.393µs |      172.20.0.5 | GET      "/api/tags"
[GIN] 2024/04/11 - 19:08:54 | 200 |      36.179µs |      172.20.0.5 | GET      "/api/version"
[GIN] 2024/04/11 - 19:08:56 | 200 |      38.734µs |      172.20.0.5 | GET      "/api/version"
time=2024-04-11T19:09:04.802Z level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-04-11T19:09:04.802Z level=INFO source=gpu.go:169 msg="[nvidia-ml] NVML CUDA Compute Capability detected: 8.9"
time=2024-04-11T19:09:04.802Z level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-04-11T19:09:04.802Z level=INFO source=gpu.go:169 msg="[nvidia-ml] NVML CUDA Compute Capability detected: 8.9"
time=2024-04-11T19:09:04.802Z level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-04-11T19:09:04.807Z level=INFO source=dyn_ext_server.go:87 msg="Loading Dynamic llm server: /tmp/ollama859516905/runners/cuda_v11/libext_server.so"
time=2024-04-11T19:09:04.807Z level=INFO source=dyn_ext_server.go:147 msg="Initializing llama server"
time=2024-04-11T19:09:04.827Z level=WARN source=llm.go:170 msg="Failed to load dynamic library /tmp/ollama859516905/runners/cuda_v11/libext_server.so  Unable to init GPU: no CUDA-capable device is detected"
time=2024-04-11T19:09:04.828Z level=INFO source=dyn_ext_server.go:87 msg="Loading Dynamic llm server: /tmp/ollama859516905/runners/cpu_avx2/libext_server.so"
time=2024-04-11T19:09:04.828Z level=INFO source=dyn_ext_server.go:147 msg="Initializing llama server"
llama_model_loader: loaded meta data with 26 key-value pairs and 995 tensors from /root/.ollama/models/blobs/sha256-d68d6a65178011b746d215273d6a1f607f78be24a53532cf99618a32c2f382a2 (version GGUF V3 (latest))
# ...rest of the log
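
The key lines above are the cudart init failure (error 100) and the later "no CUDA-capable device is detected" warning, after which Ollama falls back to the cpu_avx2 runner: NVML can still enumerate a card, but CUDA itself finds no device it is allowed to use. One way to check what Docker actually handed this container is to compare its device request with the environment inside it (a hedged sketch; DeviceRequests is the HostConfig field docker inspect reports for --gpus/Compose GPU reservations):

# Shows whether the container was started with a Count-based request or
# with explicit DeviceIDs.
docker inspect ollama_model_2 --format '{{json .HostConfig.DeviceRequests}}'

# Shows the CUDA_VISIBLE_DEVICES value the Ollama process sees.
docker exec ollama_model_2 printenv CUDA_VISIBLE_DEVICES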

What did you expect to see?

No response

Steps to reproduce

No response

Are there any recent changes that introduced the issue?

No response

OS

Linux

Architecture

x86

Platform

Docker

Ollama version

0.1.31

GPU

Nvidia

GPU info

No response

CPU

AMD

Other software

No response

GiteaMirror added the bug label 2026-04-28 09:10:09 -05:00

@ww2283 commented on GitHub (Apr 11, 2024):

lol, problem solved. I will just list it here in case someone finds it useful.
The right way is to use device_ids instead of count.

services:

  ollama2:
    image: ollama/ollama:latest
    container_name: ollama_model_2
    ports:
      - '11437:11434'
    environment:
      - CUDA_VISIBLE_DEVICES=GPU-*************************
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ['1']
              capabilities: [gpu]
    volumes:
      - /ADATAtmp/docker/ollama_model_2:/root/.ollama
    networks:
      - autogen_network
    restart: always
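
This likely works because a count-based reservation does not pin a specific card, so both containers typically end up with the same first GPU; the UUID set in ollama2's CUDA_VISIBLE_DEVICES then refers to a device that was never mapped into the container, and CUDA reports no usable device. device_ids pins the exact GPU instead. The indices follow the host's nvidia-smi enumeration, and the full GPU-xxxx UUID string should also be accepted in device_ids. A quick sanity check after recreating the stack (hedged sketch; service and container names as in the compose file above):

# Map host GPU indices to UUIDs; use either the index ('1') or the UUID
# as the device_ids entry for each service.
nvidia-smi -L

# Recreate the services and confirm each container now sees only its GPU.
docker compose up -d --force-recreate ollama1 ollama2
docker exec ollama_model_2 nvidia-smi -L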

Reference: github-starred/ollama#48734