[GH-ISSUE #2716] Official image does not detect GPU #27389

Closed
opened 2026-04-22 04:42:42 -05:00 by GiteaMirror · 6 comments

Originally created by @Deer-Canidae on GitHub (Feb 23, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/2716

Originally assigned to: @dhiltgen on GitHub.

I was trying to run Ollama in a container using podman and pulled the official image from Docker Hub.

```shell
podman run --rm -it --security-opt label=disable --gpus=all ollama
```

But I was met with the following log announcing that my GPU was not detected:

```
level=INFO source=images.go:710 msg="total blobs: 0"
level=INFO source=images.go:717 msg="total unused blobs removed: 0"
level=INFO source=routes.go:1019 msg="Listening on [::]:11434 (version 0.1.27)"
level=INFO source=payload_common.go:107 msg="Extracting dynamic libraries..."
level=INFO source=payload_common.go:146 msg="Dynamic LLM libraries [cpu rocm_v5 cuda_v11 rocm_v6 cpu_avx cpu_avx2]"
level=INFO source=gpu.go:94 msg="Detecting GPU type"
level=INFO source=gpu.go:265 msg="Searching for GPU management library libnvidia-ml.so"
level=INFO source=gpu.go:311 msg="Discovered GPU libraries: []"
level=INFO source=gpu.go:265 msg="Searching for GPU management library librocm_smi64.so"
level=INFO source=gpu.go:311 msg="Discovered GPU libraries: []"
level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
level=INFO source=routes.go:1042 msg="no GPU detected"
```
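
As a diagnostic sketch (an assumption on my part: this presumes the image ships `/bin/sh`, and uses the same image name as the run command above), one can override the entrypoint and check whether the host's NVIDIA management library was actually mapped into the container:

```shell
# Override the entrypoint to get a shell and look for libnvidia-ml.so,
# the library the gpu.go log above is searching for.
# The library paths checked here are common locations, not an exhaustive list.
podman run --rm -it --security-opt label=disable --gpus=all \
  --entrypoint /bin/sh ollama \
  -c 'ls -l /usr/lib64/libnvidia-ml.so* /usr/lib/x86_64-linux-gnu/libnvidia-ml.so* 2>/dev/null || echo "no libnvidia-ml.so mapped"'
```

If this prints "no libnvidia-ml.so mapped", the problem is the library mapping into the container rather than Ollama's detection logic.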

I tried to track down any issue resulting from my improper use of the tool, and finally decided to try building my own Ollama image to see whether the issue was reproducible.

```Dockerfile
FROM nvidia/cuda:12.3.1-base-rockylinux9

WORKDIR /opt/ollama
RUN dnf up --refresh -y

RUN curl -L https://ollama.com/download/ollama-linux-amd64 -o /usr/bin/ollama
RUN chmod +x /usr/bin/ollama

ENTRYPOINT [ "/usr/bin/ollama" ]
CMD ["serve"]
```
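
For completeness, a build command to go with this Dockerfile (the tag `llm-base` is taken from the run command below):

```shell
# Build the custom image from the Dockerfile above
podman build -t llm-base .
```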

Given this Dockerfile, I built an image and ran it with the exact same arguments as the official image

```shell
podman run --rm -it --security-opt label=disable --gpus=all llm-base
```

and was met with the following logs:

```
level=INFO source=images.go:710 msg="total blobs: 0"
level=INFO source=images.go:717 msg="total unused blobs removed: 0"
level=INFO source=routes.go:1019 msg="Listening on 127.0.0.1:11434 (version 0.1.27)"
level=INFO source=payload_common.go:107 msg="Extracting dynamic libraries..."
level=INFO source=payload_common.go:146 msg="Dynamic LLM libraries [cpu cpu_avx cpu_avx2 rocm_v5 rocm_v6 cuda_v11]"
level=INFO source=gpu.go:94 msg="Detecting GPU type"
level=INFO source=gpu.go:265 msg="Searching for GPU management library libnvidia-ml.so"
level=INFO source=gpu.go:311 msg="Discovered GPU libraries: [/usr/lib64/libnvidia-ml.so.545.29.06]"
level=INFO source=gpu.go:99 msg="Nvidia GPU detected"
level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
level=INFO source=gpu.go:146 msg="CUDA Compute Capability detected: 8.6"
```

At first glance, the problem seems to come from the official Ollama image itself, since the GPU is detected when Ollama runs on top of Nvidia's CUDA base image.

If it's any help, I run an RTX 3050 Ti mobile GPU on Fedora 39.


@Biosias commented on GitHub (Feb 24, 2024):

I've encountered the same problem on Debian 12 with an NVIDIA GeForce GTX 1060 6GB.

NVIDIA-SMI 525.147.05 Driver Version: 525.147.05 CUDA Version: 12.0

Docker version 20.10.24+dfsg1, build 297e128


@evalladaresv commented on GitHub (Feb 24, 2024):

I'm having the same issue. I have an RTX 3050 Ti too.


@Deer-Canidae commented on GitHub (Feb 26, 2024):

Well, in the meantime I built a [quickfix image](https://github.com/Deer-Canidae/ollama-gpu-fix) if anyone is interested. It works on my side, no guarantees though.


@dhiltgen commented on GitHub (Feb 27, 2024):

There may be a subtle difference between podman and docker with mapping the host libraries into these containers. I have a feeling it may be some env var. @Deer-Canidae could you try running the ollama image and specify these variables on the command line to see if that changes the behavior?

```
NVARCH=x86_64
NV_CUDA_CUDART_VERSION=11.3.1
NVIDIA_VISIBLE_DEVICES=all
```

@Deer-Canidae commented on GitHub (Feb 29, 2024):

> There may be a subtle difference between podman and docker with mapping the host libraries into these containers. I have a feeling it may be some env var. @Deer-Canidae could you try running the ollama image and specify these variables on the command line to see if that changes the behavior?
>
> ```
> NVARCH=x86_64
> NV_CUDA_CUDART_VERSION=11.3.1
> NVIDIA_VISIBLE_DEVICES=all
> ```

Running

```shell
podman run -it --rm --security-opt label=disable -e NVARCH=x86_64 -e NV_CUDA_CUDART_VERSION=11.3.1 -e NVIDIA_VISIBLE_DEVICES=all --gpus=all ollama/ollama
```

seems to get the GPU detected properly:

```
level=INFO source=gpu.go:99 msg="Nvidia GPU detected"
level=INFO source=gpu.go:146 msg="CUDA Compute Capability detected: 8.6"
```

Note: after further testing, it seems that only the `NVIDIA_VISIBLE_DEVICES=all` env var is required for the GPU to be detected.
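
Based on that finding, the minimal working invocation would presumably be (an inference from the note above, not separately verified):

```shell
# Minimal form, assuming only NVIDIA_VISIBLE_DEVICES is actually needed
podman run -it --rm --security-opt label=disable \
  -e NVIDIA_VISIBLE_DEVICES=all --gpus=all ollama/ollama
```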


@mcc311 commented on GitHub (Feb 29, 2024):

Many thanks to @Deer-Canidae! You solved my problem.

At first I tried your Dockerfile, and it did indeed show the GPU detected successfully. But I use a GTX 1080, and when I execute

```shell
podman exec -it ollama ollama run zephyr
```

and then check the logs,

```
time=2024-02-29T19:47:00.937+08:00 level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-02-29T19:47:00.945+08:00 level=INFO source=dyn_ext_server.go:90 msg="Loading Dynamic llm server: /tmp/ollama548365210/cuda_v11/libext_server.so"
time=2024-02-29T19:47:00.945+08:00 level=INFO source=dyn_ext_server.go:150 msg="Initializing llama server"
time=2024-02-29T19:47:00.953+08:00 level=WARN source=llm.go:162 msg="Failed to load dynamic library /tmp/ollama548365210/cuda_v11/libext_server.so  Unable to init GPU: forward compatibility was attempted on non supported HW"
```

it shows the CUDA error `forward compatibility was attempted on non supported HW`. After searching through many pages, I found that the CUDA version in the image cannot be newer than what the host driver supports. So I changed the Dockerfile to this:

```Dockerfile
# Use the CentOS 7 based CUDA image
FROM nvidia/cuda:11.3.1-base-centos7

# Set the working directory
WORKDIR /opt/ollama

# Update the system and install necessary packages. CentOS 7 uses yum.
# Note: The '--nogpgcheck' option can be used if you face GPG key issues, but it's better to resolve these properly.
RUN yum update -y && \
    yum install -y curl

# Download and install Ollama
RUN curl -L https://ollama.com/download/ollama-linux-amd64 -o /usr/bin/ollama && \
    chmod +x /usr/bin/ollama

# Set the entrypoint
ENTRYPOINT [ "/usr/bin/ollama" ]

# Default command
CMD ["serve"]
```

And it works for me. Hope it helps others!


Reference: github-starred/ollama#27389