[GH-ISSUE #2954] server fails to init GPU when accessing a model (GPU was detected during startup) #79493

Closed
opened 2026-05-09 05:36:23 -05:00 by GiteaMirror · 10 comments

Originally created by @stephen2001 on GitHub (Mar 6, 2024).
Original GitHub issue: https://github.com/ollama/ollama/issues/2954

Originally assigned to: @dhiltgen on GitHub.

I am running `ollama serve` in a Docker container. This is my current Dockerfile:

FROM nvidia/cuda:11.8.0-cudnn8-devel-ubuntu22.04

WORKDIR /opt/ollama

RUN apt-get update \
    && apt-get install -y --no-install-recommends \
        wget curl \
    && apt-get autoremove -y \
    && apt-get clean \
    && rm -rf /var/lib/{apt,dpkg,cache,log}/

# Download and install Ollama
RUN curl -L https://ollama.com/download/ollama-linux-amd64 -o /usr/bin/ollama && \
    chmod +x /usr/bin/ollama

ENV OLLAMA_DEBUG=1
ENV OLLAMA_HOST=0.0.0.0

EXPOSE 11434

# Set the entrypoint
ENTRYPOINT [ "/usr/bin/ollama" ]

# Default command
CMD ["serve"]

The GPU is detected as expected at startup:

time=2024-03-06T14:07:32.512+01:00 level=INFO source=images.go:710 msg="total blobs: 6"
2024-03-06T13:07:32.514600912Z time=2024-03-06T14:07:32.514+01:00 level=INFO source=images.go:717 msg="total unused blobs removed: 0"
2024-03-06T13:07:32.516025518Z time=2024-03-06T14:07:32.515+01:00 level=INFO source=routes.go:1021 msg="Listening on [::]:11434 (version 0.1.28)"
2024-03-06T13:07:32.516175124Z time=2024-03-06T14:07:32.516+01:00 level=INFO source=payload_common.go:107 msg="Extracting dynamic libraries..."
2024-03-06T13:07:37.612422723Z time=2024-03-06T14:07:37.611+01:00 level=INFO source=payload_common.go:146 msg="Dynamic LLM libraries [cpu_avx2 cpu_avx cpu cuda_v11 rocm_v6 rocm_v5]"
2024-03-06T13:07:37.612486963Z time=2024-03-06T14:07:37.612+01:00 level=DEBUG source=payload_common.go:147 msg="Override detection logic by setting OLLAMA_LLM_LIBRARY"
2024-03-06T13:07:37.612499836Z time=2024-03-06T14:07:37.612+01:00 level=INFO source=gpu.go:94 msg="Detecting GPU type"
2024-03-06T13:07:37.612510087Z time=2024-03-06T14:07:37.612+01:00 level=INFO source=gpu.go:265 msg="Searching for GPU management library libnvidia-ml.so"
2024-03-06T13:07:37.612522996Z time=2024-03-06T14:07:37.612+01:00 level=DEBUG source=gpu.go:283 msg="gpu management search paths: [/usr/local/cuda/lib64/libnvidia-ml.so* /usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ml.so* /usr/lib/x86_64-linux-gnu/libnvidia-ml.so* /usr/lib/wsl/lib/libnvidia-ml.so* /usr/lib/wsl/drivers/*/libnvidia-ml.so* /opt/cuda/lib64/libnvidia-ml.so* /usr/lib*/libnvidia-ml.so* /usr/local/lib*/libnvidia-ml.so* /usr/lib/aarch64-linux-gnu/nvidia/current/libnvidia-ml.so* /usr/lib/aarch64-linux-gnu/libnvidia-ml.so* /opt/cuda/targets/x86_64-linux/lib/stubs/libnvidia-ml.so* /usr/local/nvidia/lib/libnvidia-ml.so* /usr/local/nvidia/lib64/libnvidia-ml.so*]"
2024-03-06T13:07:37.614431607Z time=2024-03-06T14:07:37.614+01:00 level=INFO source=gpu.go:311 msg="Discovered GPU libraries: [/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.525.116.04]"
2024-03-06T13:07:37.614536353Z wiring nvidia management library functions in /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.525.116.04
2024-03-06T13:07:37.614565012Z dlsym: nvmlInit_v2
2024-03-06T13:07:37.614573240Z dlsym: nvmlShutdown
2024-03-06T13:07:37.614580264Z dlsym: nvmlDeviceGetHandleByIndex
2024-03-06T13:07:37.614587226Z dlsym: nvmlDeviceGetMemoryInfo
2024-03-06T13:07:37.614593900Z dlsym: nvmlDeviceGetCount_v2
2024-03-06T13:07:37.614600540Z dlsym: nvmlDeviceGetCudaComputeCapability
2024-03-06T13:07:37.614607160Z dlsym: nvmlSystemGetDriverVersion
2024-03-06T13:07:37.614613946Z dlsym: nvmlDeviceGetName
2024-03-06T13:07:37.614620480Z dlsym: nvmlDeviceGetSerial
2024-03-06T13:07:37.614627222Z dlsym: nvmlDeviceGetVbiosVersion
2024-03-06T13:07:37.614651044Z dlsym: nvmlDeviceGetBoardPartNumber
2024-03-06T13:07:37.614658210Z dlsym: nvmlDeviceGetBrand
2024-03-06T13:07:37.626543025Z CUDA driver version: 525.116.04
2024-03-06T13:07:37.626591950Z time=2024-03-06T14:07:37.626+01:00 level=INFO source=gpu.go:99 msg="Nvidia GPU detected"
2024-03-06T13:07:37.626604324Z time=2024-03-06T14:07:37.626+01:00 level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
2024-03-06T13:07:37.632326226Z [0] CUDA device name: NVIDIA GeForce GTX 1080 Ti
2024-03-06T13:07:37.632367566Z [0] CUDA part number: 
2024-03-06T13:07:37.632380051Z nvmlDeviceGetSerial failed: 3
2024-03-06T13:07:37.632389911Z [0] CUDA vbios version: 86.02.39.00.22
2024-03-06T13:07:37.632399445Z [0] CUDA brand: 5
2024-03-06T13:07:37.632408716Z [0] CUDA totalMem 11811160064
2024-03-06T13:07:37.632418781Z [0] CUDA usedMem 96272384
2024-03-06T13:07:37.632428941Z time=2024-03-06T14:07:37.632+01:00 level=INFO source=gpu.go:146 msg="CUDA Compute Capability detected: 6.1"
2024-03-06T13:07:37.632439074Z time=2024-03-06T14:07:37.632+01:00 level=DEBUG source=gpu.go:254 msg="**cuda detected 1 devices with 10054M available memory**"
2024-03-06T13:07:45.128463945Z 

After pulling a model and accessing the server through the API on port 11434 (via AnythingLLM), I get the following error message. The server is not able to initialize the GPU and continues on the CPU.

time=2024-03-06T14:12:15.108+01:00 level=DEBUG source=payload_common.go:93 msg="ordered list of LLM libraries to try [/tmp/ollama3053854002/cuda_v11/libext_server.so /tmp/ollama3053854002/cpu_avx2/libext_server.so]"
2024-03-06T13:12:15.108672805Z loading library /tmp/ollama3053854002/cuda_v11/libext_server.so
2024-03-06T13:12:15.124892811Z time=2024-03-06T14:12:15.124+01:00 level=INFO source=dyn_ext_server.go:90 msg="Loading Dynamic llm server: /tmp/ollama3053854002/cuda_v11/libext_server.so"
2024-03-06T13:12:15.124934506Z time=2024-03-06T14:12:15.124+01:00 level=INFO source=dyn_ext_server.go:150 msg="Initializing llama server"
2024-03-06T13:12:15.126816285Z [1709730735] system info: AVX = 1 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | 
2024-03-06T13:12:15.126842734Z [1709730735] Performing pre-initialization of GPU
2024-03-06T13:12:15.152439571Z time=2024-03-06T14:12:15.152+01:00 level=DEBUG source=dyn_ext_server.go:157 msg="**failure during initialization: Unable to init GPU: no CUDA-capable device is detected**"
2024-03-06T13:12:15.152469986Z time=2024-03-06T14:12:15.152+01:00 level=WARN source=llm.go:162 msg="Failed to load dynamic library /tmp/ollama3053854002/cuda_v11/libext_server.so  Unable to init GPU: no CUDA-capable device is detected"
2024-03-06T13:12:15.152476725Z loading library /tmp/ollama3053854002/cpu_avx2/libext_server.so
2024-03-06T13:12:15.154618107Z 

Any suggestions are welcome. Thanks in advance.
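
The two log excerpts above show the mismatch: NVML discovery succeeds at startup, but CUDA initialization fails once a model is loaded. A minimal sketch for spotting that fallback in a saved server log, assuming the message text matches the 0.1.28 output above (the function name `gpu_fallback_detected` and the file name `ollama.log` are hypothetical):

```shell
#!/bin/sh
# Hypothetical helper: returns 0 if a saved Ollama log shows the CUDA
# library failing to initialize, which is followed by a fallback to a CPU
# library in the log above. The pattern is copied from the WARN line in the
# 0.1.28 output; other versions may word it differently.
gpu_fallback_detected() {
    grep -q 'Unable to init GPU' "$1"
}

# Example usage (ollama.log is an assumed file name):
#   docker logs <container> > ollama.log 2>&1
#   if gpu_fallback_detected ollama.log; then echo "fell back to CPU"; fi
```

The check only looks at log text, so it says nothing about why CUDA initialization failed.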


@dhiltgen commented on GitHub (Mar 6, 2024):

Were you unable to get our official container image to work?

https://hub.docker.com/r/ollama/ollama


@stephen2001 commented on GitHub (Mar 6, 2024):

Yes, I tried the official image. It has the same problem, and it was more difficult for me to set up: I needed to install CUDA inside the container. Your Docker image works well directly on a machine, but as a container it has some issues. I copied the Dockerfile above from the resolution of another issue.


@dhiltgen commented on GitHub (Mar 6, 2024):

We build with CUDA v11.3.1 to maximize compatibility, so maybe switching your base layer would help?

If you exec into the container, does `nvidia-smi` report the GPU?

What host OS are you running? What container runtime?


@stephen2001 commented on GitHub (Mar 7, 2024):

I will try with CUDA 11.3.
This is what `nvidia-smi` reports:
![image](https://github.com/ollama/ollama/assets/23563824/4c8dc4f8-3897-4878-b9de-6e22e3e50d69)
Host OS: Ubuntu 22.04.3 LTS
Docker version 24.0.5, build ced0996
I cannot change the server much, since it is a GPU server that I share with others.

@stephen2001 commented on GitHub (Mar 7, 2024):

Tried with CUDA 11.3: no difference.


@dhiltgen commented on GitHub (Mar 20, 2024):

Can you capture the output from the following so we can try to get to the bottom of why it's not able to detect the GPU?

docker run --rm -it --gpus=all -p 11434:11434 -e OLLAMA_DEBUG=1 ollama/ollama:0.1.29

Then just try to load a small model so we can see what happens during the GPU discovery.


@stephen2001 commented on GitHub (Mar 21, 2024):

Many thanks.
I found a way to run the original Docker image directly on the server behind the firewall. It is a bit hacky, but it looks like it is using the GPU now. I still need to fix the GPU device selection (I cannot use `--gpus=all`, only a dedicated device).
I will check and let you know.
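
Restricting the container to one GPU instead of `--gpus=all` can be sketched as below. This is an assumption about the setup, not the configuration actually used on this server: `--gpus device=N` is Docker's form for selecting a device by index, while the helper name `ollama_run_cmd` and the index `0` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: prints a docker run command that exposes a single
# GPU (selected by index) instead of all GPUs. It only prints the command,
# so it can be reviewed before running it on a shared server.
ollama_run_cmd() {
    gpu_index="$1"
    printf 'docker run --rm -it --gpus device=%s -p 11434:11434 -e OLLAMA_DEBUG=1 ollama/ollama:0.1.29\n' "$gpu_index"
}

# Example: print the command for GPU 0 (index as listed by nvidia-smi).
ollama_run_cmd 0
```

The remaining flags are copied unchanged from the `docker run` command suggested earlier in the thread.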


@vakaobr commented on GitHub (Mar 28, 2024):

Hey @stephen2001, can you please advise how to do it? I'm facing the same issue when trying to run this as a container.


@stephen2001 commented on GitHub (Apr 8, 2024):

Hi @vakaobr,
a) I translated
docker run --rm -it --gpus=all -p 11434:11434 -e OLLAMA_DEBUG=1 ollama/ollama:0.1.29
into a run configuration in PyCharm, which I use to create the Docker container directly on my machine.

b) Once the container is running, you can open a terminal in it (or attach to the Docker container).

c) I needed to download and update certificates to get model pulls working.

d) There seems to be a bug in Ollama that requires unsetting the http_proxy and HTTP_PROXY environment variables. Of course, https_proxy and HTTPS_PROXY still need to be set.

e) Now you can pull and run models.

Hope this helps.
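
Step d) can be sketched as a small wrapper that drops only the plain-HTTP proxy variables for one command and leaves `https_proxy` and `HTTPS_PROXY` untouched. The function name `without_http_proxy` is hypothetical, `env -u` is assumed to be available (it is in GNU coreutils), and whether the workaround is needed at all may depend on the Ollama version.

```shell
#!/bin/sh
# Hypothetical wrapper: runs the given command with http_proxy and
# HTTP_PROXY removed from its environment. https_proxy and HTTPS_PROXY are
# passed through unchanged. Assumes an env that supports -u (GNU coreutils).
without_http_proxy() {
    env -u http_proxy -u HTTP_PROXY "$@"
}

# Example usage inside the container (the model name is a placeholder):
#   without_http_proxy ollama pull <model>
```

Wrapping a single command this way keeps the variables set for the rest of the shell session, which matters on a machine where other tools still rely on the proxy.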


@dhiltgen commented on GitHub (Apr 15, 2024):

Glad to hear you got it working! Sounds like we can close this issue now.

Reference: github-starred/ollama#79493