[GH-ISSUE #758] Colab Nvidia driver not detected #62396

Closed
opened 2026-05-03 08:47:02 -05:00 by GiteaMirror · 12 comments

Originally created by @wifiuk on GitHub (Oct 11, 2023).
Original GitHub issue: https://github.com/ollama/ollama/issues/758

I am testing Ollama in a Colab, and it's not using the GPU at all, even though we can see that the GPU is there:

```
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   62C    P8    11W /  70W |      0MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
```

How can we force Ollama to use it? It clearly isn't using it at all.

GiteaMirror added the help wanted and bug labels 2026-05-03 08:47:02 -05:00

@mchiang0610 commented on GitHub (Oct 11, 2023):

@wifiuk how are you running this in Colab? I'm still learning Colab on my end, so any more info you can provide would be greatly appreciated.


@wifiuk commented on GitHub (Oct 11, 2023):

![image](https://github.com/jmorganca/ollama/assets/3785545/b046a2e0-17fc-4708-9247-cb16c6c29785)

```
import os
import threading
from pyngrok import ngrok
import subprocess
import time

def ollama():
    os.environ['OLLAMA_HOST'] = '0.0.0.0:11434'
    os.environ['OLLAMA_ORIGINS'] = '*'
    subprocess.Popen(["ollama", "serve"])

def ngrok_tunnel():
    # Wait for some time to ensure ollama is fully started
    time.sleep(10)
    port = "11434"
    public_url = ngrok.connect(port).public_url
    print(f" * ngrok tunnel {public_url} -> http://127.0.0.1:{port}")

def monitor_gpu():
    while True:
        print(subprocess.check_output(["nvidia-smi"]).decode("utf-8"))
        time.sleep(10)  # adjust the sleep time to your preference

# Create threads to run ollama, ngrok_tunnel, and monitor_gpu in the background
ollama_thread = threading.Thread(target=ollama)
ngrok_thread = threading.Thread(target=ngrok_tunnel)
gpu_monitor_thread = threading.Thread(target=monitor_gpu)

# Start the threads
ollama_thread.start()
ngrok_thread.start()
gpu_monitor_thread.start()

# Optional: keep the Colab cell running; sleep instead of busy-waiting
# so the keep-alive loop doesn't peg a CPU core
while True:
    time.sleep(1)
```

![image](https://github.com/jmorganca/ollama/assets/3785545/59996f2c-0cbb-4769-92b8-9f528eac5c0e)

![image](https://github.com/jmorganca/ollama/assets/3785545/6a0aef7d-c9c3-42c7-8f02-688913709e07)

![image](https://github.com/jmorganca/ollama/assets/3785545/5e48fec7-370c-453f-97d3-2d7aa4819437)


@wifiuk commented on GitHub (Oct 11, 2023):

I just need to keep pulling the model each time the runtime resets; I might add that to the script later. For now I need to get the GPU working or it's useless, and there isn't much documentation on it.


@mxyng commented on GitHub (Oct 11, 2023):

Keep in mind Ollama does not use the GPU until a model is loaded, which happens on the first generate request. It's expected that `nvidia-smi` shows no processes claiming VRAM when Ollama first starts.
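
For reference, a minimal sketch of triggering that first model load from a Colab cell (the model name is only an example; assumes `ollama serve` is running and the model has already been pulled):

```
import requests

# The first generate request loads the model; only after this should
# nvidia-smi show a process claiming VRAM.
resp = requests.post(
    "http://127.0.0.1:11434/api/generate",
    json={"model": "llama2", "prompt": "Hello", "stream": False},
)
print(resp.json()["response"])
```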


@wifiuk commented on GitHub (Oct 11, 2023):

No, I have it running in a loop, so even when making requests it shows no GPU usage.


@Syulin7 commented on GitHub (Oct 12, 2023):

I have the same issue on Linux with Docker; it appears that ollama is not using a CUDA base image:

https://github.com/jmorganca/ollama/blob/92578798bb1abcedd6bc99479d804f32d9ee2f6c/Dockerfile#L17


@Syulin7 commented on GitHub (Oct 12, 2023):

@wifiuk @mxyng I resolved the issue by replacing the base image.

https://github.com/jmorganca/ollama/blob/92578798bb1abcedd6bc99479d804f32d9ee2f6c/Dockerfile#L17-L23

Changing `ubuntu:22.04` to `nvidia/cuda:11.8.0-devel-ubuntu22.04` makes it work:

![image](https://github.com/jmorganca/ollama/assets/37265556/52f7f99a-2533-4069-b700-7a738f03c7b4)

Perhaps we can build a GPU image and push it to the community, using the "gpu" tag for differentiation.
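
To illustrate the swap described above, a hedged sketch (a placeholder Dockerfile fragment, not the actual upstream file; only the `FROM` change reflects the fix named in this comment):

```
# Before (CPU-only base image):
# FROM ubuntu:22.04

# After: a CUDA base image, so the container ships the NVIDIA CUDA libraries
FROM nvidia/cuda:11.8.0-devel-ubuntu22.04
```

The container also needs GPU access at runtime, e.g. `docker run --gpus=all ...`, for the libraries to reach the device.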


@mxyng commented on GitHub (Oct 13, 2023):

@Syulin7 can you create a new issue? Your comments don't seem related to the original issue, since the original issue uses the Linux install script, not Docker.


@Syulin7 commented on GitHub (Oct 16, 2023):

> @Syulin7 can you create a new issue? Your comments don't seem related to the original issue, since the original issue uses the Linux install script, not Docker.

I created a new issue: https://github.com/jmorganca/ollama/issues/797


@mxyng commented on GitHub (Oct 19, 2023):

I've confirmed Ollama doesn't use the GPU by default in Colab's hosted runtime, at least on the T4 instance. It's possible to update the system and upgrade the CUDA drivers by adding this line when installing, or before starting, Ollama:

```
!sudo apt-get update && sudo apt-get install -y cuda-drivers
```

@mxyng commented on GitHub (Nov 12, 2023):

Since 0.1.9, you can now specify `LD_LIBRARY_PATH` to let Ollama use the system Nvidia library. An example notebook can be found in https://github.com/jmorganca/ollama/pull/1104
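
In a Colab cell that might look like this minimal sketch (`/usr/lib64-nvidia` is Colab's standard location for the system NVIDIA libraries, as the fuller setup in the comment below also uses):

```
import os

# Point Ollama at the system NVIDIA libraries before starting the server
os.environ['LD_LIBRARY_PATH'] = '/usr/lib64-nvidia'
```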


@raffaelemancuso commented on GitHub (Jun 10, 2025):

> Since 0.1.9, you can now specify LD_LIBRARY_PATH to let Ollama use the system Nvidia library. An example notebook can be found in #1104

That example notebook is not available anymore.

Here is how I set it up:

```
!echo 'debconf debconf/frontend select Noninteractive' | sudo debconf-set-selections

!sudo apt-get update && sudo apt-get install -y cuda-drivers lshw

# Each `!` line runs in its own shell, so the install check must be one command:
!ollama --version && echo "Ollama found. Not installing." || (echo "Ollama not found. Installing." && curl -fsSL https://ollama.ai/install.sh | sh)

import os

# Set LD_LIBRARY_PATH so Ollama can find the system NVIDIA libraries
os.environ.update({'LD_LIBRARY_PATH': '/usr/lib64-nvidia'})

!nohup ollama serve &

!ollama pull deepseek-r1:8b

!pip install --upgrade ollama
```
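
For completeness, a minimal sketch of exercising the just-installed Python client and then confirming GPU usage (assumes the `ollama serve` and `ollama pull` steps above have completed):

```
import subprocess
import ollama  # installed by the pip command above

# A chat request forces the model to load into memory
resp = ollama.chat(
    model='deepseek-r1:8b',
    messages=[{'role': 'user', 'content': 'Say hello in one word.'}],
)
print(resp['message']['content'])

# If the GPU is in use, nvidia-smi should now show an ollama process holding VRAM
print(subprocess.check_output(['nvidia-smi']).decode('utf-8'))
```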
Reference: github-starred/ollama#62396