[GH-ISSUE #1094] Ambiguous state in Google Colab #47056

Closed
opened 2026-04-28 02:55:45 -05:00 by GiteaMirror · 0 comments

Originally created by @ArsBinarii on GitHub (Nov 11, 2023).
Original GitHub issue: https://github.com/ollama/ollama/issues/1094

Google Colab, T4 GPU.
Installed CUDA 12.3 via https://developer.nvidia.com/cuda-downloads
Now nvidia-smi shows CUDA 12.0, but nvcc reports 12.3.
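Note: the two numbers come from different places. nvidia-smi reports the highest CUDA version the installed driver supports, while nvcc reports the installed toolkit version, so a 12.0 / 12.3 mismatch like this can occur. A minimal sketch to print both, assuming both binaries are on PATH:

import subprocess

# Driver side: the "CUDA Version" in the nvidia-smi banner is the maximum
# runtime version the driver supports, not what is installed.
smi = subprocess.check_output(["nvidia-smi"]).decode()
print(next(line for line in smi.splitlines() if "CUDA Version" in line))

# Toolkit side: nvcc reports the version of the installed CUDA toolkit.
nvcc = subprocess.check_output(["nvcc", "--version"]).decode()
print(next(line for line in nvcc.splitlines() if "release" in line))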

Ran Ollama via:

import os
import threading
from pyngrok import ngrok
import subprocess
import time

def ollama():
    # These environment variables are inherited by the ollama subprocess
    os.environ['OLLAMA_HOST'] = '0.0.0.0:11434'
    os.environ['OLLAMA_ORIGINS'] = '*'
    subprocess.Popen(["ollama", "serve"])

def ngrok_tunnel():
    # Wait for some time to ensure ollama is fully started
    time.sleep(10)
    port = "11434"
    public_url = ngrok.connect(port).public_url
    print(f" * ngrok tunnel {public_url} -> http://127.0.0.1:{port}")

def monitor_gpu():
    while True:
        print(subprocess.check_output(["nvidia-smi"]).decode("utf-8"))
        time.sleep(10)  # adjust the sleep time to your preference

# Create threads to run ollama, ngrok_tunnel, and monitor_gpu functions in the background
ollama_thread = threading.Thread(target=ollama)
ngrok_thread = threading.Thread(target=ngrok_tunnel)
gpu_monitor_thread = threading.Thread(target=monitor_gpu)

# Start the threads
ollama_thread.start()
ngrok_thread.start()
gpu_monitor_thread.start()

# Optional: keep the Colab cell running so the background threads stay alive.
# Sleep instead of busy-waiting: a bare `while True: pass` would itself peg
# one CPU core at 100%.
while True:
    time.sleep(60)
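Side note: the fixed time.sleep(10) before opening the tunnel is a race against server startup. A sketch that polls Ollama's /api/tags endpoint until the server answers (assuming the default port) would be more robust:

import time
import urllib.request

def wait_for_ollama(url="http://127.0.0.1:11434/api/tags", timeout=60):
    # Poll the API instead of sleeping a fixed amount of time;
    # /api/tags responds once the server is up.
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=2):
                return True
        except OSError:
            time.sleep(1)
    return False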

Downloaded wizard-vicuna-uncensored:30b via the API.
Called a simple prompt via the API.
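For reference, the pull and prompt can be reproduced against Ollama's documented /api/pull and /api/generate endpoints; this is a sketch, not the exact calls used:

import json
import urllib.request

def api(path, payload):
    req = urllib.request.Request(
        "http://127.0.0.1:11434" + path,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    # Both endpoints stream newline-delimited JSON status/response objects
    with urllib.request.urlopen(req) as resp:
        for line in resp:
            print(json.loads(line))

api("/api/pull", {"name": "wizard-vicuna-uncensored:30b"})
api("/api/generate", {"model": "wizard-vicuna-uncensored:30b", "prompt": "Hello"})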

As per the image, the model appears to be loaded into GPU memory, but performance is low: top reports 99-100% CPU usage, there is some RAM usage, and nvidia-smi reports 0% GPU utilization.

image: https://github.com/jmorganca/ollama/assets/6293391/d3238fca-9365-412b-8e94-bc932fc21a71
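A more targeted check than scanning the full nvidia-smi table is to query utilization and memory directly: high memory.used with ~0% utilization.gpu during generation means the weights are parked in VRAM while inference runs on the CPU. A sketch using nvidia-smi's query flags:

import subprocess

# memory.used high + utilization.gpu near 0% while generating
# => model is in VRAM but compute is happening on the CPU
out = subprocess.check_output([
    "nvidia-smi",
    "--query-gpu=utilization.gpu,memory.used,memory.total",
    "--format=csv,noheader",
]).decode().strip()
print(out)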