[PR #1834] [MERGED] Detect very old CUDA GPUs and fall back to CPU #10699

Closed · opened 2026-04-12 23:07:59 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/1834
Author: @dhiltgen
Created: 1/7/2024
Status: Merged
Merged: 1/7/2024
Merged by: @dhiltgen

Base: main ← Head: old_cuda


📝 Commits (1)

  • d74ce6b Detect very old CUDA GPUs and fall back to CPU

📊 Changes

3 files changed (+74 additions, -2 deletions)


📝 gpu/gpu.go (+15 -1)
📝 gpu/gpu_info_cuda.c (+51 -1)
📝 gpu/gpu_info_cuda.h (+8 -0)

📄 Description

If we try to load the CUDA library on an old GPU, it panics and crashes the server. This change checks the compute capability before loading the library so that we can gracefully fall back to CPU mode.
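A minimal Go sketch of the gating idea (the constant, function name, and 5.0 threshold below are assumptions for illustration, not the PR's actual identifiers):

```go
package gpu

import "log"

// Assumed minimum compute capability for the bundled CUDA kernels;
// the actual threshold enforced by the PR may differ.
const (
	cudaComputeMajorMin = 5
	cudaComputeMinorMin = 0
)

// cudaCapabilitySupported (hypothetical helper) reports whether a detected
// GPU is new enough to use, logging the decision either way.
func cudaCapabilitySupported(major, minor int) bool {
	if major < cudaComputeMajorMin ||
		(major == cudaComputeMajorMin && minor < cudaComputeMinorMin) {
		log.Printf("CUDA GPU is too old. Falling back to CPU mode. Compute Capability detected: %d.%d", major, minor)
		return false
	}
	log.Printf("CUDA Compute Capability detected: %d.%d", major, minor)
	return true
}
```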

Prior to version 0.1.18, the fallback behavior resulted from the subprocess runner crashing. Example from an old GPU:

ggml_init_cublas: found 1 CUDA devices:
  Device 0: NVIDIA GeForce GTX 765M, compute capability 3.0

cuBLAS error 3 at /go/src/github.com/jmorganca/ollama/llm/llama.cpp/gguf/ggml-cuda.cu:5552
current device: 0
2024/01/06 21:48:54 llama.go:320: llama runner exited with error: exit status 1

In 0.1.18, without this fix, the server crashes with a panic due to an assert in llama.cpp.

With this fix, the same system now detects the unsupported GPU and falls back to CPU mode:

2024/01/06 21:52:17 shim_ext_server.go:142: Dynamic LLM variants [cuda rocm]
2024/01/06 21:52:17 gpu.go:37: Detecting GPU type
2024/01/06 21:52:17 gpu.go:56: Nvidia GPU detected
2024/01/06 21:52:17 gpu.go:89: CUDA GPU is too old. Falling back to CPU mode. Compute Capability detected: 3.0
2024/01/06 21:52:17 routes.go:953: no GPU detected
...

Example output on a newer, supported GPU, where the model correctly runs on the GPU:

2024/01/06 21:55:11 shim_ext_server.go:142: Dynamic LLM variants [cuda rocm]
2024/01/06 21:55:11 gpu.go:37: Detecting GPU type
2024/01/06 21:55:11 gpu.go:56: Nvidia GPU detected
2024/01/06 21:55:11 gpu.go:86: CUDA Compute Capability detected: 7.5
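
For context, compute capability can be queried through NVML before any CUDA kernels are touched. Below is a minimal cgo sketch of that query, assuming nvml.h and libnvidia-ml are available at link time; the PR's gpu_info_cuda.c resolves the library dynamically at runtime instead, so treat this as an illustration of the NVML call, not the PR's exact code.

```go
package gpu

/*
#cgo LDFLAGS: -lnvidia-ml
#include <nvml.h>
*/
import "C"

import "fmt"

// cudaComputeCapability (hypothetical helper) returns the compute
// capability of device 0; error handling is trimmed to the essentials.
func cudaComputeCapability() (major, minor int, err error) {
	if ret := C.nvmlInit_v2(); ret != C.NVML_SUCCESS {
		return 0, 0, fmt.Errorf("nvmlInit_v2 failed: %d", ret)
	}
	defer C.nvmlShutdown()

	var dev C.nvmlDevice_t
	if ret := C.nvmlDeviceGetHandleByIndex_v2(0, &dev); ret != C.NVML_SUCCESS {
		return 0, 0, fmt.Errorf("nvmlDeviceGetHandleByIndex_v2 failed: %d", ret)
	}

	var maj, min C.int
	if ret := C.nvmlDeviceGetCudaComputeCapability(dev, &maj, &min); ret != C.NVML_SUCCESS {
		return 0, 0, fmt.Errorf("nvmlDeviceGetCudaComputeCapability failed: %d", ret)
	}
	return int(maj), int(min), nil
}
```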

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

Reference: github-starred/ollama#10699