[PR #642] [CLOSED] Dynamically support ROCm, CUDA, or OpenCL in the GPU-accelerated binary #36149

Closed
opened 2026-04-22 20:51:45 -05:00 by GiteaMirror · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/642
Author: @65a
Created: 9/29/2023
Status: Closed

Base: main ← Head: main


📝 Commits (10+)

  • 4337051 Make VRAM reporting support CUDA or ROCm, in that order
  • 7286a87 Create runnable but not built go file to generate llama.cpp with options set dynamically based on available SDKs.
  • 6e3a720 Use llama.cpp generator to enable dynamically picking up SDKs from build environment
  • 94c71b2 Use nvcc presence instead of nvidia-smi
  • 1e78ce1 Emit a message if a VRAM check failed, but make it clear it's not necessarily bad, we're just going to skip that VRAM check.
  • a5aa21c Delete llm/llama.cpp/generate_cuda_linux.go
  • eac4d43 Create a build script for the cuda binaries. For now just pick between CUBLAS and HIPBLAS
  • 7b7aa95 Simplify cuda build for linux
  • e47f236 Correct arguments to CMake clean. It shouldn't necessarily be useful in a clean or CI environment, but should make testing locally cleaner.
  • 213ebcc Re-enable CLBlast where appropriate. This should build an accelerated binary for almost everyone.

📊 Changes

3 files changed (+107 additions, -18 deletions)

View changed files

➕ llm/llama.cpp/build_linux_cuda.sh (+49 -0)
📝 llm/llama.cpp/generate_linux.go (+1 -4)
📝 llm/llama.go (+57 -14)

📄 Description

This PR changes the way CMake generation works for the cuda binary and adds support for querying AMD VRAM. ROCm or CUDA support (or OpenCL if neither is available) is enabled dynamically for gguf, and CUDA or OpenCL support is enabled dynamically for ggml. This is done by a CMake-managing Go script run via go generate, which uses heuristics to detect which accelerator SDKs are present in the build environment and enables them in the following order: CUDA, ROCm, OpenCL.
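For reference, here is a minimal sketch (not the PR's actual generator) of how a go generate-time script could probe for SDKs and pick CMake flags in that CUDA → ROCm → OpenCL order. The probes, paths, and the detectCMakeDefs helper are illustrative assumptions; the flag names correspond to llama.cpp's LLAMA_CUBLAS, LLAMA_HIPBLAS, and LLAMA_CLBLAST options.

```go
// sdk_detect_sketch.go — a minimal sketch (not this PR's code) of a go:generate-time
// script that probes for accelerator SDKs and chooses CMake flags accordingly.
// The probe heuristics and paths below are illustrative assumptions.
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// detectCMakeDefs returns CMake definitions for the first SDK found,
// probing in the order CUDA, ROCm, OpenCL, as described above.
func detectCMakeDefs() []string {
	if _, err := exec.LookPath("nvcc"); err == nil {
		// CUDA toolkit present: enable cuBLAS.
		return []string{"-DLLAMA_CUBLAS=on"}
	}
	if _, err := os.Stat("/opt/rocm"); err == nil {
		// ROCm install tree present: enable hipBLAS.
		return []string{"-DLLAMA_HIPBLAS=on"}
	}
	// Fall back to OpenCL via CLBlast so most machines still get an accelerated binary.
	return []string{"-DLLAMA_CLBLAST=on"}
}

func main() {
	defs := detectCMakeDefs()
	args := append([]string{"-S", "gguf", "-B", "gguf/build"}, defs...)
	fmt.Println("cmake", args)

	cmd := exec.Command("cmake", args...)
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	if err := cmd.Run(); err != nil {
		fmt.Fprintln(os.Stderr, "cmake generation failed:", err)
		os.Exit(1)
	}
}
```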

The VRAM detection change uses rocm-info. Note that machines with both an AMD and an NVIDIA GPU will use CUDA and report CUDA VRAM by default, so the default binary name of cuda is still appropriate, though it might make sense to rename it to gpu or accelerated in the future.
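As a rough illustration of the runtime side, the sketch below tries a CUDA VRAM query first and falls back to a ROCm one, and it only logs (rather than fails) when neither works, matching the commit notes above. The exact commands, output parsing, and the totalVRAMMiB/sumMiB helpers are assumptions for illustration, not the PR's code.

```go
// vram_check_sketch.go — a minimal sketch of querying VRAM via CUDA or ROCm, in that
// order, and skipping the check with a log message if neither query succeeds.
package main

import (
	"bytes"
	"log"
	"os/exec"
	"strconv"
	"strings"
)

// totalVRAMMiB tries CUDA first, then ROCm. A zero result means the check is skipped.
func totalVRAMMiB() int64 {
	// CUDA path: nvidia-smi reports per-GPU memory totals in MiB as plain CSV.
	if out, err := exec.Command("nvidia-smi",
		"--query-gpu=memory.total", "--format=csv,noheader,nounits").Output(); err == nil {
		return sumMiB(out)
	}
	// ROCm path: the PR mentions a rocm-info based query; the command and output
	// format assumed here (one MiB-like integer per line) are illustrative only.
	if out, err := exec.Command("rocm-smi", "--showmeminfo", "vram", "--csv").Output(); err == nil {
		return sumMiB(out)
	}
	log.Println("could not query GPU VRAM; skipping the VRAM check (not necessarily an error)")
	return 0
}

// sumMiB adds up any integer-looking lines in the command output.
func sumMiB(out []byte) int64 {
	var total int64
	for _, line := range strings.Split(string(bytes.TrimSpace(out)), "\n") {
		if n, err := strconv.ParseInt(strings.TrimSpace(line), 10, 64); err == nil {
			total += n
		}
	}
	return total
}

func main() {
	log.Printf("total VRAM: %d MiB", totalVRAMMiB())
}
```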

This is a second attempt, since GitHub automatically closed #628 when I cleared my commits and reapplied the changes in the UI 😢. I squashed some commits and rebased on head. Comments welcome; it's easier to see what is going on from the files-changed view.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-22 20:51:45 -05:00

Reference: github-starred/ollama#36149