[PR #12000] [MERGED] Add v12 + v13 cuda support #18954

Closed
opened 2026-04-16 06:52:41 -05:00 by GiteaMirror · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/12000
Author: @dhiltgen
Created: 8/21/2025
Status: Merged
Merged: 9/10/2025
Merged by: @dhiltgen

Base: main ← Head: v11_v13


📝 Commits (7)

  • cbe5375 Add support for upcoming NVIDIA Jetsons
  • e90021d cuda: bring back dual versions
  • a01cf4e win: break up native builds in build_windows.ps1
  • ec646df v11 build working on windows and linux
  • baf9585 switch to cuda v12.8 not JIT
  • 2e206ce Set CUDA compression to size
  • ea8330d enhance manual install linux docs

📊 Changes

8 files changed (+146 additions, -25 deletions)

View changed files

📝 .github/workflows/release.yaml (+11 -2)
📝 .github/workflows/test.yaml (+3 -3)
📝 CMakeLists.txt (+3 -3)
📝 CMakePresets.json (+26 -0)
📝 Dockerfile (+27 -3)
📝 discover/cuda_common.go (+8 -7)
📝 docs/linux.md (+2 -1)
📝 scripts/build_windows.ps1 (+66 -6)

📄 Description

Bring back support for dual CUDA stacks for broader GPU/Driver support.

Multiple permutations of v11, v12.1, and v12.8 were explored (with and without JIT), and I believe the optimal setup is v12.8 without JIT, with compression set to size. For v13, JIT is leveraged to reduce the added size impact. Resulting artifact sizes:

  • 1.5G ollama-linux-amd64.tgz
  • 1.5G ollama-linux-arm64.tgz
  • 1.5G ollama-windows-amd64.zip
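The flag choices above (per-architecture cubins for v12.8 so no JIT or driver pinning, PTX/JIT for v13, fatbin compression set to size) can be sketched in CMake terms. This is a hedged illustration, not the PR's actual CMakeLists.txt or presets; the architecture lists and variable names are assumptions:

```cmake
# Sketch only — architecture lists are illustrative.
# "-real" entries emit only cubins (no embedded PTX), so the driver never
# JIT-compiles and the minimum-driver pin described in the PR is avoided.
set(V12_CUDA_ARCHITECTURES "75-real;80-real;86-real;89-real;90-real")

# For the v13 stack, a "-virtual" entry ships PTX that the driver
# JIT-compiles and caches on first run, trading startup time for size.
set(V13_CUDA_ARCHITECTURES "90-virtual")

# nvcc 12.8+ can bias fatbin compression toward size rather than
# decompression speed (matches the "Set CUDA compression to size" commit).
add_compile_options("$<$<COMPILE_LANGUAGE:CUDA>:--compress-mode=size>")
```

The `-real`/`-virtual` suffixes follow CMake's `CMAKE_CUDA_ARCHITECTURES` convention; a plain entry like `90` would embed both cubin and PTX.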

Baseline Comparison

Currently shipping (v12.8, cubin binaries compiled in):

  • 1.2G ollama-linux-amd64.tgz
  • 1.0G ollama-linux-arm64.tgz
  • 1.2G ollama-windows-amd64.zip

Note: v12 with JIT would have provided an even greater size benefit, with only negligible startup overhead on first run to compile and cache the kernels. However, JIT compilation pins the minimum driver version to one matching the CUDA version, so v12.8 would have pinned us to driver 570 or newer (Feb 2025), which would have severely narrowed our compatibility window. Today we support v12.1 or newer (driver 531, Feb 2023). By sticking with cubin (not JIT) for v12.8, we retain the 531-or-newer compatibility. Downgrading to compile with CUDA v12.1 could have allowed JIT with the same driver compatibility, but Blackwell GPUs on pre-580 drivers would no longer work.
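The driver-version gating described above can be sketched in Go, the language of the changed `discover/cuda_common.go`. The type, function name, and cutoff constants here are illustrative assumptions, not the PR's actual identifiers:

```go
package main

import "fmt"

// driverVersion is a simplified (major, minor) NVIDIA driver version,
// e.g. {531, 0} for the oldest driver the cubin v12.8 build supports.
// Hypothetical type — not the real struct in discover/cuda_common.go.
type driverVersion struct{ major, minor int }

// cudaVariant picks which bundled CUDA stack to load based on the
// installed driver, mirroring the compatibility window in the PR:
// v13 (JIT) needs a newer driver, v12 (cubin) works back to 531.
func cudaVariant(d driverVersion) string {
	switch {
	case d.major >= 580: // illustrative cutoff for the v13 stack
		return "v13"
	case d.major >= 531: // CUDA 12.1-era driver (Feb 2023) or newer
		return "v12"
	default:
		return "" // unsupported driver: fall back to CPU
	}
}

func main() {
	for _, d := range []driverVersion{{590, 0}, {535, 104}, {470, 82}} {
		fmt.Printf("driver %d.%d -> %q\n", d.major, d.minor, cudaVariant(d))
	}
}
```

The point of the split is that each installed driver gets the newest stack it can actually run, rather than one stack pinning the whole compatibility window.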


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-16 06:52:41 -05:00

Reference: github-starred/ollama#18954