[PR #783] [MERGED] fix: offloading on low end GPUs #10327

Closed
opened 2026-04-12 22:58:17 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/783
Author: @mxyng
Created: 10/13/2023
Status: Merged
Merged: 10/13/2023
Merged by: @mxyng

Base: main ← Head: mxyng/fix-gpu-offloading


📝 Commits (2)

  • 811c3d1 no gpu if vram < 2GB
  • 35afac0 do not use gpu binary when num_gpu == 0

📊 Changes

1 file changed (+33 additions, -15 deletions)


📝 llm/llama.go (+33 -15)

📄 Description

Fixes two issues when using low end GPUs:

GPUs with low VRAM are disproportionately affected by offloading overhead, so any device with less than 2 GB of VRAM will run exclusively on the CPU unless overridden by num_gpu.
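The threshold behavior described above can be sketched as follows. This is an illustrative model, not the actual code in llm/llama.go: the constant and function names are hypothetical, and the real implementation may structure the decision differently.

```go
package main

import "fmt"

// lowVRAMThreshold mirrors the PR's cutoff: devices with less than
// 2 GiB of VRAM skip GPU offloading entirely unless the user sets
// num_gpu explicitly.
const lowVRAMThreshold = 2 * 1024 * 1024 * 1024 // 2 GiB in bytes

// gpuLayers decides how many layers to offload. A numGPU >= 0 is an
// explicit user override and always wins; -1 means "auto".
func gpuLayers(vramBytes int64, numGPU int) int {
	if numGPU >= 0 {
		return numGPU // explicit override via num_gpu
	}
	if vramBytes < lowVRAMThreshold {
		return 0 // low-VRAM device: stay on CPU
	}
	return -1 // enough VRAM: let the runner decide
}

func main() {
	fmt.Println(gpuLayers(1<<30, -1)) // 1 GiB card, auto: 0 (CPU only)
	fmt.Println(gpuLayers(1<<30, 10)) // explicit num_gpu=10 wins: 10
	fmt.Println(gpuLayers(8<<30, -1)) // 8 GiB card, auto: -1
}
```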

A CUDA-enabled runner will still offload to the GPU even if num_gpu is 0. This is problematic when the GPU doesn't support a compatible CUDA version. In that case, select the CPU runner instead.
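A minimal sketch of that runner selection, assuming a hypothetical chooseRunner helper (the real selection logic in llm/llama.go is more involved):

```go
package main

import "fmt"

// chooseRunner picks a runner binary. Even on a CUDA-enabled build,
// num_gpu == 0 must select the CPU binary so a GPU with an
// unsupported CUDA version is never touched.
func chooseRunner(cudaAvailable bool, numGPU int) string {
	if cudaAvailable && numGPU != 0 {
		return "cuda"
	}
	return "cpu"
}

func main() {
	fmt.Println(chooseRunner(true, 0))  // cpu: GPU explicitly disabled
	fmt.Println(chooseRunner(true, -1)) // cuda: auto on a CUDA build
	fmt.Println(chooseRunner(false, 4)) // cpu: no CUDA runner available
}
```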

Caveat: on macOS (darwin), go generate builds Metal only on ARM, so the build shouldn't be marked as Accelerated since there's no fallback.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-12 22:58:17 -05:00

Reference: github-starred/ollama#10327