[PR #14461] docs: add GPU-to-CPU fallback troubleshooting section #76980

Open
opened 2026-05-05 09:42:55 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/14461
Author: @Anandesh-Sharma
Created: 2/26/2026
Status: 🔄 Open

Base: main ← Head: docs-gpu-fallback-troubleshooting


📝 Commits (1)

  • 3e59e23 docs: add GPU-to-CPU fallback troubleshooting section

📊 Changes

1 file changed (+32 additions, -0 deletions)


📝 docs/troubleshooting.mdx (+32 -0)

📄 Description

Summary

  • Add a new "GPU-to-CPU Fallback" section to the troubleshooting docs
  • Document how to diagnose when models silently fall back from GPU to CPU inference
  • Include instructions for checking processor usage, diagnosing VRAM issues, and controlling GPU layer offloading

Details

Users frequently encounter situations where Ollama silently falls back to CPU inference when a model doesn't fit in available VRAM. This can be confusing because there's no warning. This section helps users:

  1. Check whether their model is running on GPU or CPU with `ollama ps`
  2. Diagnose VRAM issues with `nvidia-smi`/`rocm-smi`
  3. Control GPU offloading with the `num_gpu` parameter
  4. Use debug logging to see detailed memory allocation (see the command sketch below)
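
For reference, the kind of commands the new section covers looks roughly like this. This is a minimal sketch, not the PR's actual text; the model name `llama3` and the `num_gpu` value `999` are only illustrative.

```shell
# 1. Check whether the loaded model is on GPU, CPU, or split across both;
#    the PROCESSOR column of `ollama ps` reports e.g. "100% GPU" or a CPU/GPU split.
ollama ps

# 2. Inspect VRAM usage while the model is loaded.
nvidia-smi   # NVIDIA GPUs
rocm-smi     # AMD GPUs

# 3. Control how many layers are offloaded to the GPU.
#    In an interactive session (999 is an illustrative "offload everything" value):
ollama run llama3
# >>> /set parameter num_gpu 999

# 4. Enable debug logging to see detailed memory-allocation decisions
#    (restart the server with the variable set).
OLLAMA_DEBUG=1 ollama serve
```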

Fixes #14258
Fixes #14260

🤖 Generated with Claude Code


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-05-05 09:42:55 -05:00

Reference: github-starred/ollama#76980