[GH-ISSUE #14260] Troubleshooting docs missing GPU fallback diagnosis and VRAM sizing guidance #9286

Open
opened 2026-04-12 22:09:18 -05:00 by GiteaMirror · 0 comments
Owner

Originally created by @akuligowski9 on GitHub (Feb 14, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/14260

Problem

The current troubleshooting documentation (docs/troubleshooting.mdx) covers GPU discovery failures (driver issues, container setup, ROCm permissions) but does not address the most common user scenario: the GPU is detected but the model doesn't fit in VRAM, causing unexpected CPU fallback or partial offloading.

Users who land on the troubleshooting page after experiencing slow performance find guidance for "GPU not detected" scenarios, but nothing for "GPU detected, model too large for VRAM" — which is the far more common case based on issue volume.

Additionally, there is no reference table for approximate VRAM requirements by model size, forcing users to discover through trial and error whether their hardware can run a given model on GPU.

Why this matters

The FAQ mentions ollama ps to check GPU/CPU split, and the context-length docs mention VRAM tiers, but the troubleshooting page — where confused users most often land — has no guidance connecting these concepts.
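
For context, the GPU/CPU split appears in the PROCESSOR column of ollama ps. The output below is illustrative only (model name, ID, and exact formatting will vary by Ollama version and hardware):

    $ ollama ps
    NAME            ID              SIZE     PROCESSOR          UNTIL
    llama3.1:70b    a1b2c3d4e5f6    44 GB    25%/75% CPU/GPU    4 minutes from now

Anything other than 100% GPU in that column means part of the model is being served from system RAM, which is typically the cause of the slowdown the proposed section would help diagnose.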

Example issues where this guidance would have helped:

  • #4809 — Add information on RAM and VRAM requirements in library
  • #9774 — Estimate VRAM needs based on context length and quantization
  • #8144 — When models don't fit in VRAM, issue alert instead of freezing
  • #14257 — Built-in Hardware Benchmark Tool for Model Compatibility
  • #6864 — Memory Allocation on VRAM when model size > VRAM
  • #4996 — Apple Silicon 8/16GB slow down with larger models

Proposed change

Add a small section to docs/troubleshooting.mdx with:

  1. A short diagnostic guide for "model running slower than expected"
  2. How to check GPU utilization with ollama ps
  3. Common reasons for CPU fallback
  4. An approximate VRAM requirements table for common model sizes (4-bit quantization), sketched below
  5. Actionable steps when a model doesn't fit in VRAM

This is a targeted addition (~40-50 lines), not a large documentation overhaul.
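
As a rough illustration of item 4, a Markdown table for troubleshooting.mdx might look like the following; the figures are ballpark estimates for 4-bit quantized weights plus a modest context window, not measured values:

    | Model size | Approx. VRAM needed (4-bit) |
    |------------|-----------------------------|
    | 7-8B       | ~5-6 GB                     |
    | 13-14B     | ~9-10 GB                    |
    | 30-34B     | ~18-22 GB                   |
    | 70B        | ~40-48 GB                   |

For item 5, the actionable steps would likely include pulling a smaller or more aggressively quantized tag, reducing the context length (num_ctx), or accepting partial offload and the corresponding slowdown.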

How it will be tested

  • Documentation renders correctly in the docs build system
  • Information is accurate based on current model sizes and quantization levels
GiteaMirror added the documentation label 2026-04-12 22:09:18 -05:00