[PR #14261] llm: warn when model falls back from GPU to CPU #61288

Open
opened 2026-04-29 16:22:20 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/14261
Author: @akuligowski9
Created: 2/14/2026
Status: 🔄 Open

Base: main ← Head: llm/warn-gpu-cpu-fallback


📝 Commits (3)

  • ba6eb27 llm: warn when model falls back from GPU to CPU
  • 9a9ac2a llm: scale back to only critical GPU fallback warnings
  • b8bd446 llm: warn on partial GPU offload

📊 Changes

1 file changed (+10 additions, -1 deletion)


📝 llm/server.go (+10 -1)

📄 Description

Summary

  • Adds slog.Warn when macOS disables GPU offload because the model exceeds system memory (previously only a code comment, no log)
  • Upgrades "insufficient VRAM to load any model layers" from slog.Debug to slog.Warn so it's visible in default logs

No behavior change, no API surface change — only log level and message improvements.
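
As an illustration only, here is a minimal Go sketch of the two logging changes. The function and variable names (chooseGPULayers, requiredMem, systemMem, gpuLayers) are assumptions made for the example and are not the actual code in llm/server.go:

```go
package main

import (
	"log/slog"
	"runtime"
)

// chooseGPULayers is a hypothetical stand-in for the offload decision in
// llm/server.go; requiredMem, systemMem, and gpuLayers are illustrative.
func chooseGPULayers(requiredMem, systemMem uint64, gpuLayers int) int {
	// Change 1: on macOS, a model exceeding system memory previously
	// disabled GPU offload silently (only a code comment marked it).
	if runtime.GOOS == "darwin" && requiredMem > systemMem {
		slog.Warn("model exceeds system memory, disabling GPU offload",
			"required", requiredMem, "available", systemMem)
		return 0
	}
	// Change 2: upgraded from slog.Debug to slog.Warn so the message is
	// visible at slog's default level (Info), not only with debug logging.
	if gpuLayers == 0 {
		slog.Warn("insufficient VRAM to load any model layers")
	}
	return gpuLayers
}

func main() {
	// A 32 GiB model on a 16 GiB Mac triggers the first warning.
	chooseGPULayers(32<<30, 16<<30, 48)
}
```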

Context

Resolves #14258

Scaled back based on collaborator feedback (https://github.com/ollama/ollama/issues/14258#issuecomment-3902619458) — removed the per-load GPU/CPU layer split logging since ollama ps already surfaces that information. The remaining warnings cover two edge cases where ollama ps alone doesn't help: silent GPU offload disabling on macOS and zero-layer GPU allocation.
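
For reference, ollama ps reports the layer split in its PROCESSOR column; the output below is purely illustrative (exact columns and formatting vary by version, and the model ID is a placeholder):

```
NAME           ID            SIZE      PROCESSOR          UNTIL
llama3:8b      <model id>    6.7 GB    48%/52% CPU/GPU    4 minutes from now
```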

Related issues: #12976 #13589 #12197 #13814 #5923 #4996 #10707 #9948

Test plan

  • [x] go test ./llm/ passes
  • [x] go vet ./llm/ clean
  • [ ] Manual verification: load a model larger than system memory on macOS and confirm the warning appears in server logs at the default log level

🤖 Generated with Claude Code


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-29 16:22:21 -05:00
