[PR #11995] [MERGED] llm: Don't always evict models in CPU-only mode #24219

Closed · opened 2026-04-19 17:27:11 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/11995
Author: @jessegross
Created: 8/20/2025
Status: Merged
Merged: 8/20/2025
Merged by: @jessegross

Base: main ← Head: jessegross/cpu_only


📝 Commits (1)

  • dede76d llm: Don't always evict models in CPU-only mode

📊 Changes

2 files changed (+9 additions, -8 deletions)


📝 llm/memory.go (+8 -4)
📝 llm/server.go (+1 -4)

📄 Description

With the old memory estimates, it is currently impossible to load more than one model at a time when no GPUs are available. This is because the check for whether we need to evict a model tests whether all layers of the new model can be loaded onto GPUs, which is never true when there are no GPUs. Before the memory management changes, there was a special code path for CPU-only systems.

This problem does not exist with new memory estimates.
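The shape of the fix can be sketched in Go. The snippet below is a hypothetical condensation for illustration only; `needsEviction`, `GpuInfo`, and the layer counts are made-up names, not the actual identifiers in `llm/memory.go` or `llm/server.go`. It shows why requiring full GPU offload forces an eviction on every load when no GPUs exist, and how skipping that requirement on CPU-only systems lets multiple models stay resident in RAM.

```go
package main

import "fmt"

// GpuInfo is a stand-in for ollama's GPU descriptor; the real type
// carries much more detail (memory, library, etc.).
type GpuInfo struct{ ID string }

// needsEviction is an illustrative reduction of the scheduler's
// decision, not the actual function from this PR. The old check
// required every layer of the incoming model to fit on a GPU, which
// is never satisfiable when gpus is empty, so CPU-only systems
// evicted the resident model on every load.
func needsEviction(gpus []GpuInfo, totalLayers, gpuLayers int) bool {
	if len(gpus) == 0 {
		// CPU-only: there is nothing to offload to a GPU, so skip
		// the "all layers on GPU" requirement and let ordinary RAM
		// accounting decide whether another model can stay loaded.
		return false
	}
	// With GPUs present, evict if the new model cannot be fully offloaded.
	return gpuLayers < totalLayers
}

func main() {
	// No GPUs: the old check would have returned true here and forced
	// an eviction; the fixed logic keeps both models resident.
	fmt.Println(needsEviction(nil, 32, 0)) // false
}
```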

Fixes #11974


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-19 17:27:11 -05:00

Reference: github-starred/ollama#24219