[PR #12623] [MERGED] llm: Perform eviction when num_gpu is set with new estimates #19162

Closed
opened 2026-04-16 06:59:11 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/12623
Author: @jessegross
Created: 10/15/2025
Status: Merged
Merged: 10/15/2025
Merged by: @jessegross

Base: main ← Head: jessegross/num_gpu


📝 Commits (1)

  • 09ed195 llm: Perform eviction when num_gpu is set with new estimates

📊 Changes

2 files changed (+12 additions, -4 deletions)

📝 llm/server.go (+4 -4)
📝 llm/server_test.go (+8 -0)

📄 Description

Currently, setting num_gpu forces the model to load with that number of layers in the current configuration. This happens regardless of any other information, which means that no eviction is performed even if another model is already loaded.

This behavior differs from the old estimates (and still occurs for models that run on the llama engine). In those cases, other models would be evicted as needed to load the requested number of layers. That behavior is more useful and less surprising, so this changes the new estimates to match.
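
To make the restored behavior concrete, here is a minimal, hypothetical Go sketch of the eviction decision. None of the types or names below (loadRequest, gpuState, fitOrEvict, the 4 GiB reclaim figure) come from the ollama codebase; they only illustrate the idea of treating an explicit num_gpu as a hard requirement that can trigger eviction, rather than a value that silently fails to fit. The real logic lives in llm/server.go and the scheduler.

```go
package main

import "fmt"

// loadRequest describes a hypothetical request to load a model.
type loadRequest struct {
	model    string
	numGPU   int    // layers requested on GPU; an explicit value is a requirement
	vramNeed uint64 // estimated VRAM (bytes) to place numGPU layers
}

// gpuState tracks free VRAM and currently loaded models (simplified).
type gpuState struct {
	freeVRAM uint64
	loaded   []string
}

// fitOrEvict sketches the behavior this PR restores for the new estimates:
// if the requested layers do not fit, already-loaded models are evicted
// rather than the request ignoring other models on the GPU.
func fitOrEvict(g *gpuState, req loadRequest) bool {
	for g.freeVRAM < req.vramNeed && len(g.loaded) > 0 {
		// Evict the oldest model; a real scheduler would pick by
		// LRU/refcount and reclaim that model's actual allocation.
		evicted := g.loaded[0]
		g.loaded = g.loaded[1:]
		g.freeVRAM += 4 << 30 // assume each eviction frees ~4 GiB (illustrative)
		fmt.Printf("evicting %s to satisfy num_gpu=%d for %s\n",
			evicted, req.numGPU, req.model)
	}
	return g.freeVRAM >= req.vramNeed
}

func main() {
	g := &gpuState{freeVRAM: 2 << 30, loaded: []string{"llama3:8b"}}
	req := loadRequest{model: "qwen3:4b", numGPU: 99, vramNeed: 5 << 30}
	fmt.Println("loaded:", fitOrEvict(g, req))
}
```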

Fixes #12580


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-16 06:59:11 -05:00

Reference: github-starred/ollama#19162