[PR #5626] [MERGED] sched: error on over-allocation of system memory when on Linux #11859

Closed
opened 2026-04-12 23:40:55 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/5626
Author: @jmorganca
Created: 7/11/2024
Status: Merged
Merged: 7/11/2024
Merged by: @jmorganca

Base: main ← Head: jmorganca/maxsize-gpu-only


📝 Commits (1)

  • 320c166 sched: only error when over-allocating system memory

📊 Changes

2 files changed (+9 additions, -37 deletions)


📝 llm/server.go (+9 -0)
📝 server/sched.go (+0 -37)

📄 Description

Model switching no longer works on CPU-only machines; instead, the scheduler fails with a `requested model is too large for this system` error:

$ ollama run gemma2
Error: requested model (8.4 GiB) is too large for this system (1.9 GiB)

This PR changes this behavior so that a new model is only stopped from loading if over-allocating system memory on Linux would cause a crash. It also moves the check to after scheduling has taken place, so that an error is not raised before knowing whether another model would be unloaded.

Example on a system with 48 GB of VRAM and 64 GB of system memory:

$ ollama run llama3:70b-instruct-fp16
Error: requested model requires more system memory (86.8 GiB) than is available (62.5 GiB)

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-12 23:40:55 -05:00

Reference: github-starred/ollama#11859