[PR #14951] server: prevent system sleep during active inference on Windows #77233

opened 2026-05-05 09:54:43 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/14951
Author: @SofianeBel
Created: 3/19/2026
Status: 🔄 Open

Base: main ← Head: fix/prevent-sleep-windows


📝 Commits (1)

  • 5151303 server: prevent system sleep during active inference on Windows

📊 Changes

5 files changed (+148 additions, -4 deletions)


📝 server/sched.go (+11 -3)
📝 server/sched_test.go (+1 -1)
➕ server/sleep_other.go (+9 -0)
➕ server/sleep_test.go (+55 -0)
➕ server/sleep_windows.go (+72 -0)

📄 Description

On Windows, the system can enter sleep mode while Ollama is actively
processing inference requests, interrupting long-running generations.

Use SetThreadExecutionState to prevent sleep while runners are active.
A reference-counted sleepInhibitor is wired into the scheduler: sleep
is prevented when the first runner becomes active and allowed again
when all runners go idle. The Win32 API is called only when the
active-runner count moves between 0 and 1, not on every request.
Non-Windows platforms get a no-op stub.
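The reference-counting scheme described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from the PR: the struct layout, the `Acquire`/`Release` method names, and the injected `setAwake` callback are hypothetical. On Windows, `setAwake` would presumably wrap `SetThreadExecutionState` (for example `ES_CONTINUOUS | ES_SYSTEM_REQUIRED` to block sleep, and `ES_CONTINUOUS` alone to clear the requirement); here it is a plain function so the counting logic can be exercised on any platform.

```go
package main

import "sync"

// sleepInhibitor is a hypothetical sketch of a reference-counted guard.
// setAwake stands in for the platform call: on Windows it might wrap
// SetThreadExecutionState; on other platforms it can be a no-op.
type sleepInhibitor struct {
	mu       sync.Mutex
	active   int
	setAwake func(keepAwake bool)
}

// newSleepInhibitor builds a guard; a nil callback acts like a
// non-Windows no-op stub.
func newSleepInhibitor(setAwake func(bool)) *sleepInhibitor {
	if setAwake == nil {
		setAwake = func(bool) {}
	}
	return &sleepInhibitor{setAwake: setAwake}
}

// Acquire registers one active runner. Only the 0 -> 1 transition
// reaches the platform call.
func (s *sleepInhibitor) Acquire() {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.active++
	if s.active == 1 {
		s.setAwake(true)
	}
}

// Release unregisters one active runner. Only the 1 -> 0 transition
// reaches the platform call. Extra releases are ignored so the count
// never goes negative.
func (s *sleepInhibitor) Release() {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.active == 0 {
		return
	}
	s.active--
	if s.active == 0 {
		s.setAwake(false)
	}
}
```

In a scheduler, `Acquire` would be called when a runner becomes active and `Release` when it goes idle. One design caveat, hedged: `SetThreadExecutionState` applies to the calling thread, and Go goroutines can migrate between OS threads, so a real Windows implementation may need to issue both calls from the same OS thread (for instance a dedicated goroutine pinned with `runtime.LockOSThread`). This sketch does not address that.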

Fixes #4072


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-05-05 09:54:43 -05:00
Reference: github-starred/ollama#77233