[PR #15126] [CLOSED] fix: prevent head-of-line blocking in scheduler during model eviction #46290

Closed
opened 2026-04-25 01:46:10 -05:00 by GiteaMirror · 0 comments
📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/15126
Author: @ChharithOeun
Created: 3/29/2026
Status: Closed

Base: main ← Head: fix/sched-head-of-line-blocking


📝 Commits (2)

  • 5a00ab6 fix: prevent head-of-line blocking in scheduler
  • 710007d Update sched_test.go (test: add regression test for scheduler head-of-line blocking)

📊 Changes

2 files changed (+157 additions, -8 deletions)

📝 server/sched.go (+42 -8)
📝 server/sched_test.go (+115 -0)

📄 Description

Problem

When the scheduler needs to evict a model to make room for a new one, processPending blocks on <-s.unloadedCh. During this wait, all other pending requests are frozen — even requests for models that are already loaded and ready to serve.

This is classic head-of-line blocking: a request for an idle, loaded model waits behind a slow eviction/load cycle that has nothing to do with it.

Fixes #14578

Root Cause

In server/sched.go, the processPending goroutine processes requests single-threaded from pendingReqCh. When it decides to evict a runner, it enters a blocking select that only listens for ctx.Done() or s.unloadedCh. Any new requests that arrive on pendingReqCh during this window are stuck in the channel buffer until the eviction completes.

Solution

Replace the simple blocking select with a for/select loop that also drains pendingReqCh while waiting for the eviction:

  • Requests for already-loaded models are dispatched immediately via useLoadedRunner
  • Requests that cannot be served yet (model not loaded) are buffered in a deferred slice
  • After the eviction completes, deferred requests are re-enqueued back into pendingReqCh
  • Cancelled requests are detected and skipped via ctx.Err() check

The fix is surgical: it only changes the eviction-wait code path and preserves all existing scheduler invariants.

Test

Added TestSchedNoHeadOfLineBlocking which:

  1. Loads model A (active, refCount > 0, cannot be evicted)
  2. Loads model B (active, refCount > 0)
  3. Submits a request for model D (triggers B eviction, which blocks because B is active)
  4. Submits a request for model A (already loaded)
  5. Asserts A's request is dispatched within 500ms (would time out without the fix)
  6. Then unblocks B's eviction and confirms D loads successfully

All 17 existing scheduler tests pass with zero regressions.

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-25 01:46:10 -05:00

Reference: github-starred/ollama#46290