[PR #4031] [MERGED] Fix/issue 3736: When runners are closing or expiring. Scheduler is getting dirty VRAM size readings. #21899

Closed
opened 2026-04-19 15:56:27 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/4031
Author: @MarkWard0110
Created: 4/29/2024
Status: Merged
Merged: 5/1/2024
Merged by: @dhiltgen

Base: main ← Head: fix/issue-3736


📝 Commits (6)

  • 948114e fix sched to wait for the runner to terminate to ensure following vram check will be more accurate
  • f4a73d5 fix runner expire during active use. Clearing the expire timer as it is used. Allowing the finish to assign an expire timer so that the runner will expire after no use.
  • 34a4a94 ignore debug bin files
  • 63c7636 log when the waiting for the process to stop to help debug when other tasks execute during this wait.
  • ba26c7a it will always return an error due to Kill() discarding Wait() errors
  • 321d57e Removing go routine calling .wait from load.

📊 Changes

3 files changed (+18 additions, -8 deletions)

View changed files

📝 .gitignore (+2 -1)
📝 llm/server.go (+7 -7)
📝 server/sched.go (+9 -0)

📄 Description

Issue: When the Ollama Scheduler requests a runner to stop (kill), it reads the available VRAM before the runner process has exited, so the reading still includes the terminating runner's memory. Each time a new model is swapped in, the new runner's VRAM allocation estimate is therefore based on memory that the previous runner has not yet released. This results in offloading to the CPU and slower execution.
Fix: When stopping a runner, wait for the process to exit so that its memory is freed before the Scheduler checks the amount of VRAM available.
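
A minimal sketch of the kill-then-wait pattern described in this fix, not the actual change to `llm/server.go`. The helpers `startSleeper`, `stopAndWait`, and `exited` are hypothetical names, and the child process is a plain `sleep` command standing in for a runner (this assumes a Unix-like system with `sleep` on the PATH). The VRAM query itself is not shown.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"os/exec"
)

// startSleeper launches a long-lived child process that stands in for a runner.
// Assumption: a Unix-like system with a `sleep` binary on PATH.
func startSleeper() (*exec.Cmd, error) {
	cmd := exec.Command("sleep", "60")
	if err := cmd.Start(); err != nil {
		return nil, err
	}
	return cmd, nil
}

// stopAndWait kills the process and blocks until the OS has reaped it.
// Hypothetical helper for illustration only.
func stopAndWait(cmd *exec.Cmd) error {
	if cmd == nil || cmd.Process == nil {
		return nil // never started, nothing to stop
	}
	if err := cmd.Process.Kill(); err != nil && !errors.Is(err, os.ErrProcessDone) {
		return err
	}
	// After Kill, Wait normally reports an error such as "signal: killed".
	// That error is expected here, so it is deliberately discarded.
	_ = cmd.Wait()
	return nil
}

// exited reports whether the process has been waited on.
func exited(cmd *exec.Cmd) bool {
	return cmd.ProcessState != nil
}

func main() {
	cmd, err := startSleeper()
	if err != nil {
		fmt.Println("could not start child:", err)
		return
	}
	if err := stopAndWait(cmd); err != nil {
		fmt.Println("stop failed:", err)
		return
	}
	// Only at this point would a scheduler read free VRAM (query not shown).
	fmt.Println("child exited:", exited(cmd))
}
```

Blocking in `stopAndWait` trades a short delay during model swaps for a memory reading taken after the old process is gone, which is the trade-off the fix describes.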

Issue: After a runner finishes a request, an expiration timer is assigned based on the session duration, and each subsequent request renews the timer when it finishes. If a request takes long enough for the timer to fire, the runner is scheduled to be unloaded while it is still in use. Runners therefore expire in the middle of heavy use, which results in the same model closing and reloading. The reload gets a dirty VRAM measurement because the previous runner is not fully closed before the new runner is created. The pending and completed goroutines also run concurrently: the pending goroutine continues on the unloaded event, which can be "any" unloaded event, so a new pending request that arrives at the same time may get an incorrect measure of VRAM, resulting in offloading to the CPU and slower execution.
Fix: Clear the timer so it does not fire when reusing runners. Only assign the timer when the runner has finished. Clear any assigned timers when closing runners. An active runner should not have an expire timer.
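
The timer rules in this fix can be sketched as follows. This is an illustration under assumptions, not the code in `server/sched.go`: the `runner` type, its fields, and the methods `acquire`, `release`, `shutdown`, and `hasTimer` are all hypothetical names chosen for the example.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// runner is a hypothetical stand-in for the scheduler's reference to a runner.
type runner struct {
	mu              sync.Mutex
	refCount        int           // requests currently using the runner
	sessionDuration time.Duration // idle time allowed before expiry
	expireTimer     *time.Timer   // nil while the runner is active
	onExpire        func()        // called when the idle timer fires
}

func newRunner(d time.Duration, onExpire func()) *runner {
	return &runner{sessionDuration: d, onExpire: onExpire}
}

// stopTimerLocked clears any pending expiry. Callers must hold r.mu.
func (r *runner) stopTimerLocked() {
	if r.expireTimer != nil {
		r.expireTimer.Stop()
		r.expireTimer = nil
	}
}

// acquire marks the runner as in use and clears the timer so it cannot fire
// during the request, however long the request takes.
func (r *runner) acquire() {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.refCount++
	r.stopTimerLocked()
}

// release marks one request as finished. The timer is armed only when the
// runner becomes idle.
func (r *runner) release() {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.refCount > 0 {
		r.refCount--
	}
	if r.refCount == 0 {
		r.stopTimerLocked()
		r.expireTimer = time.AfterFunc(r.sessionDuration, r.onExpire)
	}
}

// shutdown clears any assigned timer when the runner is being closed.
func (r *runner) shutdown() {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.stopTimerLocked()
}

// hasTimer reports whether an expiry is currently scheduled.
func (r *runner) hasTimer() bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	return r.expireTimer != nil
}

func main() {
	expired := make(chan struct{})
	r := newRunner(50*time.Millisecond, func() { close(expired) })

	r.acquire()
	time.Sleep(100 * time.Millisecond) // request outlasts the session duration
	fmt.Println("timer armed while active:", r.hasTimer())

	r.release() // idle now, so the timer is armed
	<-expired
	fmt.Println("runner expired after going idle")
}
```

In this sketch the invariant "an active runner has no expire timer" is enforced in one place (`acquire`), so a slow request cannot be interrupted by a timer that was armed by an earlier one.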


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-19 15:56:27 -05:00
Reference: github-starred/ollama#21899