[PR #4299] [MERGED] Wait for GPU free memory reporting to converge #37317

Closed
opened 2026-04-22 22:02:08 -05:00 by GiteaMirror · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/4299
Author: @dhiltgen
Created: 5/9/2024
Status: Merged
Merged: 5/9/2024
Merged by: @dhiltgen

Base: main ← Head: handle_vram_reporting_lag


📝 Commits (1)

  • 354ad92 Wait for GPU free memory reporting to converge

📊 Changes

2 files changed (+61 additions, -3 deletions)

View changed files

📝 gpu/cpu_common.go (+3 -3)
📝 server/sched.go (+58 -0)

📄 Description

The GPU drivers take a while to update their free memory reporting, so we need to wait until the reported values converge with what we expect before starting another runner, in order to get an accurate picture.

Prior to this fix, this could manifest as loading fewer layers than expected on a subsequent load and seeing slow inference, or, in the worst case, toggling back and forth between GPU and CPU.

Fixes #4253


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-22 22:02:09 -05:00
Reference: github-starred/ollama#37317