[PR #14569] [MERGED] sched: Model eviction for MLX #25268

Closed
opened 2026-04-19 18:06:48 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/14569
Author: @jessegross
Created: 3/3/2026
Status: Merged
Merged: 3/17/2026
Merged by: @jessegross

Base: main ← Head: jessegross/mlx-swap


📝 Commits (1)

  • 66e8236 sched: Model eviction for MLX

📊 Changes

8 files changed (+291 additions, -290 deletions)

📝 server/routes_debug_test.go (+2 -2)
📝 server/routes_generate_renderer_test.go (+2 -2)
📝 server/routes_generate_test.go (+7 -7)
📝 server/routes_harmony_streaming_test.go (+3 -3)
📝 server/sched.go (+37 -94)
📝 server/sched_test.go (+56 -19)
📝 x/imagegen/server.go (+54 -52)
📝 x/mlxrunner/client.go (+130 -111)

📄 Description

MLX runners (image generation and LLM) previously bypassed the scheduler's standard load path via a separate loadMLX method. This meant they skipped VRAM fitting checks and couldn't participate in model eviction.

Now all model types flow through the same load function. Model eviction for MLX is based on weight size only, because the KV cache and compute graph are allocated dynamically. As a result, eviction does not account for worst-case memory use, and models can still compete for memory, but this is a significant improvement over the previous behavior.
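
To illustrate what "eviction based on weights" can look like, here is a minimal sketch and not the code from this PR. The type `loadedModel`, the function `pickEvictions`, and the least-recently-used ordering are all assumptions made for illustration; the actual scheduler logic lives in `server/sched.go` and may differ. The sketch counts only weight bytes when deciding what to unload, so it can underestimate real memory use in the same way the description notes.

```go
package main

import (
	"fmt"
	"sort"
)

// loadedModel is a hypothetical record of a resident model (not a type from the PR).
type loadedModel struct {
	name        string
	weightBytes uint64 // weights only; KV cache and compute graph are NOT counted
	lastUsed    int64  // logical timestamp; larger means more recently used
}

// pickEvictions returns the models to unload, oldest first, so that a new
// model needing `need` bytes of weights fits within `total` bytes.
// ok is false when the new model cannot fit even with everything evicted.
// LRU ordering is an assumption of this sketch.
func pickEvictions(loaded []loadedModel, total, need uint64) (evict []string, ok bool) {
	if need > total {
		return nil, false
	}
	var used uint64
	for _, m := range loaded {
		used += m.weightBytes
	}
	// Sort a copy so the caller's slice keeps its order.
	byAge := append([]loadedModel(nil), loaded...)
	sort.Slice(byAge, func(i, j int) bool { return byAge[i].lastUsed < byAge[j].lastUsed })
	for _, m := range byAge {
		if used+need <= total {
			break
		}
		used -= m.weightBytes
		evict = append(evict, m.name)
	}
	return evict, used+need <= total
}

func main() {
	// Sizes are in arbitrary units (for example GiB).
	loaded := []loadedModel{
		{name: "a", weightBytes: 8, lastUsed: 1},
		{name: "b", weightBytes: 6, lastUsed: 3},
		{name: "c", weightBytes: 4, lastUsed: 2},
	}
	evict, ok := pickEvictions(loaded, 24, 10)
	fmt.Println(evict, ok)
}
```

Because only weights are budgeted, two models that both fit by this check can still contend for memory once their KV caches grow, which is the limitation the description calls out.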


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-19 18:06:48 -05:00
Reference: github-starred/ollama#25268