[PR #14470] [MERGED] MLX runner memory fixes #14695

Closed
opened 2026-04-13 01:00:54 -05:00 by GiteaMirror · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/14470
Author: @jessegross
Created: 2/26/2026
Status: Merged
Merged: 2/28/2026
Merged by: @jessegross

Base: main ← Head: jessegross/mlx-context


📝 Commits (4)

  • c863f15 mlxrunner: Report actual memory usage from runner
  • ae0498f mlxrunner: Propagate pipeline errors to client via api.StatusError
  • eb878c6 mlxrunner: Enforce model context limit
  • 6c2ae11 mlxrunner: Fix prompt eval timing and count metrics

📊 Changes

16 files changed (+196 additions, -153 deletions)

View changed files

📝 llm/server.go (+11 -25)
📝 server/images.go (+4 -0)
📝 server/prompt.go (+31 -29)
📝 server/routes.go (+8 -1)
📝 server/sched.go (+8 -6)
📝 server/sched_test.go (+1 -2)
📝 x/imagegen/server.go (+3 -8)
📝 x/mlxrunner/client.go (+65 -43)
📝 x/mlxrunner/model/base/base.go (+1 -0)
📝 x/mlxrunner/pipeline.go (+25 -10)
📝 x/mlxrunner/runner.go (+20 -21)
📝 x/mlxrunner/server.go (+6 -7)
📝 x/models/gemma3/gemma3.go (+4 -0)
📝 x/models/glm4_moe_lite/glm4_moe_lite.go (+1 -1)
📝 x/models/llama/llama.go (+4 -0)
📝 x/models/qwen3/qwen3.go (+4 -0)

📄 Description

A series of fixes for the MLX runner, primarily around memory:

  • Report live memory usage through ollama ps
  • Enforce model context limits in a way that is similar to most cloud services
  • Better error and timing reporting

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-13 01:00:54 -05:00

Reference: github-starred/ollama#14695