[PR #15155] [CLOSED] mlx: enhance multimodal pipeline with snapshot scheduling and diagnostics #25593

Closed
opened 2026-04-19 18:17:53 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/15155
Author: @dhiltgen
Created: 3/31/2026
Status: Closed

Base: pdevine/qwen35_vision ← Head: vision_enhancements


📝 Commits (1)

  • 6a0e27a mlx: enhance multimodal pipeline with snapshot scheduling and diagnostics

📊 Changes

4 files changed (+288 additions, -130 deletions)

📝 x/mlxrunner/cache.go (+107 -87)
📝 x/mlxrunner/client.go (+109 -29)
📝 x/mlxrunner/model/base/multimodal.go (+8 -3)
📝 x/mlxrunner/pipeline.go (+64 -11)

📄 Description

Enhancements on top of the multimodal infrastructure from #14968:

  • Cache: multi-snapshot system (pendingSnapshots) with requestSnapshot() and nextPendingSnapshot(), refined switchToPath with partial-match awareness, improved eviction policy
  • Pipeline: periodic snapshots every 8192 tokens, pre-thinking snapshot, stateless MultimodalPromptTokenizer fallback, clearPromptState() for compile-friendly generation, generation trace diagnostics
  • Client: statusWriter with circular buffer replacing lastErr/lock, CUDA header detection for MLX JIT compiler, done channel refactor
  • Multimodal: added MultimodalPromptTokenizer (stateless variant)
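
The pending-snapshot bookkeeping described in the Cache and Pipeline bullets could look roughly like the sketch below. It is not the code from this PR: only the names `requestSnapshot`, `nextPendingSnapshot`, `pendingSnapshots`, and the 8192-token interval come from the description. The `snapshotScheduler` type, the `schedulePeriodic` helper, the signatures, and the choice to track snapshots as sorted token offsets are all assumptions made for illustration.

```go
package main

import "sort"

// snapshotInterval mirrors the "every 8192 tokens" figure in the description.
const snapshotInterval = 8192

// snapshotScheduler is a hypothetical stand-in for the cache's
// pendingSnapshots state. It keeps requested token offsets in ascending order.
type snapshotScheduler struct {
	pending []int
}

// requestSnapshot records that a snapshot is wanted once the cache reaches
// the given token offset. Duplicate requests are ignored. (Assumed behavior.)
func (s *snapshotScheduler) requestSnapshot(offset int) {
	i := sort.SearchInts(s.pending, offset)
	if i < len(s.pending) && s.pending[i] == offset {
		return
	}
	// Insert at position i while keeping the slice sorted.
	s.pending = append(s.pending, 0)
	copy(s.pending[i+1:], s.pending[i:])
	s.pending[i] = offset
}

// nextPendingSnapshot drops offsets at or before pos and returns the next
// offset still ahead of it, or false if none remain. (Assumed behavior.)
func (s *snapshotScheduler) nextPendingSnapshot(pos int) (int, bool) {
	for len(s.pending) > 0 && s.pending[0] <= pos {
		s.pending = s.pending[1:]
	}
	if len(s.pending) == 0 {
		return 0, false
	}
	return s.pending[0], true
}

// schedulePeriodic is a hypothetical helper that requests a snapshot at every
// multiple of interval in the range (start, end].
func schedulePeriodic(s *snapshotScheduler, start, end, interval int) {
	for off := (start/interval + 1) * interval; off <= end; off += interval {
		s.requestSnapshot(off)
	}
}
```

Under these assumptions, a prefill loop would call `schedulePeriodic(s, 0, promptLen, snapshotInterval)`, add one extra `requestSnapshot` at the offset just before the thinking section, and then repeatedly ask `nextPendingSnapshot` for the next boundary at which to stop and capture cache state.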

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-19 18:17:53 -05:00

Reference: github-starred/ollama#25593