[PR #14919] [MERGED] Fix mlxrunner subprocess deadlocks #20189

Closed
opened 2026-04-16 07:29:49 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ollama/ollama/pull/14919
Author: @jessegross
Created: 3/18/2026
Status: Merged
Merged: 3/20/2026
Merged by: @jessegross

Base: main ← Head: jessegross/mlx-log


📝 Commits (2)

  • 0fdfdcf mlx: fix subprocess log deadlock
  • 011aa26 llm, mlxrunner: fix done channel value consumed by first receiver

📊 Changes

2 files changed (+79 additions, -34 deletions)

📝 llm/server.go (+8 -6)
📝 x/mlxrunner/client.go (+71 -28)

📄 Description

  • Deadlock on long stderr lines: The stderr reader used bufio.Scanner, which stops with an error once a single line exceeds its 64KB token limit. After that nothing drained the pipe, so the OS pipe buffer filled and the subprocess blocked on its next write to stderr. Replaced with an io.Copy-based statusWriter that streams stderr without a line-length limit.
  • Done channel consumed by first receiver: A value sent on the done chan error is delivered to only one receiver, so whichever of WaitUntilRunning, HasExited, or Close read it first consumed the signal. As a result, HasExited could return false after already returning true, and Close could block for 5s on an already-dead process. Replaced with a chan struct{} that is closed on exit (a closed channel can be received from any number of times) plus a separate doneErr field. Applied the same fix to llm/server.go.

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-16 07:29:49 -05:00

Reference: github-starred/ollama#20189