[GH-ISSUE #15194] mlx runner failed #35486

Open
opened 2026-04-22 19:59:33 -05:00 by GiteaMirror · 4 comments
Originally created by @plainprince on GitHub (Apr 1, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/15194

What is the issue?

Downloaded the new Ollama 0.19 and tried a prompt on qwen3.5:35b-a3b-coding-nvfp4. After the AI did a web search, it took about a minute before it gave the following error:
500 Internal Server Error: mlx runner failed: libc++abi: terminating due to uncaught exception of type std::runtime_error: [METAL] Command buffer execution failed: Impacting Interactivity (0000000e:kIOGPUCommandBufferCallbackErrorImpactingInteractivity)

Relevant log output

```shell
Searching for steganography techniques hide information in images text 2025…
Search results for steganography techniques hide information in images text 2025
Error: 500 Internal Server Error: mlx runner failed: libc++abi: terminating due to uncaught exception of type std::runtime_error: [METAL] Command buffer execution failed: Impacting Interactivity (0000000e:kIOGPUCommandBufferCallbackErrorImpactingInteractivity)
```

OS

macOS

GPU

Apple

CPU

Apple

Ollama version

0.19.0

GiteaMirror added the bug label 2026-04-22 19:59:33 -05:00

@iamkritika-official commented on GitHub (Apr 2, 2026):

Hey! I looked into this crash and found the root cause.

The macOS Metal GPU watchdog kills command buffers that take too long. With large models like qwen3.5:35b, the context after a web search gets very long, and two things were happening:

  1. prefillChunkSize() was 2048 tokens, which is too large for a single Metal command buffer
  2. The generation loop was only syncing the GPU every 256 tokens, letting command buffers accumulate

I've raised a PR with a fix: reduce the prefill chunk size to 512 and add a periodic GPU sync every 64 tokens.

PR: https://github.com/iamkritika-official/ollama/pull/[YOUR_PR_NUMBER]

Can you test this on your machine and confirm if it resolves the crash?
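The approach described above can be sketched in Go. This is a minimal illustration of the two mitigations, not the actual ollama/MLX runner code: `processPrompt`, `generate`, `eval`, `step`, and `sync` are hypothetical stand-ins, and the constants mirror the values named in the comment.

```go
package main

import "fmt"

const prefillChunkSize = 512 // value the PR reportedly reduces from 2048
const syncInterval = 64      // periodic sync interval described in the fix

// processPrompt splits a long prompt into bounded chunks so no single
// Metal command buffer runs long enough to trip the GPU watchdog.
func processPrompt(tokens []int, eval func([]int), sync func()) {
	for start := 0; start < len(tokens); start += prefillChunkSize {
		end := start + prefillChunkSize
		if end > len(tokens) {
			end = len(tokens)
		}
		eval(tokens[start:end]) // enqueue one bounded command buffer
		sync()                  // wait for the GPU before enqueueing the next chunk
	}
}

// generate syncs the GPU periodically during decoding so queued command
// buffers do not accumulate unboundedly between syncs.
func generate(n int, step func() int, sync func()) []int {
	out := make([]int, 0, n)
	for i := 0; i < n; i++ {
		out = append(out, step())
		if (i+1)%syncInterval == 0 {
			sync()
		}
	}
	return out
}

func main() {
	chunks := 0
	processPrompt(make([]int, 1100), func([]int) { chunks++ }, func() {})
	fmt.Println("prefill chunks:", chunks) // 1100 tokens -> 3 chunks of <=512
}
```

The trade-off is throughput: smaller chunks and more frequent syncs add per-chunk overhead, but keep each unit of GPU work under the watchdog's limit.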


@plainprince commented on GitHub (Apr 2, 2026):

https://github.com/iamkritika-official/ollama/pull/[YOUR_PR_NUMBER]

Are you serious?
First of all, at least read your comment before you paste it from ChatGPT.
Second of all, what is the PR number?


@plainprince commented on GitHub (Apr 2, 2026):

Found the PR, #15206


@PureBlissAK commented on GitHub (Apr 18, 2026):

🤖 Automated Triage & Analysis Report

Issue: #15194
Analyzed: 2026-04-18T18:22:54.962555

Analysis

  • Type: unknown
  • Severity: medium
  • Components: unknown

Implementation Plan

  • Effort: medium
  • Steps:

This issue has been triaged and marked for implementation.

Reference: github-starred/ollama#35486