[GH-ISSUE #15194] mlx runner failed #9726

Open
opened 2026-04-12 22:36:35 -05:00 by GiteaMirror · 3 comments

Originally created by @plainprince on GitHub (Apr 1, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/15194

What is the issue?

Downloaded the new ollama 0.19 and tried a prompt on qwen3.5:35b-a3b-coding-nvfp4. After the model did a web search, it took about a minute before it gave the following error:
500 Internal Server Error: mlx runner failed: libc++abi: terminating due to uncaught exception of type std::runtime_error: [METAL] Command buffer execution failed: Impacting Interactivity (0000000e:kIOGPUCommandBufferCallbackErrorImpactingInteractivity)

Relevant log output

```shell
Searching for steganography techniques hide information in images text 2025…
Search results for steganography techniques hide information in images text 2025
Error: 500 Internal Server Error: mlx runner failed: libc++abi: terminating due to uncaught exception of type std::runtime_error: [METAL] Command buffer execution failed: Impacting Interactivity (0000000e:kIOGPUCommandBufferCallbackErrorImpactingInteractivity)
```

OS

macOS

GPU

Apple

CPU

Apple

Ollama version

0.19.0

GiteaMirror added the bug label 2026-04-12 22:36:35 -05:00

@iamkritika-official commented on GitHub (Apr 2, 2026):

Hey! I looked into this crash and found the root cause.

The macOS Metal GPU watchdog kills command buffers that take too long. With large models like qwen3.5:35b after a web search, the context gets very long and two things were happening:

  1. prefillChunkSize() was 2048 tokens — too large for a single Metal command buffer
  2. Generation loop was only syncing GPU every 256 tokens — letting buffers accumulate

I've raised a PR with a fix: reduced prefill chunk size to 512 and added periodic GPU sync every 64 tokens.

PR: https://github.com/iamkritika-official/ollama/pull/[YOUR_PR_NUMBER]

Can you test this on your machine and confirm if it resolves the crash?


@plainprince commented on GitHub (Apr 2, 2026):

> https://github.com/iamkritika-official/ollama/pull/[YOUR_PR_NUMBER]

Are you serious?
First of all, at least read your comment before you paste it from ChatGPT.
Second of all, what is the PR number?


@plainprince commented on GitHub (Apr 2, 2026):

Found the PR, #15206


Reference: github-starred/ollama#9726