[GH-ISSUE #14516] MiniCPM-o 4.5: Audio input crashes with GGML_ASSERT on macOS Metal (Apple M4) #71479

Closed
opened 2026-05-05 01:51:15 -05:00 by GiteaMirror · 1 comment
Owner

Originally created by @samuelazran on GitHub (Feb 28, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/14516

Description

MiniCPM-o 4.5 (openbmb/minicpm-o4.5) crashes immediately when sending audio input via the /api/chat endpoint on macOS with Apple Silicon (M4). Vision and text-only queries work correctly — the crash is specific to the audio modality.

Steps to Reproduce

  1. ollama pull openbmb/minicpm-o4.5
  2. Send an audio request:
import base64, requests

with open("test.wav", "rb") as f:
    audio_b64 = base64.b64encode(f.read()).decode()

resp = requests.post("http://localhost:11434/api/chat", json={
    "model": "openbmb/minicpm-o4.5",
    "messages": [{"role": "user", "content": "What do you hear?", "audios": [audio_b64]}],
    "stream": False,
})
print(resp.status_code, resp.text[:300])

Error

HTTP 500
{"error":"llama runner process has terminated: GGML_ASSERT([rsets->data count] == 0) failed\nWARNING: Using native backtrace. Set GGML_BACKTRACE_LLDB for more info.\nWARNING: GGML_BACKTRACE_LLDB may cause native MacOS Terminal.app to crash.\nSee: https://github.com/ggml-org/llama.cpp/pull/17869"}

The crash happens with any audio file (tested with a synthetic sine wave and with real speech WAV files, 16 kHz mono).
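For reference, a synthetic sine-wave test file like the one mentioned above can be generated with only the Python standard library. This is a minimal sketch: the 16 kHz mono, 16-bit PCM parameters match the files described as tested, and the filename test.wav matches the repro script; the tone frequency and duration are arbitrary choices.

```python
import math
import struct
import wave

SAMPLE_RATE = 16000   # 16 kHz mono, matching the files tested above
DURATION_S = 2.0      # arbitrary length for a test clip
FREQ_HZ = 440.0       # arbitrary tone frequency

def write_sine_wav(path: str) -> None:
    """Write a 16-bit PCM mono WAV containing a sine tone at half amplitude."""
    n_samples = int(SAMPLE_RATE * DURATION_S)
    frames = b"".join(
        struct.pack(
            "<h",
            int(32767 * 0.5 * math.sin(2 * math.pi * FREQ_HZ * i / SAMPLE_RATE)),
        )
        for i in range(n_samples)
    )
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)            # mono
        wf.setsampwidth(2)            # 16-bit samples
        wf.setframerate(SAMPLE_RATE)
        wf.writeframes(frames)

write_sine_wav("test.wav")
```

The resulting test.wav can be fed directly to the repro script above.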

Expected Behavior

The model should process audio input and return a text response, same as it does for image/text inputs.

Workaround

Audio works correctly via llama.cpp-omni (https://github.com/tc-mb/llama.cpp-omni), the fork built specifically for MiniCPM-o, which handles Metal acceleration for audio without this assertion failure. The issue appears to be in Ollama's Metal backend when processing audio embeddings.

Environment

  • Ollama version: 0.17.4
  • OS: macOS 26.2 (Build 25C56)
  • Hardware: Apple M4, 16GB unified memory
  • Model: openbmb/minicpm-o4.5:latest (6.1 GB, Q4_0)

Related

  • #14065 — MiniCPM-o 4.5 support request
  • #11798 — Audio input support for multimodal models
  • The assertion [rsets->data count] == 0 is in the Metal residency set handling code
Author
Owner

@rick-github commented on GitHub (Feb 28, 2026):

Ollama doesn't currently support audio models.


Reference: github-starred/ollama#71479