[GH-ISSUE #15526] macOS 26.3.2 / Apple Silicon: all models fail to load, llama runner process terminates, Metal backend initialization fails #56435

Closed
opened 2026-04-29 10:49:20 -05:00 by GiteaMirror · 0 comments
Owner

Originally created by @kyoungsook70 on GitHub (Apr 12, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/15526

What is the issue?

Hi, I’m seeing a persistent runner crash on an Apple Silicon MacBook Pro.

Environment

  • Ollama version: 0.20.5
  • macOS version: 26.3.2
  • Hardware: Apple Silicon MacBook Pro
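
(For anyone wanting to reproduce the environment check, the equivalent commands are the standard ones; nothing custom was used:

sw_vers -productVersion
ollama -v
)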

Problem

All models fail during the load stage, including very small models.

Tested models:

  • gemma4:e4b
  • gemma4:e2b
  • gemma3:4b
  • qwen2.5:0.5b

Each model downloads, verifies, and writes its manifest successfully, but fails immediately at load time.

Typical error:

Error: 500 Internal Server Error: llama runner process has terminated: %!w(<nil>)

In some cases I also saw:

Error: 500 Internal Server Error: model failed to load, this may be due to resource limitations or an internal error, check ollama server logs for details

What I already tried

  • Reinstalled Ollama completely
  • Removed /Applications/Ollama.app and installed again
  • Moved ~/.ollama to ~/.ollama_backup
  • Removed quarantine attribute:
xattr -dr com.apple.quarantine /Applications/Ollama.app
  • Restarted Ollama
  • Tried CPU-only:
OLLAMA_NUM_GPU=0 ollama run qwen2.5:0.5b
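
For reference, the steps above correspond roughly to the following sequence. Paths are the macOS defaults, and the quit/open commands are only an illustration of how the app was stopped and restarted, not exact transcripts:

osascript -e 'quit app "Ollama"'
rm -rf /Applications/Ollama.app
mv ~/.ollama ~/.ollama_backup
# reinstall Ollama.app from the official download, then:
xattr -dr com.apple.quarantine /Applications/Ollama.app
open -a Ollama
OLLAMA_NUM_GPU=0 ollama run qwen2.5:0.5b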

Result

The same failure still happens, including with the very small qwen2.5:0.5b model.

Relevant logs

The key log lines are:

ggml_metal_init: error: failed to initialize the Metal library
ggml_backend_metal_device_init: error: failed to allocate context
llama_init_from_model: failed to initialize the context: failed to initialize Metal backend
panic: unable to create llama context

I also see MetalPerformancePrimitives-related errors above that, including a type mismatch involving bfloat and half, for example:

error: static_assert failed due to requirement '__tensor_ops_detail::__is_same_v<half, bfloat>'
"Input types must match cooperative tensor types"

Reproduction

This reproduces consistently with:

ollama run qwen2.5:0.5b

The model downloads successfully, but the runner terminates during load.

Please let me know if you want the full server.log contents or any additional diagnostics.

This also reproduces with:
OLLAMA_NUM_GPU=0 ollama run qwen2.5:0.5b
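
If the full server.log would help, I can attach it. On a standard macOS app install it should be at the default documented location (nothing on this machine was customized):

cat ~/.ollama/logs/server.log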

Relevant log output


OS

No response

GPU

No response

CPU

No response

Ollama version

No response

GiteaMirror added the bug label 2026-04-29 10:49:20 -05:00
Reference: github-starred/ollama#56435