[GH-ISSUE #13867] GGML_ASSERT crash on Apple M5 chip - regression in 0.13.x+ #55590

Open
opened 2026-04-29 09:27:27 -05:00 by GiteaMirror · 4 comments
Owner

Originally created by @jarlbrak on GitHub (Jan 23, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/13867

Description

Ollama crashes with "llama runner terminated: exit status 2" on the Apple M5 chip when attempting to run any model. This appears to be a regression introduced in the 0.13.x line and persists through 0.14.3.

Environment

  • Ollama Version: 0.14.3 (installed via Homebrew)
  • Operating System: macOS (Darwin 25.2.0)
  • Hardware: Apple M5 chip with Metal 4 support
  • Installation Method: Homebrew (brew install ollama)

Steps to Reproduce

  1. Install Ollama 0.14.3 on Apple M5 Mac
  2. Start Ollama server: ollama serve
  3. Attempt to run any model: ollama run nomic-embed-text
  4. Or use the embedding API: curl http://localhost:11434/api/embed -d '{"model":"nomic-embed-text","input":["test"]}'

Expected Behavior

Model loads and responds to queries/embeddings.

Actual Behavior

The llama runner crashes immediately after model loading begins:

time=2026-01-23T16:35:42.348-06:00 level=INFO source=server.go:397 msg="starting llama server" cmd="/private/var/folders/.../runners/metal/ollama_llama_server --port 62234 ..."
time=2026-01-23T16:35:42.409-06:00 level=ERROR source=sched.go:455 msg="error loading llama server" error="llama runner terminated: exit status 2"

The crash occurs in the GGML backend with an assertion failure during model loading.

Models Tested

  • nomic-embed-text:latest - crashes
  • qwen3-embedding:latest - crashes
  • All models fail with the same error

Troubleshooting Attempted

  1. ✗ Reinstalling Ollama via Homebrew
  2. ✗ Re-pulling models (ollama pull nomic-embed-text)
  3. ✗ CPU-only mode (OLLAMA_LLM_LIBRARY=cpu ollama serve)
  4. ✗ Clearing model cache
  5. ✓ Downgrade to 0.12.4 works

Workaround

Downgrading to Ollama 0.12.4 resolves the issue completely. Embeddings and model inference work correctly on 0.12.4.

# Install 0.12.4 from GitHub releases
gh release download v0.12.4 --repo ollama/ollama --pattern "ollama-darwin.tgz" --dir /tmp
tar -xzf /tmp/ollama-darwin.tgz -C ~/bin
~/bin/ollama serve
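A quick sanity check that the download and extraction steps above landed where expected (same paths as in those steps; whether each prints "found" or "missing" depends on whether the steps were actually run):

```shell
# Check that the tarball and extracted binary from the steps above exist.
for p in /tmp/ollama-darwin.tgz "$HOME/bin/ollama"; do
  if [ -e "$p" ]; then
    echo "found: $p"
  else
    echo "missing: $p"
  fi
done
```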

Related Issues

This appears related to GGML backend changes in the 0.13.x release line. Other users have reported similar issues with newer Apple Silicon chips and Metal 4 support.

Additional Context

The M5 chip uses the latest Metal 4 API. The crash may be related to Metal shader compilation or memory mapping changes in the GGML backend that are incompatible with Metal 4.

GiteaMirror added the macos, bug labels 2026-04-29 09:27:27 -05:00
Author
Owner

@rick-github commented on GitHub (Jan 24, 2026):

Server log with OLLAMA_DEBUG=2.

Author
Owner

@jarlbrak commented on GitHub (Jan 24, 2026):

Debug Log with OLLAMA_DEBUG=2

After retesting with 0.14.3 and OLLAMA_DEBUG=2, I'm seeing Metal shader compilation errors but the fallback appears to work now:

ggml_metal_device_init: testing tensor API for f16 support
ggml_metal_library_init_from_source: error compiling source
ggml_metal_device_init: - the tensor API is not supported in this environment - disabling
ggml_metal_library_init: using embedded metal library
ggml_metal_library_init: loaded in 5.123 sec
ggml_metal_rsets_init: creating a residency set collection (keep_alive = 180 s)
ggml_metal_device_init: GPU name:   Apple M5

The crash I originally experienced may have been intermittent. In my current test, the Metal tensor API compilation fails but it falls back to the embedded library and embeddings work.

Key observations:

  • ggml_metal_library_init_from_source: error compiling source occurs on M5
  • The fallback to embedded metal library kicks in
  • Embeddings now return successfully
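The fallback can also be detected mechanically by grepping a saved server log for the lines quoted above. A sketch: the log path is a hypothetical stand-in, and the sample contents are copied from the debug output above so the snippet stands alone.

```shell
# Sketch: detect the embedded-library fallback in a saved server log.
# /tmp/ollama-debug.log is a hypothetical path; the sample lines are
# copied from the debug output above.
log=/tmp/ollama-debug.log
cat > "$log" <<'EOF'
ggml_metal_library_init_from_source: error compiling source
ggml_metal_device_init: - the tensor API is not supported in this environment - disabling
ggml_metal_library_init: using embedded metal library
EOF
if grep -q 'error compiling source' "$log" && grep -q 'using embedded metal library' "$log"; then
  echo "tensor API compile failed; embedded-library fallback engaged"
fi
```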

The original crash (exit status 2) may have occurred when the fallback failed or wasn't triggered properly. I'm unable to reproduce the hard crash at this moment, but the Metal compilation error for the tensor API is consistently reproducible.

Would you like me to try anything specific to trigger the original crash behavior?

Author
Owner

@jarlbrak commented on GitHub (Jan 24, 2026):

Update: Issue is Homebrew-specific

After extensive testing, I've identified that the crash only occurs with the Homebrew-installed version, not the GitHub-released binary.

Test Results

| Installation Method | Version | Result |
|---------------------|---------|--------|
| Homebrew (brew install ollama) | 0.14.3 | ❌ crashes with exit status 2 |
| GitHub release binary (ollama-darwin.tgz) | 0.14.3 | ✅ works |
| GitHub release binary | 0.12.4 | ✅ works |

Root Cause

The Homebrew-built binary fails with Metal shader compilation errors in Apple's MetalPerformancePrimitives framework:

ggml_metal_library_init: error: Error Domain=MTLLibraryErrorDomain Code=3
...
static_assert failed due to requirement '__tensor_ops_detail::__is_same_v<bfloat, half>' 
"Input types must match cooperative tensor types"

The full error shows type mismatches between bfloat and half types when compiling Metal tensor operations:

/System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/__impl/MPPTensorOpsMatMul2dImpl.h:3266:5: 
error: static_assert failed due to requirement '__tensor_ops_detail::__is_same_v<bfloat, half>' 
"Input types must match cooperative tensor types"
    static_assert(__tensor_ops_detail::__is_same_v<_leftType, leftValueType>, "Input types must match cooperative tensor types");

Environment Details

  • Hardware: Apple M5 with Metal 4 support
  • OS: macOS Darwin 25.2.0
  • Affected: Homebrew source build only
  • Not affected: Pre-built GitHub release binaries

Conclusion

This appears to be an issue with how the Metal shaders are compiled when building from source (Homebrew) vs the pre-compiled shaders bundled in the GitHub release. The M5's Metal 4 implementation may have stricter type checking that causes the source-compiled shaders to fail.

Workaround: Use the GitHub release binary instead of Homebrew:

brew uninstall ollama
gh release download v0.14.3 --repo ollama/ollama --pattern "ollama-darwin.tgz" --dir /tmp
tar -xzf /tmp/ollama-darwin.tgz -C ~/bin
~/bin/ollama serve
Author
Owner

@natl-set commented on GitHub (Feb 19, 2026):

the workaround for now is to set GGML_METAL_TENSOR_DISABLE=1
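As a sketch, that variable can be exported before launching the server; the serve command is left commented so the snippet stands on its own:

```shell
# Sketch of the suggested workaround: disable the GGML Metal tensor API
# so the failing tensor-API shader compilation path is skipped.
export GGML_METAL_TENSOR_DISABLE=1
echo "GGML_METAL_TENSOR_DISABLE=$GGML_METAL_TENSOR_DISABLE"
# then launch as usual:
# ollama serve
```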

Reference: github-starred/ollama#55590