[GH-ISSUE #15448] Apple M5 + macOS 26.3.1: 500 Internal Server Error, Metal compiler failed to build request even with OLLAMA_NUM_GPU=0 #9873

Open
opened 2026-04-12 22:44:02 -05:00 by GiteaMirror · 15 comments
Owner

Originally created by @frankcsliu on GitHub (Apr 9, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/15448

What is the issue?

Describe the bug

On a MacBook Pro with Apple M5 chip and macOS 26.3.1, any attempt to run a model fails with:

Error: 500 Internal Server Error: llama runner process has terminated: %!w(<nil>)

This happens even when forcing CPU-only mode with OLLAMA_NUM_GPU=0. The system logs show repeated Metal shader compilation failures right before the runner terminates.


Environment

  • Device: MacBook Pro (Apple M5, 24 GB RAM)
  • OS: macOS 26.3.1 (from sw_vers):
    • ProductVersion: 26.3.1
    • ProductVersionExtra: (a)
    • BuildVersion: 25D771280a
  • Ollama version: 0.20.4 (from ollama --version)
  • Install method: Homebrew / official install script / .dmg (please specify)

Steps to reproduce

On this machine:

  1. Install Ollama 0.20.4
  2. Pull a model, e.g.:
    ollama pull llama3.1:8b
    
  3. Try to run the model in CPU-only mode:
    OLLAMA_NUM_GPU=0 ollama run llama3.1:8b
    

Expected behavior

The model should start and respond (especially in CPU-only mode with OLLAMA_NUM_GPU=0).

Actual behavior

The command immediately fails with:

Error: 500 Internal Server Error: llama runner process has terminated: %!w(<nil>)

No additional error details are printed in the terminal.


Relevant logs

From:

log stream --predicate 'process == "ollama"' --info | grep -iE 'Error|Metal|fault|assert|panic'

I see (excerpt):

2026-04-09 17:49:22.708860+0800 0xa1b5     Activity    0x68f85              1940   0    ollama: (Metal) Metal Compiling Shader
2026-04-09 17:49:23.640608+0800 0xa1b5     Error       0x68f85              1940   0    ollama: (Metal) Compiler failed to build request

2026-04-09 17:49:23.799570+0800 0xa1cc     Error       0x0                  1998   0    ollama: (libCoreFSCache.dylib) flock failed to lock list file (<private>): errno = 35
2026-04-09 17:49:23.799940+0800 0xa1cc     Error       0x0                  1998   0    ollama: (libCoreFSCache.dylib) flock failed to lock list file (<private>): errno = 35

# more lines with MTLCompilerService and Metal Compiling Shader ...

The 500 error (llama runner process has terminated: %!w(<nil>)) appears every time right after the Metal compiler failure.


What I have already tried

  • Fully reinstalling Ollama (removing ~/.ollama, caches, logs) and installing 0.20.4 again
  • Rebooting the machine
  • Forcing CPU-only mode:
    OLLAMA_NUM_GPU=0 ollama run llama3.1:8b
    
  • Deleting and re-pulling the model:
    ollama rm llama3.1:8b
    ollama pull llama3.1:8b
    

All of the above still result in the same 500 error and the Metal compiler failure in the logs.


Questions

  • Is this a known regression for Apple M5 + macOS 26.3.1 in the current Metal / GGML backend?
  • Is there any workaround to run models on this machine (even CPU-only) until a fix is available?

GiteaMirror added the bug label 2026-04-12 22:44:02 -05:00

@rick-github commented on GitHub (Apr 9, 2026):

OLLAMA_NUM_GPU is not an ollama configuration variable.

Server logs (https://docs.ollama.com/troubleshooting) will aid in debugging.
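
For the macOS app, the server log lives under ~/.ollama/logs per the Ollama troubleshooting docs; a quick sketch to pull the most recent entries (path assumed for a default install):

```shell
# Server log location for the macOS app (assumed default install path,
# per https://docs.ollama.com/troubleshooting).
LOG="$HOME/.ollama/logs/server.log"
if [ -f "$LOG" ]; then
  tail -n 100 "$LOG"          # last 100 lines of the current server log
else
  echo "no server log found at $LOG"
fi
```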


@egdavid commented on GitHub (Apr 9, 2026):

Same here on the M5. Downgrading to 0.20.3 seems to be the only stable fix for now, couldn't find a workaround for the current build.


@NadavM-ux commented on GitHub (Apr 9, 2026):

Same here, using a MacBook Pro M5; downgrading to 0.20.3 worked for me.


@poreporec commented on GitHub (Apr 9, 2026):

Same


@Alex4386 commented on GitHub (Apr 9, 2026):

It seems ollama 0.20.4 shipped a libmlxc binary built for x86_64 on arm64 platforms.

Error: error loading model: 500 Internal Server Error: mlx runner failed: Error: failed to initialize MLX: MLX: Failed to load /Applications/Ollama.app/Contents/Resources/libmlxc.dylib: dlopen(/Applications/Ollama.app/Contents/Resources/libmlxc.dylib, 0x0009): tried: '/Applications/Ollama.app/Contents/Resources/libmlxc.dylib' (mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64e' or 'arm64e.v1' or 'arm64' or 'arm64')), '/System/Volumes/Preboot/Cryptexes/OS/Applications/Ollama.app/Contents/Resources/libmlxc.dylib' (no such file), '/Applications/Ollama.app/Contents/Resources/libmlxc.dylib (exit: exit status 1)
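
If that is the cause, it can be checked directly; a minimal sketch, assuming the default .app install path from the error above (`file` and `uname` are standard tools):

```shell
# Compare the dylib's architecture against the host's. On an affected
# 0.20.4 install, `file` reportedly shows x86_64 while the host is arm64.
DYLIB="/Applications/Ollama.app/Contents/Resources/libmlxc.dylib"
echo "host architecture: $(uname -m)"
if [ -f "$DYLIB" ]; then
  file "$DYLIB"
else
  echo "dylib not found: $DYLIB"
fi
```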

@yoon-jeong-ho15 commented on GitHub (Apr 9, 2026):

downgrading to 0.20.3 also worked for me.


@ThomasHorn1967 commented on GitHub (Apr 9, 2026):

same here, downgrading to 0.20.3 also worked for me.


@fairyorbital commented on GitHub (Apr 9, 2026):

Same here.
Disabling Flash Attention with OLLAMA_FLASH_ATTENTION=0 did NOT resolve the issue. Only downgrading to 0.20.3 worked for me.

My Environment:

  • Ollama Version: v0.20.4 (Regression)
  • Hardware: Apple M5 Pro, 64GB RAM
  • OS: macOS

Error Log Snippet (from my server logs):

ggml_metal_library_init: error: Error Domain=MTLLibraryErrorDomain Code=3 "program_source:7131:28: warning: variable 'theta_base' is used uninitialized..."
...
 /System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/__impl/MPPTensorOpsMatMul2dImpl.h:3266:5: error: static_assert failed due to requirement
'__tensor_ops_detail::__is_same_v<bfloat, half>' "Input types must match cooperative tensor types"
...
ggml_metal_init: error: failed to create library
ggml_backend_metal_device_init: error: failed to allocate context
ggml-backend.cpp:258: GGML_ASSERT(backend) failed
SIGABRT: abort

@frankli0324 commented on GitHub (Apr 9, 2026):

I think this duplicates/closely relates to https://github.com/ollama/ollama/issues/14432


@code-retriever commented on GitHub (Apr 10, 2026):

Confirming on Apple M5 Max (128GB, 40 GPU cores) — macOS 26.3.1

Version bisection results

| Version | Metal 4 GPU detection | Model loading | Notes |
|---------|----------------------|---------------|-------|
| v0.19.0 | ✅ Metal — Apple M5 Max, 107.5 GiB | ✅ Works | gemma4 architecture not supported |
| v0.20.0 | ✅ Metal | ✅ Works | gemma4 not yet supported |
| v0.20.2 | ✅ Metal | ✅ Works | gemma4:e4b ✅, gemma4:26b partial (see note below) |
| v0.20.3 | ✅ Metal | ✅ Works | All models pass — this is the last working version for M5 Max |
| v0.20.4 | ❌ CPU only | ❌ All fail | Regression introduced here |
| v0.20.5 | ❌ CPU only | ❌ All fail | Same as v0.20.4 |
Note: On M5 Max 128GB, v0.20.2 and v0.20.3 both work. Other reporters with M5 24GB see failures on v0.20.2 — this may be a GPU core count / memory difference.

Root cause: kernel_rope_multi shader

On v0.20.5 with verbose logging, the Metal library compilation error is visible:

ggml_metal_library_init: error: Error Domain=MTLLibraryErrorDomain Code=3
  "program_source:7131:28: warning: variable 'theta_base' is used uninitialized
   whenever 'if' condition is false [-Wsometimes-uninitialized]
   } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t

The Metal 4 compiler (MTLGPUFamilyMetal4 / MTLGPUFamilyApple10) treats this uninitialized variable as a fatal error, whereas Metal 3 was more lenient.
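
The same class of diagnostic can be reproduced with plain clang; a minimal sketch (the C snippet is illustrative, not the actual ggml shader code), guarded in case clang is not installed:

```shell
# Reproduce the -Wsometimes-uninitialized diagnostic class with clang:
# a variable assigned in some branches but not all, then read afterwards.
# -Werror=sometimes-uninitialized turns it into a hard error, mirroring
# what the Metal 4 compiler appears to do.
cat > /tmp/theta_demo.c <<'EOF'
float f(int sector, int sect_0) {
    float theta_base;                 /* not initialized */
    if (sector < sect_0) {
        theta_base = 1.0f;
    } else if (sector % 3 == 0 && sector < 3 * sect_0) {
        theta_base = 2.0f;
    }                                 /* no final else */
    return theta_base;                /* sometimes read uninitialized */
}
EOF
if command -v clang >/dev/null 2>&1; then
  clang -c -Werror=sometimes-uninitialized /tmp/theta_demo.c -o /tmp/theta_demo.o \
    && echo "compiled cleanly" \
    || echo "rejected: sometimes-uninitialized promoted to error"
else
  echo "clang not available"
fi
```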

What doesn't work

  • OLLAMA_FLASH_ATTENTION=0 — no effect (crash happens before FA is relevant)
  • OLLAMA_KV_CACHE_TYPE="" — no effect
  • OLLAMA_LLM_LIBRARY=cpu — runner still initializes Metal library and crashes
  • OLLAMA_NUM_GPU=0 — same, Metal init crash kills the process before CPU fallback
  • Clean environment (env -i) — same result

Workaround

Downgrade to v0.20.3 (confirmed working).

pkill -f ollama
gh release download v0.20.3 --repo ollama/ollama --pattern "Ollama-darwin.zip" --dir /tmp
rm -rf /Applications/Ollama.app
unzip -q /tmp/Ollama-darwin.zip -d /Applications/

Environment: Apple M5 Max, 128GB unified memory, 40 GPU cores, macOS 26.3.1 (Build 25D2128), Xcode CLT 26.4.


@executeautomation commented on GitHub (Apr 11, 2026):

I am having the same issue as well with MacBook Pro M5 Max 128GB

Image: https://github.com/user-attachments/assets/21c27356-5764-4d88-ac8a-7cc414335a35

@chenghongm commented on GitHub (Apr 12, 2026):

ollama --version
ollama version is 0.20.5

same issue: "Error: 500 Internal Server Error: model failed to load, this may be due to resource limitations or an internal error, check ollama server logs for details", no log at all


@reawake97 commented on GitHub (Apr 12, 2026):

Downgrading to 0.20.3 worked for me.


@boite2chocolat commented on GitHub (Apr 12, 2026):

Hello folks
Same here, my MacBook Pro M5 with 24 GB of RAM is getting the same error when installing models.
Downgrading to 0.20.3 worked for me as well.


@exoad commented on GitHub (Apr 12, 2026):

downgrading to 0.20.3 worked for me as well on my m5 air!

Reference: github-starred/ollama#9873