[GH-ISSUE #13460] Metal library compilation error after macOS 26.2 / Xcode CLT update: bfloat/half type mismatch #70940

Closed
opened 2026-05-04 23:30:33 -05:00 by GiteaMirror · 14 comments

Originally created by @StrangerSVN on GitHub (Dec 13, 2025).
Original GitHub issue: https://github.com/ollama/ollama/issues/13460

What is the issue?

After updating to macOS 26.2 and the latest Xcode Command Line Tools, Ollama fails to load any model with a Metal library compilation error. The failure occurs during Metal backend initialization: a static_assert in the Metal Performance Primitives headers rejects a bfloat/half type mismatch.
This started immediately after the OS/toolchain update; Ollama was working correctly before it.

ggml_metal_library_init: error: Error Domain=MTLLibraryErrorDomain Code=3
error: static_assert failed due to requirement '__tensor_ops_detail::__is_same_v<bfloat, half>' 
"Input types must match cooperative tensor types"

Relevant log output

ggml_metal_device_init: testing tensor API for f16 support
ggml_metal_device_init: testing tensor API for bfloat support
ggml_metal_library_init: using embedded metal library
ggml_metal_library_init: error: Error Domain=MTLLibraryErrorDomain Code=3 "program_source:7050:28: warning: variable 'theta_base' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized]
                } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t
                           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
program_source:7170:76: note: in instantiation of function template specialization 'kernel_rope_multi<float>' requested here
template [[host_name("kernel_rope_multi_f32")]] kernel kernel_rope_multi_t kernel_rope_multi<float>;
                                                                           ^
program_source:7068:33: note: uninitialized use occurs here
            const float theta = theta_base * pow(args.freq_base, inv_ndims*i0);
                                ^~~~~~~~~~
program_source:7050:24: note: remove the 'if' if its condition is always true
                } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t
                       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
program_source:7050:28: warning: variable 'theta_base' is used uninitialized whenever '&&' condition is false [-Wsometimes-uninitialized]
                } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t
                           ^~~~~~~~~~~~~~~
program_source:7068:33: note: uninitialized use occurs here
            const float theta = theta_base * pow(args.freq_base, inv_ndims*i0);
                                ^~~~~~~~~~
program_source:7050:28: note: remove the '&&' if its condition is always true
                } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t
                           ^~~~~~~~~~~~~~~~~~
program_source:7044:29: note: initialize the variable 'theta_base' to silence this warning
            float theta_base;
                            ^
                             = 0.0
program_source:7050:28: warning: variable 'theta_base' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized]
                } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t
                           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
program_source:7171:76: note: in instantiation of function template specialization 'kernel_rope_multi<half>' requested here
template [[host_name("kernel_rope_multi_f16")]] kernel kernel_rope_multi_t kernel_rope_multi<half>;
                                                                           ^
program_source:7068:33: note: uninitialized use occurs here
            const float theta = theta_base * pow(args.freq_base, inv_ndims*i0);
                                ^~~~~~~~~~
program_source:7050:24: note: remove the 'if' if its condition is always true
                } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t
                       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
program_source:7050:28: warning: variable 'theta_base' is used uninitialized whenever '&&' condition is false [-Wsometimes-uninitialized]
                } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t
                           ^~~~~~~~~~~~~~~
program_source:7068:33: note: uninitialized use occurs here
            const float theta = theta_base * pow(args.freq_base, inv_ndims*i0);
                                ^~~~~~~~~~
program_source:7050:28: note: remove the '&&' if its condition is always true
                } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t
                           ^~~~~~~~~~~~~~~~~~
program_source:7044:29: note: initialize the variable 'theta_base' to silence this warning
            float theta_base;
                            ^
                             = 0.0
In file included from program_source:2816:
In file included from /System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/MetalPerformancePrimitives.h:10:
In file included from /System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/MPPTensorOpsMatMul2d.h:368:
/System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/__impl/MPPTensorOpsMatMul2dImpl.h:3266:5: error: static_assert failed due to requirement '__tensor_ops_detail::__is_same_v<bfloat, half>' "Input types must match cooperative tensor types"
    static_assert(__tensor_ops_detail::__is_same_v<_leftType, leftValueType>, "Input types must match cooperative tensor types");
    ^             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/MPPTensorOpsMatMul2d.h:394:24: note: in instantiation of function template specialization 'mpp::tensor_ops::__mutmul2d_detail::__run<{32, 64, 32, false, true, false, 1}, metal::execution_simdgroups<4>, metal::tensor<threadgroup half, metal::extents<int, 18446744073709551615, 18446744073709551615>, metal::tensor_inline>, metal::tensor<threadgroup bfloat, metal::extents<int, 18446744073709551615, 18446744073709551615>, metal::tensor_inline>, metal::cooperative_tensor<float, metal::extents<int, 18446744073709551615, 18446744073709551615>, mpp::tensor_ops::__mutmul2d_detail::__operand_layout<{32, 64, 32, false, true, false, 1}, mpp::tensor_ops::__mutmul2d_detail::__matmul2d_cooperative_operand_index::destination, metal::execution_simdgroups<4>, bfloat, half, float, int>>>' requested here
    __mutmul2d_detail::__run<Descriptor, Scope, LeftOperandType,
                       ^
program_source:12046:12: note: in instantiation of function template specialization 'mpp::tensor_ops::matmul2d<{32, 64, 32, false, true, false, 1}, metal::execution_simdgroups<4>>::run<metal::tensor<threadgroup half, metal::extents<int, 18446744073709551615, 18446744073709551615>, metal::tensor_inline>, metal::tensor<threadgroup bfloat, metal::extents<int, 18446744073709551615, 18446744073709551615>, metal::tensor_inline>, metal::cooperative_tensor<float, metal::extents<int, 18446744073709551615, 18446744073709551615>, mpp::tensor_ops::__mutmul2d_detail::__operand_layout<{32, 64, 32, false, true, false, 1}, mpp::tensor_ops::__mutmul2d_detail::__matmul2d_cooperative_operand_index::destination, metal::execution_simdgroups<4>, bfloat, half, float, int>>, void>' requested here
        mm.run(sB, sA, cT);
           ^
In file included from program_source:2816:
In file included from /System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/MetalPerformancePrimitives.h:10:
In file included from /System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/MPPTensorOpsMatMul2d.h:368:
/System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/__impl/MPPTensorOpsMatMul2dImpl.h:3267:5: error: static_assert failed due to requirement '__tensor_ops_detail::__is_same_v<half, bfloat>' "Input types must match cooperative tensor types"
    static_assert(__tensor_ops_detail::__is_same_v<_rightType, rightValueType>, "Input types must match cooperative tensor types");
    ^             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
" UserInfo={NSLocalizedDescription=program_source:7050:28: warning: variable 'theta_base' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized]
                } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t
                           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
program_source:7170:76: note: in instantiation of function template specialization 'kernel_rope_multi<float>' requested here
template [[host_name("kernel_rope_multi_f32")]] kernel kernel_rope_multi_t kernel_rope_multi<float>;
                                                                           ^
program_source:7068:33: note: uninitialized use occurs here
            const float theta = theta_base * pow(args.freq_base, inv_ndims*i0);
                                ^~~~~~~~~~
program_source:7050:24: note: remove the 'if' if its condition is always true
                } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t
                       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
program_source:7050:28: warning: variable 'theta_base' is used uninitialized whenever '&&' condition is false [-Wsometimes-uninitialized]
                } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t
                           ^~~~~~~~~~~~~~~
program_source:7068:33: note: uninitialized use occurs here
            const float theta = theta_base * pow(args.freq_base, inv_ndims*i0);
                                ^~~~~~~~~~
program_source:7050:28: note: remove the '&&' if its condition is always true
                } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t
                           ^~~~~~~~~~~~~~~~~~
program_source:7044:29: note: initialize the variable 'theta_base' to silence this warning
            float theta_base;
                            ^
                             = 0.0
program_source:7050:28: warning: variable 'theta_base' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized]
                } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t
                           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
program_source:7171:76: note: in instantiation of function template specialization 'kernel_rope_multi<half>' requested here
template [[host_name("kernel_rope_multi_f16")]] kernel kernel_rope_multi_t kernel_rope_multi<half>;
                                                                           ^
program_source:7068:33: note: uninitialized use occurs here
            const float theta = theta_base * pow(args.freq_base, inv_ndims*i0);
                                ^~~~~~~~~~
program_source:7050:24: note: remove the 'if' if its condition is always true
                } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t
                       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
program_source:7050:28: warning: variable 'theta_base' is used uninitialized whenever '&&' condition is false [-Wsometimes-uninitialized]
                } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t
                           ^~~~~~~~~~~~~~~
program_source:7068:33: note: uninitialized use occurs here
            const float theta = theta_base * pow(args.freq_base, inv_ndims*i0);
                                ^~~~~~~~~~
program_source:7050:28: note: remove the '&&' if its condition is always true
                } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t
                           ^~~~~~~~~~~~~~~~~~
program_source:7044:29: note: initialize the variable 'theta_base' to silence this warning
            float theta_base;
                            ^
                             = 0.0
In file included from program_source:2816:
In file included from /System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/MetalPerformancePrimitives.h:10:
In file included from /System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/MPPTensorOpsMatMul2d.h:368:
/System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/__impl/MPPTensorOpsMatMul2dImpl.h:3266:5: error: static_assert failed due to requirement '__tensor_ops_detail::__is_same_v<bfloat, half>' "Input types must match cooperative tensor types"
    static_assert(__tensor_ops_detail::__is_same_v<_leftType, leftValueType>, "Input types must match cooperative tensor types");
    ^             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/MPPTensorOpsMatMul2d.h:394:24: note: in instantiation of function template specialization 'mpp::tensor_ops::__mutmul2d_detail::__run<{32, 64, 32, false, true, false, 1}, metal::execution_simdgroups<4>, metal::tensor<threadgroup half, metal::extents<int, 18446744073709551615, 18446744073709551615>, metal::tensor_inline>, metal::tensor<threadgroup bfloat, metal::extents<int, 18446744073709551615, 18446744073709551615>, metal::tensor_inline>, metal::cooperative_tensor<float, metal::extents<int, 18446744073709551615, 18446744073709551615>, mpp::tensor_ops::__mutmul2d_detail::__operand_layout<{32, 64, 32, false, true, false, 1}, mpp::tensor_ops::__mutmul2d_detail::__matmul2d_cooperative_operand_index::destination, metal::execution_simdgroups<4>, bfloat, half, float, int>>>' requested here
    __mutmul2d_detail::__run<Descriptor, Scope, LeftOperandType,
                       ^
program_source:12046:12: note: in instantiation of function template specialization 'mpp::tensor_ops::matmul2d<{32, 64, 32, false, true, false, 1}, metal::execution_simdgroups<4>>::run<metal::tensor<threadgroup half, metal::extents<int, 18446744073709551615, 18446744073709551615>, metal::tensor_inline>, metal::tensor<threadgroup bfloat, metal::extents<int, 18446744073709551615, 18446744073709551615>, metal::tensor_inline>, metal::cooperative_tensor<float, metal::extents<int, 18446744073709551615, 18446744073709551615>, mpp::tensor_ops::__mutmul2d_detail::__operand_layout<{32, 64, 32, false, true, false, 1}, mpp::tensor_ops::__mutmul2d_detail::__matmul2d_cooperative_operand_index::destination, metal::execution_simdgroups<4>, bfloat, half, float, int>>, void>' requested here
        mm.run(sB, sA, cT);
           ^
In file included from program_source:2816:
In file included from /System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/MetalPerformancePrimitives.h:10:
In file included from /System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/MPPTensorOpsMatMul2d.h:368:
/System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/__impl/MPPTensorOpsMatMul2dImpl.h:3267:5: error: static_assert failed due to requirement '__tensor_ops_detail::__is_same_v<half, bfloat>' "Input types must match cooperative tensor types"
    static_assert(__tensor_ops_detail::__is_same_v<_rightType, rightValueType>, "Input types must match cooperative tensor types");
    ^             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
}
ggml_metal_device_init: error: failed to create library
ggml_metal_rsets_init: creating a residency set collection (keep_alive = 180 s)
ggml_metal_device_init: GPU name:   Apple M5
ggml_metal_device_init: GPU family: MTLGPUFamilyApple10  (1010)
ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_device_init: GPU family: MTLGPUFamilyMetal4  (5002)
ggml_metal_device_init: simdgroup reduction   = true
ggml_metal_device_init: simdgroup matrix mul. = true
ggml_metal_device_init: has unified memory    = true
ggml_metal_device_init: has bfloat            = true
ggml_metal_device_init: has tensor            = true
ggml_metal_device_init: use residency sets    = true
ggml_metal_device_init: use shared buffers    = true
ggml_metal_device_init: recommendedMaxWorkingSetSize  = 12713.12 MB
llama_model_load_from_file_impl: using device Metal (Apple M5) (unknown id) - 12123 MiB free
llama_model_loader: loaded meta data with 34 key-value pairs and 339 tensors from /Users/stranger/.ollama/models/blobs/sha256-60e05f2100071479f596b964f89f510f057ce397ea22f2833a0cfe029bfc2463 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = qwen2
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = Qwen2.5 Coder 7B Instruct
llama_model_loader: - kv   3:                           general.finetune str              = Instruct
llama_model_loader: - kv   4:                           general.basename str              = Qwen2.5-Coder
llama_model_loader: - kv   5:                         general.size_label str              = 7B
llama_model_loader: - kv   6:                            general.license str              = apache-2.0
llama_model_loader: - kv   7:                       general.license.link str              = https://huggingface.co/Qwen/Qwen2.5-C...
llama_model_loader: - kv   8:                   general.base_model.count u32              = 1
llama_model_loader: - kv   9:                  general.base_model.0.name str              = Qwen2.5 Coder 7B
llama_model_loader: - kv  10:          general.base_model.0.organization str              = Qwen
llama_model_loader: - kv  11:              general.base_model.0.repo_url str              = https://huggingface.co/Qwen/Qwen2.5-C...
llama_model_loader: - kv  12:                               general.tags arr[str,6]       = ["code", "codeqwen", "chat", "qwen", ...
llama_model_loader: - kv  13:                          general.languages arr[str,1]       = ["en"]
llama_model_loader: - kv  14:                          qwen2.block_count u32              = 28
llama_model_loader: - kv  15:                       qwen2.context_length u32              = 32768
llama_model_loader: - kv  16:                     qwen2.embedding_length u32              = 3584
llama_model_loader: - kv  17:                  qwen2.feed_forward_length u32              = 18944
llama_model_loader: - kv  18:                 qwen2.attention.head_count u32              = 28
llama_model_loader: - kv  19:              qwen2.attention.head_count_kv u32              = 4
llama_model_loader: - kv  20:                       qwen2.rope.freq_base f32              = 1000000.000000
llama_model_loader: - kv  21:     qwen2.attention.layer_norm_rms_epsilon f32              = 0.000001
llama_model_loader: - kv  22:                          general.file_type u32              = 15
llama_model_loader: - kv  23:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  24:                         tokenizer.ggml.pre str              = qwen2
llama_model_loader: - kv  25:                      tokenizer.ggml.tokens arr[str,152064]  = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  26:                  tokenizer.ggml.token_type arr[i32,152064]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  27:                      tokenizer.ggml.merges arr[str,151387]  = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
llama_model_loader: - kv  28:                tokenizer.ggml.eos_token_id u32              = 151645
llama_model_loader: - kv  29:            tokenizer.ggml.padding_token_id u32              = 151643
llama_model_loader: - kv  30:                tokenizer.ggml.bos_token_id u32              = 151643
llama_model_loader: - kv  31:               tokenizer.ggml.add_bos_token bool             = false
llama_model_loader: - kv  32:                    tokenizer.chat_template str              = {%- if tools %}\n    {{- '<|im_start|>...
llama_model_loader: - kv  33:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:  141 tensors
llama_model_loader: - type q4_K:  169 tensors
llama_model_loader: - type q6_K:   29 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type   = Q4_K - Medium
print_info: file size   = 4.36 GiB (4.91 BPW) 
load: printing all EOG tokens:
load:   - 151643 ('<|endoftext|>')
load:   - 151645 ('<|im_end|>')
load:   - 151662 ('<|fim_pad|>')
load:   - 151663 ('<|repo_name|>')
load:   - 151664 ('<|file_sep|>')
load: special tokens cache size = 22
load: token to piece cache size = 0.9310 MB
print_info: arch             = qwen2
print_info: vocab_only       = 1
print_info: model type       = ?B
print_info: model params     = 7.62 B
print_info: general.name     = Qwen2.5 Coder 7B Instruct
print_info: vocab type       = BPE
print_info: n_vocab          = 152064
print_info: n_merges         = 151387
print_info: BOS token        = 151643 '<|endoftext|>'
print_info: EOS token        = 151645 '<|im_end|>'
print_info: EOT token        = 151645 '<|im_end|>'
print_info: PAD token        = 151643 '<|endoftext|>'
print_info: LF token         = 198 'Ċ'
print_info: FIM PRE token    = 151659 '<|fim_prefix|>'
print_info: FIM SUF token    = 151661 '<|fim_suffix|>'
print_info: FIM MID token    = 151660 '<|fim_middle|>'
print_info: FIM PAD token    = 151662 '<|fim_pad|>'
print_info: FIM REP token    = 151663 '<|repo_name|>'
print_info: FIM SEP token    = 151664 '<|file_sep|>'
print_info: EOG token        = 151643 '<|endoftext|>'
print_info: EOG token        = 151645 '<|im_end|>'
print_info: EOG token        = 151662 '<|fim_pad|>'
print_info: EOG token        = 151663 '<|repo_name|>'
print_info: EOG token        = 151664 '<|file_sep|>'
print_info: max token length = 256
llama_model_load: vocab only - skipping tensors
time=2025-12-13T21:10:51.989+02:00 level=INFO source=server.go:209 msg="enabling flash attention"
time=2025-12-13T21:10:51.990+02:00 level=INFO source=server.go:392 msg="starting runner" cmd="/opt/homebrew/Cellar/ollama/0.13.3/bin/ollama runner --model /Users/stranger/.ollama/models/blobs/sha256-60e05f2100071479f596b964f89f510f057ce397ea22f2833a0cfe029bfc2463 --port 52653"
time=2025-12-13T21:10:51.991+02:00 level=INFO source=sched.go:443 msg="system memory" total="16.0 GiB" free="6.1 GiB" free_swap="0 B"
time=2025-12-13T21:10:51.991+02:00 level=INFO source=server.go:459 msg="loading model" "model layers"=29 requested=-1
time=2025-12-13T21:10:51.992+02:00 level=INFO source=device.go:245 msg="model weights" device=CPU size="4.1 GiB"
time=2025-12-13T21:10:51.992+02:00 level=INFO source=device.go:256 msg="kv cache" device=CPU size="112.0 MiB"
time=2025-12-13T21:10:51.992+02:00 level=INFO source=device.go:272 msg="total memory" size="4.2 GiB"
time=2025-12-13T21:10:52.013+02:00 level=INFO source=runner.go:964 msg="starting go runner"
ggml_metal_device_init: testing tensor API for f16 support
ggml_metal_device_init: testing tensor API for bfloat support
ggml_metal_library_init: using embedded metal library
ggml_metal_library_init: error: Error Domain=MTLLibraryErrorDomain Code=3 [same compiler diagnostics as above, repeated by the runner process]
    static_assert(__tensor_ops_detail::__is_same_v<_leftType, leftValueType>, "Input types must match cooperative tensor types");
    ^             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/MPPTensorOpsMatMul2d.h:394:24: note: in instantiation of function template specialization 'mpp::tensor_ops::__mutmul2d_detail::__run<{32, 64, 32, false, true, false, 1}, metal::execution_simdgroups<4>, metal::tensor<threadgroup half, metal::extents<int, 18446744073709551615, 18446744073709551615>, metal::tensor_inline>, metal::tensor<threadgroup bfloat, metal::extents<int, 18446744073709551615, 18446744073709551615>, metal::tensor_inline>, metal::cooperative_tensor<float, metal::extents<int, 18446744073709551615, 18446744073709551615>, mpp::tensor_ops::__mutmul2d_detail::__operand_layout<{32, 64, 32, false, true, false, 1}, mpp::tensor_ops::__mutmul2d_detail::__matmul2d_cooperative_operand_index::destination, metal::execution_simdgroups<4>, bfloat, half, float, int>>>' requested here
    __mutmul2d_detail::__run<Descriptor, Scope, LeftOperandType,
                       ^
program_source:12046:12: note: in instantiation of function template specialization 'mpp::tensor_ops::matmul2d<{32, 64, 32, false, true, false, 1}, metal::execution_simdgroups<4>>::run<metal::tensor<threadgroup half, metal::extents<int, 18446744073709551615, 18446744073709551615>, metal::tensor_inline>, metal::tensor<threadgroup bfloat, metal::extents<int, 18446744073709551615, 18446744073709551615>, metal::tensor_inline>, metal::cooperative_tensor<float, metal::extents<int, 18446744073709551615, 18446744073709551615>, mpp::tensor_ops::__mutmul2d_detail::__operand_layout<{32, 64, 32, false, true, false, 1}, mpp::tensor_ops::__mutmul2d_detail::__matmul2d_cooperative_operand_index::destination, metal::execution_simdgroups<4>, bfloat, half, float, int>>, void>' requested here
        mm.run(sB, sA, cT);
           ^
In file included from program_source:2816:
In file included from /System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/MetalPerformancePrimitives.h:10:
In file included from /System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/MPPTensorOpsMatMul2d.h:368:
/System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/__impl/MPPTensorOpsMatMul2dImpl.h:3267:5: error: static_assert failed due to requirement '__tensor_ops_detail::__is_same_v<half, bfloat>' "Input types must match cooperative tensor types"
    static_assert(__tensor_ops_detail::__is_same_v<_rightType, rightValueType>, "Input types must match cooperative tensor types");
    ^             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
}
ggml_metal_device_init: error: failed to create library
ggml_metal_rsets_init: creating a residency set collection (keep_alive = 180 s)
ggml_metal_device_init: GPU name:   Apple M5
ggml_metal_device_init: GPU family: MTLGPUFamilyApple10  (1010)
ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_device_init: GPU family: MTLGPUFamilyMetal4  (5002)
ggml_metal_device_init: simdgroup reduction   = true
ggml_metal_device_init: simdgroup matrix mul. = true
ggml_metal_device_init: has unified memory    = true
ggml_metal_device_init: has bfloat            = true
ggml_metal_device_init: has tensor            = true
ggml_metal_device_init: use residency sets    = true
ggml_metal_device_init: use shared buffers    = true
ggml_metal_device_init: recommendedMaxWorkingSetSize  = 12713.12 MB
time=2025-12-13T21:10:52.013+02:00 level=INFO source=ggml.go:104 msg=system Metal.0.EMBED_LIBRARY=1 CPU.0.NEON=1 CPU.0.ARM_FMA=1 CPU.0.FP16_VA=1 CPU.0.DOTPROD=1 CPU.0.LLAMAFILE=1 CPU.0.ACCELERATE=1 compiler=cgo(clang)
time=2025-12-13T21:10:52.846+02:00 level=INFO source=runner.go:1000 msg="Server listening on 127.0.0.1:52653"
time=2025-12-13T21:10:52.858+02:00 level=INFO source=runner.go:894 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType:q8_0 NumThreads:4 GPULayers:[] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
llama_model_load_from_file_impl: using device Metal (Apple M5) (unknown id) - 12123 MiB free
time=2025-12-13T21:10:52.858+02:00 level=INFO source=server.go:1301 msg="waiting for llama runner to start responding"
time=2025-12-13T21:10:52.858+02:00 level=INFO source=server.go:1335 msg="waiting for server to become available" status="llm server loading model"
llama_model_loader: loaded meta data with 34 key-value pairs and 339 tensors from /Users/stranger/.ollama/models/blobs/sha256-60e05f2100071479f596b964f89f510f057ce397ea22f2833a0cfe029bfc2463 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = qwen2
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = Qwen2.5 Coder 7B Instruct
llama_model_loader: - kv   3:                           general.finetune str              = Instruct
llama_model_loader: - kv   4:                           general.basename str              = Qwen2.5-Coder
llama_model_loader: - kv   5:                         general.size_label str              = 7B
llama_model_loader: - kv   6:                            general.license str              = apache-2.0
llama_model_loader: - kv   7:                       general.license.link str              = https://huggingface.co/Qwen/Qwen2.5-C...
llama_model_loader: - kv   8:                   general.base_model.count u32              = 1
llama_model_loader: - kv   9:                  general.base_model.0.name str              = Qwen2.5 Coder 7B
llama_model_loader: - kv  10:          general.base_model.0.organization str              = Qwen
llama_model_loader: - kv  11:              general.base_model.0.repo_url str              = https://huggingface.co/Qwen/Qwen2.5-C...
llama_model_loader: - kv  12:                               general.tags arr[str,6]       = ["code", "codeqwen", "chat", "qwen", ...
llama_model_loader: - kv  13:                          general.languages arr[str,1]       = ["en"]
llama_model_loader: - kv  14:                          qwen2.block_count u32              = 28
llama_model_loader: - kv  15:                       qwen2.context_length u32              = 32768
llama_model_loader: - kv  16:                     qwen2.embedding_length u32              = 3584
llama_model_loader: - kv  17:                  qwen2.feed_forward_length u32              = 18944
llama_model_loader: - kv  18:                 qwen2.attention.head_count u32              = 28
llama_model_loader: - kv  19:              qwen2.attention.head_count_kv u32              = 4
llama_model_loader: - kv  20:                       qwen2.rope.freq_base f32              = 1000000.000000
llama_model_loader: - kv  21:     qwen2.attention.layer_norm_rms_epsilon f32              = 0.000001
llama_model_loader: - kv  22:                          general.file_type u32              = 15
llama_model_loader: - kv  23:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  24:                         tokenizer.ggml.pre str              = qwen2
llama_model_loader: - kv  25:                      tokenizer.ggml.tokens arr[str,152064]  = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  26:                  tokenizer.ggml.token_type arr[i32,152064]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  27:                      tokenizer.ggml.merges arr[str,151387]  = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
llama_model_loader: - kv  28:                tokenizer.ggml.eos_token_id u32              = 151645
llama_model_loader: - kv  29:            tokenizer.ggml.padding_token_id u32              = 151643
llama_model_loader: - kv  30:                tokenizer.ggml.bos_token_id u32              = 151643
llama_model_loader: - kv  31:               tokenizer.ggml.add_bos_token bool             = false
llama_model_loader: - kv  32:                    tokenizer.chat_template str              = {%- if tools %}\n    {{- '<|im_start|>...
llama_model_loader: - kv  33:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:  141 tensors
llama_model_loader: - type q4_K:  169 tensors
llama_model_loader: - type q6_K:   29 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type   = Q4_K - Medium
print_info: file size   = 4.36 GiB (4.91 BPW) 
load: printing all EOG tokens:
load:   - 151643 ('<|endoftext|>')
load:   - 151645 ('<|im_end|>')
load:   - 151662 ('<|fim_pad|>')
load:   - 151663 ('<|repo_name|>')
load:   - 151664 ('<|file_sep|>')
load: special tokens cache size = 22
load: token to piece cache size = 0.9310 MB
print_info: arch             = qwen2
print_info: vocab_only       = 0
print_info: n_ctx_train      = 32768
print_info: n_embd           = 3584
print_info: n_embd_inp       = 3584
print_info: n_layer          = 28
print_info: n_head           = 28
print_info: n_head_kv        = 4
print_info: n_rot            = 128
print_info: n_swa            = 0
print_info: is_swa_any       = 0
print_info: n_embd_head_k    = 128
print_info: n_embd_head_v    = 128
print_info: n_gqa            = 7
print_info: n_embd_k_gqa     = 512
print_info: n_embd_v_gqa     = 512
print_info: f_norm_eps       = 0.0e+00
print_info: f_norm_rms_eps   = 1.0e-06
print_info: f_clamp_kqv      = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale    = 0.0e+00
print_info: f_attn_scale     = 0.0e+00
print_info: n_ff             = 18944
print_info: n_expert         = 0
print_info: n_expert_used    = 0
print_info: n_expert_groups  = 0
print_info: n_group_used     = 0
print_info: causal attn      = 1
print_info: pooling type     = -1
print_info: rope type        = 2
print_info: rope scaling     = linear
print_info: freq_base_train  = 1000000.0
print_info: freq_scale_train = 1
print_info: n_ctx_orig_yarn  = 32768
print_info: rope_finetuned   = unknown
print_info: model type       = 7B
print_info: model params     = 7.62 B
print_info: general.name     = Qwen2.5 Coder 7B Instruct
print_info: vocab type       = BPE
print_info: n_vocab          = 152064
print_info: n_merges         = 151387
print_info: BOS token        = 151643 '<|endoftext|>'
print_info: EOS token        = 151645 '<|im_end|>'
print_info: EOT token        = 151645 '<|im_end|>'
print_info: PAD token        = 151643 '<|endoftext|>'
print_info: LF token         = 198 'Ċ'
print_info: FIM PRE token    = 151659 '<|fim_prefix|>'
print_info: FIM SUF token    = 151661 '<|fim_suffix|>'
print_info: FIM MID token    = 151660 '<|fim_middle|>'
print_info: FIM PAD token    = 151662 '<|fim_pad|>'
print_info: FIM REP token    = 151663 '<|repo_name|>'
print_info: FIM SEP token    = 151664 '<|file_sep|>'
print_info: EOG token        = 151643 '<|endoftext|>'
print_info: EOG token        = 151645 '<|im_end|>'
print_info: EOG token        = 151662 '<|fim_pad|>'
print_info: EOG token        = 151663 '<|repo_name|>'
print_info: EOG token        = 151664 '<|file_sep|>'
print_info: max token length = 256
load_tensors: loading model tensors, this can take a while... (mmap = false)
load_tensors: offloading 0 repeating layers to GPU
load_tensors: offloaded 0/29 layers to GPU
load_tensors:          CPU model buffer size =  4460.45 MiB
llama_context: constructing llama_context
llama_context: n_seq_max     = 1
llama_context: n_ctx         = 4096
llama_context: n_ctx_seq     = 4096
llama_context: n_batch       = 512
llama_context: n_ubatch      = 512
llama_context: causal_attn   = 1
llama_context: flash_attn    = enabled
llama_context: kv_unified    = false
llama_context: freq_base     = 1000000.0
llama_context: freq_scale    = 1
llama_context: n_ctx_seq (4096) < n_ctx_train (32768) -- the full capacity of the model will not be utilized
ggml_metal_init: allocating
ggml_metal_init: picking default device: Apple M5
ggml_metal_init: the device does not have a precompiled Metal library - this is unexpected
ggml_metal_init: will try to compile it on the fly
ggml_metal_library_init: using embedded metal library
ggml_metal_library_init: error: Error Domain=MTLLibraryErrorDomain Code=3 "program_source:7050:28: warning: variable 'theta_base' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized]
                } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t
                           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
program_source:7170:76: note: in instantiation of function template specialization 'kernel_rope_multi<float>' requested here
template [[host_name("kernel_rope_multi_f32")]] kernel kernel_rope_multi_t kernel_rope_multi<float>;
                                                                           ^
program_source:7068:33: note: uninitialized use occurs here
            const float theta = theta_base * pow(args.freq_base, inv_ndims*i0);
                                ^~~~~~~~~~
program_source:7050:24: note: remove the 'if' if its condition is always true
                } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t
                       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
program_source:7050:28: warning: variable 'theta_base' is used uninitialized whenever '&&' condition is false [-Wsometimes-uninitialized]
                } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t
                           ^~~~~~~~~~~~~~~
program_source:7068:33: note: uninitialized use occurs here
            const float theta = theta_base * pow(args.freq_base, inv_ndims*i0);
                                ^~~~~~~~~~
program_source:7050:28: note: remove the '&&' if its condition is always true
                } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t
                           ^~~~~~~~~~~~~~~~~~
program_source:7044:29: note: initialize the variable 'theta_base' to silence this warning
            float theta_base;
                            ^
                             = 0.0
program_source:7050:28: warning: variable 'theta_base' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized]
                } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t
                           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
program_source:7171:76: note: in instantiation of function template specialization 'kernel_rope_multi<half>' requested here
template [[host_name("kernel_rope_multi_f16")]] kernel kernel_rope_multi_t kernel_rope_multi<half>;
                                                                           ^
program_source:7068:33: note: uninitialized use occurs here
            const float theta = theta_base * pow(args.freq_base, inv_ndims*i0);
                                ^~~~~~~~~~
program_source:7050:24: note: remove the 'if' if its condition is always true
                } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t
                       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
program_source:7050:28: warning: variable 'theta_base' is used uninitialized whenever '&&' condition is false [-Wsometimes-uninitialized]
                } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t
                           ^~~~~~~~~~~~~~~
program_source:7068:33: note: uninitialized use occurs here
            const float theta = theta_base * pow(args.freq_base, inv_ndims*i0);
                                ^~~~~~~~~~
program_source:7050:28: note: remove the '&&' if its condition is always true
                } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t
                           ^~~~~~~~~~~~~~~~~~
program_source:7044:29: note: initialize the variable 'theta_base' to silence this warning
            float theta_base;
                            ^
                             = 0.0
In file included from program_source:2816:
In file included from /System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/MetalPerformancePrimitives.h:10:
In file included from /System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/MPPTensorOpsMatMul2d.h:368:
/System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/__impl/MPPTensorOpsMatMul2dImpl.h:3266:5: error: static_assert failed due to requirement '__tensor_ops_detail::__is_same_v<bfloat, half>' "Input types must match cooperative tensor types"
    static_assert(__tensor_ops_detail::__is_same_v<_leftType, leftValueType>, "Input types must match cooperative tensor types");
    ^             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/MPPTensorOpsMatMul2d.h:394:24: note: in instantiation of function template specialization 'mpp::tensor_ops::__mutmul2d_detail::__run<{32, 64, 32, false, true, false, 1}, metal::execution_simdgroups<4>, metal::tensor<threadgroup half, metal::extents<int, 18446744073709551615, 18446744073709551615>, metal::tensor_inline>, metal::tensor<threadgroup bfloat, metal::extents<int, 18446744073709551615, 18446744073709551615>, metal::tensor_inline>, metal::cooperative_tensor<float, metal::extents<int, 18446744073709551615, 18446744073709551615>, mpp::tensor_ops::__mutmul2d_detail::__operand_layout<{32, 64, 32, false, true, false, 1}, mpp::tensor_ops::__mutmul2d_detail::__matmul2d_cooperative_operand_index::destination, metal::execution_simdgroups<4>, bfloat, half, float, int>>>' requested here
    __mutmul2d_detail::__run<Descriptor, Scope, LeftOperandType,
                       ^
program_source:12046:12: note: in instantiation of function template specialization 'mpp::tensor_ops::matmul2d<{32, 64, 32, false, true, false, 1}, metal::execution_simdgroups<4>>::run<metal::tensor<threadgroup half, metal::extents<int, 18446744073709551615, 18446744073709551615>, metal::tensor_inline>, metal::tensor<threadgroup bfloat, metal::extents<int, 18446744073709551615, 18446744073709551615>, metal::tensor_inline>, metal::cooperative_tensor<float, metal::extents<int, 18446744073709551615, 18446744073709551615>, mpp::tensor_ops::__mutmul2d_detail::__operand_layout<{32, 64, 32, false, true, false, 1}, mpp::tensor_ops::__mutmul2d_detail::__matmul2d_cooperative_operand_index::destination, metal::execution_simdgroups<4>, bfloat, half, float, int>>, void>' requested here
        mm.run(sB, sA, cT);
           ^
In file included from program_source:2816:
In file included from /System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/MetalPerformancePrimitives.h:10:
In file included from /System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/MPPTensorOpsMatMul2d.h:368:
/System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/__impl/MPPTensorOpsMatMul2dImpl.h:3267:5: error: static_assert failed due to requirement '__tensor_ops_detail::__is_same_v<half, bfloat>' "Input types must match cooperative tensor types"
    static_assert(__tensor_ops_detail::__is_same_v<_rightType, rightValueType>, "Input types must match cooperative tensor types");
    ^             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
" UserInfo={NSLocalizedDescription=program_source:7050:28: warning: variable 'theta_base' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized]
                } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t
                           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
program_source:7170:76: note: in instantiation of function template specialization 'kernel_rope_multi<float>' requested here
template [[host_name("kernel_rope_multi_f32")]] kernel kernel_rope_multi_t kernel_rope_multi<float>;
                                                                           ^
program_source:7068:33: note: uninitialized use occurs here
            const float theta = theta_base * pow(args.freq_base, inv_ndims*i0);
                                ^~~~~~~~~~
program_source:7050:24: note: remove the 'if' if its condition is always true
                } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t
                       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
program_source:7050:28: warning: variable 'theta_base' is used uninitialized whenever '&&' condition is false [-Wsometimes-uninitialized]
                } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t
                           ^~~~~~~~~~~~~~~
program_source:7068:33: note: uninitialized use occurs here
            const float theta = theta_base * pow(args.freq_base, inv_ndims*i0);
                                ^~~~~~~~~~
program_source:7050:28: note: remove the '&&' if its condition is always true
                } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t
                           ^~~~~~~~~~~~~~~~~~
program_source:7044:29: note: initialize the variable 'theta_base' to silence this warning
            float theta_base;
                            ^
                             = 0.0
program_source:7050:28: warning: variable 'theta_base' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized]
                } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t
                           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
program_source:7171:76: note: in instantiation of function template specialization 'kernel_rope_multi<half>' requested here
template [[host_name("kernel_rope_multi_f16")]] kernel kernel_rope_multi_t kernel_rope_multi<half>;
                                                                           ^
program_source:7068:33: note: uninitialized use occurs here
            const float theta = theta_base * pow(args.freq_base, inv_ndims*i0);
                                ^~~~~~~~~~
program_source:7050:24: note: remove the 'if' if its condition is always true
                } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t
                       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
program_source:7050:28: warning: variable 'theta_base' is used uninitialized whenever '&&' condition is false [-Wsometimes-uninitialized]
                } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t
                           ^~~~~~~~~~~~~~~
program_source:7068:33: note: uninitialized use occurs here
            const float theta = theta_base * pow(args.freq_base, inv_ndims*i0);
                                ^~~~~~~~~~
program_source:7050:28: note: remove the '&&' if its condition is always true
                } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t
                           ^~~~~~~~~~~~~~~~~~
program_source:7044:29: note: initialize the variable 'theta_base' to silence this warning
            float theta_base;
                            ^
                             = 0.0
In file included from program_source:2816:
In file included from /System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/MetalPerformancePrimitives.h:10:
In file included from /System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/MPPTensorOpsMatMul2d.h:368:
/System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/__impl/MPPTensorOpsMatMul2dImpl.h:3266:5: error: static_assert failed due to requirement '__tensor_ops_detail::__is_same_v<bfloat, half>' "Input types must match cooperative tensor types"
    static_assert(__tensor_ops_detail::__is_same_v<_leftType, leftValueType>, "Input types must match cooperative tensor types");
    ^             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/MPPTensorOpsMatMul2d.h:394:24: note: in instantiation of function template specialization 'mpp::tensor_ops::__mutmul2d_detail::__run<{32, 64, 32, false, true, false, 1}, metal::execution_simdgroups<4>, metal::tensor<threadgroup half, metal::extents<int, 18446744073709551615, 18446744073709551615>, metal::tensor_inline>, metal::tensor<threadgroup bfloat, metal::extents<int, 18446744073709551615, 18446744073709551615>, metal::tensor_inline>, metal::cooperative_tensor<float, metal::extents<int, 18446744073709551615, 18446744073709551615>, mpp::tensor_ops::__mutmul2d_detail::__operand_layout<{32, 64, 32, false, true, false, 1}, mpp::tensor_ops::__mutmul2d_detail::__matmul2d_cooperative_operand_index::destination, metal::execution_simdgroups<4>, bfloat, half, float, int>>>' requested here
    __mutmul2d_detail::__run<Descriptor, Scope, LeftOperandType,
                       ^
program_source:12046:12: note: in instantiation of function template specialization 'mpp::tensor_ops::matmul2d<{32, 64, 32, false, true, false, 1}, metal::execution_simdgroups<4>>::run<metal::tensor<threadgroup half, metal::extents<int, 18446744073709551615, 18446744073709551615>, metal::tensor_inline>, metal::tensor<threadgroup bfloat, metal::extents<int, 18446744073709551615, 18446744073709551615>, metal::tensor_inline>, metal::cooperative_tensor<float, metal::extents<int, 18446744073709551615, 18446744073709551615>, mpp::tensor_ops::__mutmul2d_detail::__operand_layout<{32, 64, 32, false, true, false, 1}, mpp::tensor_ops::__mutmul2d_detail::__matmul2d_cooperative_operand_index::destination, metal::execution_simdgroups<4>, bfloat, half, float, int>>, void>' requested here
        mm.run(sB, sA, cT);
           ^
In file included from program_source:2816:
In file included from /System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/MetalPerformancePrimitives.h:10:
In file included from /System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/MPPTensorOpsMatMul2d.h:368:
/System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/__impl/MPPTensorOpsMatMul2dImpl.h:3267:5: error: static_assert failed due to requirement '__tensor_ops_detail::__is_same_v<half, bfloat>' "Input types must match cooperative tensor types"
    static_assert(__tensor_ops_detail::__is_same_v<_rightType, rightValueType>, "Input types must match cooperative tensor types");
    ^             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
}
ggml_metal_init: error: failed to initialize the Metal library
ggml_backend_metal_device_init: error: failed to allocate context
llama_init_from_model: failed to initialize the context: failed to initialize Metal backend
panic: unable to create llama context

goroutine 4 [running]:
github.com/ollama/ollama/runner/llamarunner.(*Server).loadModel(0x14000386320, {{0x0, 0x0, 0x0}, 0x0, 0x0, 0x0, {0x0, 0x0, 0x0}, ...}, ...)
	github.com/ollama/ollama/runner/llamarunner/runner.go:848 +0x268
created by github.com/ollama/ollama/runner/llamarunner.(*Server).load in goroutine 24
	github.com/ollama/ollama/runner/llamarunner/runner.go:933 +0x680
time=2025-12-13T21:10:54.666+02:00 level=INFO source=server.go:1335 msg="waiting for server to become available" status="llm server not responding"
time=2025-12-13T21:10:54.668+02:00 level=ERROR source=server.go:265 msg="llama runner terminated" error="exit status 2"
time=2025-12-13T21:10:54.917+02:00 level=INFO source=sched.go:470 msg="Load failed" model=/Users/stranger/.ollama/models/blobs/sha256-60e05f2100071479f596b964f89f510f057ce397ea22f2833a0cfe029bfc2463 error="llama runner process has terminated: error:failed to allocate context"
[GIN] 2025/12/13 - 21:10:54 | 500 |  3.867274417s |       127.0.0.1 | POST     "/api/generate"

OS

macOS

GPU

Apple

CPU

Apple

Ollama version

0.13.3 - ...

In file included from /System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/MetalPerformancePrimitives.h:10: In file included from /System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/MPPTensorOpsMatMul2d.h:368: /System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/__impl/MPPTensorOpsMatMul2dImpl.h:3266:5: error: static_assert failed due to requirement '__tensor_ops_detail::__is_same_v<bfloat, half>' "Input types must match cooperative tensor types" static_assert(__tensor_ops_detail::__is_same_v<_leftType, leftValueType>, "Input types must match cooperative tensor types"); ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/MPPTensorOpsMatMul2d.h:394:24: note: in instantiation of function template specialization 'mpp::tensor_ops::__mutmul2d_detail::__run<{32, 64, 32, false, true, false, 1}, metal::execution_simdgroups<4>, metal::tensor<threadgroup half, metal::extents<int, 18446744073709551615, 18446744073709551615>, metal::tensor_inline>, metal::tensor<threadgroup bfloat, metal::extents<int, 18446744073709551615, 18446744073709551615>, metal::tensor_inline>, metal::cooperative_tensor<float, metal::extents<int, 18446744073709551615, 18446744073709551615>, mpp::tensor_ops::__mutmul2d_detail::__operand_layout<{32, 64, 32, false, true, false, 1}, mpp::tensor_ops::__mutmul2d_detail::__matmul2d_cooperative_operand_index::destination, metal::execution_simdgroups<4>, bfloat, half, float, int>>>' requested here __mutmul2d_detail::__run<Descriptor, Scope, LeftOperandType, ^ program_source:12046:12: note: in instantiation of function template specialization 'mpp::tensor_ops::matmul2d<{32, 64, 32, false, true, false, 1}, metal::execution_simdgroups<4>>::run<metal::tensor<threadgroup half, metal::extents<int, 18446744073709551615, 18446744073709551615>, metal::tensor_inline>, metal::tensor<threadgroup bfloat, metal::extents<int, 
18446744073709551615, 18446744073709551615>, metal::tensor_inline>, metal::cooperative_tensor<float, metal::extents<int, 18446744073709551615, 18446744073709551615>, mpp::tensor_ops::__mutmul2d_detail::__operand_layout<{32, 64, 32, false, true, false, 1}, mpp::tensor_ops::__mutmul2d_detail::__matmul2d_cooperative_operand_index::destination, metal::execution_simdgroups<4>, bfloat, half, float, int>>, void>' requested here mm.run(sB, sA, cT); ^ In file included from program_source:2816: In file included from /System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/MetalPerformancePrimitives.h:10: In file included from /System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/MPPTensorOpsMatMul2d.h:368: /System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/__impl/MPPTensorOpsMatMul2dImpl.h:3267:5: error: static_assert failed due to requirement '__tensor_ops_detail::__is_same_v<half, bfloat>' "Input types must match cooperative tensor types" static_assert(__tensor_ops_detail::__is_same_v<_rightType, rightValueType>, "Input types must match cooperative tensor types"); ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ " UserInfo={NSLocalizedDescription=program_source:7050:28: warning: variable 'theta_base' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized] } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ program_source:7170:76: note: in instantiation of function template specialization 'kernel_rope_multi<float>' requested here template [[host_name("kernel_rope_multi_f32")]] kernel kernel_rope_multi_t kernel_rope_multi<float>; ^ program_source:7068:33: note: uninitialized use occurs here const float theta = theta_base * pow(args.freq_base, inv_ndims*i0); ^~~~~~~~~~ program_source:7050:24: note: remove the 'if' if its condition is always true } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t 
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ program_source:7050:28: warning: variable 'theta_base' is used uninitialized whenever '&&' condition is false [-Wsometimes-uninitialized] } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t ^~~~~~~~~~~~~~~ program_source:7068:33: note: uninitialized use occurs here const float theta = theta_base * pow(args.freq_base, inv_ndims*i0); ^~~~~~~~~~ program_source:7050:28: note: remove the '&&' if its condition is always true } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t ^~~~~~~~~~~~~~~~~~ program_source:7044:29: note: initialize the variable 'theta_base' to silence this warning float theta_base; ^ = 0.0 program_source:7050:28: warning: variable 'theta_base' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized] } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ program_source:7171:76: note: in instantiation of function template specialization 'kernel_rope_multi<half>' requested here template [[host_name("kernel_rope_multi_f16")]] kernel kernel_rope_multi_t kernel_rope_multi<half>; ^ program_source:7068:33: note: uninitialized use occurs here const float theta = theta_base * pow(args.freq_base, inv_ndims*i0); ^~~~~~~~~~ program_source:7050:24: note: remove the 'if' if its condition is always true } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ program_source:7050:28: warning: variable 'theta_base' is used uninitialized whenever '&&' condition is false [-Wsometimes-uninitialized] } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t ^~~~~~~~~~~~~~~ program_source:7068:33: note: uninitialized use occurs here const float theta = theta_base * pow(args.freq_base, inv_ndims*i0); ^~~~~~~~~~ program_source:7050:28: note: remove the '&&' if its condition is always true } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t ^~~~~~~~~~~~~~~~~~ 
program_source:7044:29: note: initialize the variable 'theta_base' to silence this warning float theta_base; ^ = 0.0 In file included from program_source:2816: In file included from /System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/MetalPerformancePrimitives.h:10: In file included from /System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/MPPTensorOpsMatMul2d.h:368: /System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/__impl/MPPTensorOpsMatMul2dImpl.h:3266:5: error: static_assert failed due to requirement '__tensor_ops_detail::__is_same_v<bfloat, half>' "Input types must match cooperative tensor types" static_assert(__tensor_ops_detail::__is_same_v<_leftType, leftValueType>, "Input types must match cooperative tensor types"); ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/MPPTensorOpsMatMul2d.h:394:24: note: in instantiation of function template specialization 'mpp::tensor_ops::__mutmul2d_detail::__run<{32, 64, 32, false, true, false, 1}, metal::execution_simdgroups<4>, metal::tensor<threadgroup half, metal::extents<int, 18446744073709551615, 18446744073709551615>, metal::tensor_inline>, metal::tensor<threadgroup bfloat, metal::extents<int, 18446744073709551615, 18446744073709551615>, metal::tensor_inline>, metal::cooperative_tensor<float, metal::extents<int, 18446744073709551615, 18446744073709551615>, mpp::tensor_ops::__mutmul2d_detail::__operand_layout<{32, 64, 32, false, true, false, 1}, mpp::tensor_ops::__mutmul2d_detail::__matmul2d_cooperative_operand_index::destination, metal::execution_simdgroups<4>, bfloat, half, float, int>>>' requested here __mutmul2d_detail::__run<Descriptor, Scope, LeftOperandType, ^ program_source:12046:12: note: in instantiation of function template specialization 'mpp::tensor_ops::matmul2d<{32, 64, 32, false, true, false, 1}, metal::execution_simdgroups<4>>::run<metal::tensor<threadgroup 
half, metal::extents<int, 18446744073709551615, 18446744073709551615>, metal::tensor_inline>, metal::tensor<threadgroup bfloat, metal::extents<int, 18446744073709551615, 18446744073709551615>, metal::tensor_inline>, metal::cooperative_tensor<float, metal::extents<int, 18446744073709551615, 18446744073709551615>, mpp::tensor_ops::__mutmul2d_detail::__operand_layout<{32, 64, 32, false, true, false, 1}, mpp::tensor_ops::__mutmul2d_detail::__matmul2d_cooperative_operand_index::destination, metal::execution_simdgroups<4>, bfloat, half, float, int>>, void>' requested here mm.run(sB, sA, cT); ^ In file included from program_source:2816: In file included from /System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/MetalPerformancePrimitives.h:10: In file included from /System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/MPPTensorOpsMatMul2d.h:368: /System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/__impl/MPPTensorOpsMatMul2dImpl.h:3267:5: error: static_assert failed due to requirement '__tensor_ops_detail::__is_same_v<half, bfloat>' "Input types must match cooperative tensor types" static_assert(__tensor_ops_detail::__is_same_v<_rightType, rightValueType>, "Input types must match cooperative tensor types"); ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ } ggml_metal_device_init: error: failed to create library ggml_metal_rsets_init: creating a residency set collection (keep_alive = 180 s) ggml_metal_device_init: GPU name: Apple M5 ggml_metal_device_init: GPU family: MTLGPUFamilyApple10 (1010) ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003) ggml_metal_device_init: GPU family: MTLGPUFamilyMetal4 (5002) ggml_metal_device_init: simdgroup reduction = true ggml_metal_device_init: simdgroup matrix mul. 
= true ggml_metal_device_init: has unified memory = true ggml_metal_device_init: has bfloat = true ggml_metal_device_init: has tensor = true ggml_metal_device_init: use residency sets = true ggml_metal_device_init: use shared buffers = true ggml_metal_device_init: recommendedMaxWorkingSetSize = 12713.12 MB llama_model_load_from_file_impl: using device Metal (Apple M5) (unknown id) - 12123 MiB free llama_model_loader: loaded meta data with 34 key-value pairs and 339 tensors from /Users/stranger/.ollama/models/blobs/sha256-60e05f2100071479f596b964f89f510f057ce397ea22f2833a0cfe029bfc2463 (version GGUF V3 (latest)) llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output. llama_model_loader: - kv 0: general.architecture str = qwen2 llama_model_loader: - kv 1: general.type str = model llama_model_loader: - kv 2: general.name str = Qwen2.5 Coder 7B Instruct llama_model_loader: - kv 3: general.finetune str = Instruct llama_model_loader: - kv 4: general.basename str = Qwen2.5-Coder llama_model_loader: - kv 5: general.size_label str = 7B llama_model_loader: - kv 6: general.license str = apache-2.0 llama_model_loader: - kv 7: general.license.link str = https://huggingface.co/Qwen/Qwen2.5-C... llama_model_loader: - kv 8: general.base_model.count u32 = 1 llama_model_loader: - kv 9: general.base_model.0.name str = Qwen2.5 Coder 7B llama_model_loader: - kv 10: general.base_model.0.organization str = Qwen llama_model_loader: - kv 11: general.base_model.0.repo_url str = https://huggingface.co/Qwen/Qwen2.5-C... llama_model_loader: - kv 12: general.tags arr[str,6] = ["code", "codeqwen", "chat", "qwen", ... 
llama_model_loader: - kv 13: general.languages arr[str,1] = ["en"] llama_model_loader: - kv 14: qwen2.block_count u32 = 28 llama_model_loader: - kv 15: qwen2.context_length u32 = 32768 llama_model_loader: - kv 16: qwen2.embedding_length u32 = 3584 llama_model_loader: - kv 17: qwen2.feed_forward_length u32 = 18944 llama_model_loader: - kv 18: qwen2.attention.head_count u32 = 28 llama_model_loader: - kv 19: qwen2.attention.head_count_kv u32 = 4 llama_model_loader: - kv 20: qwen2.rope.freq_base f32 = 1000000.000000 llama_model_loader: - kv 21: qwen2.attention.layer_norm_rms_epsilon f32 = 0.000001 llama_model_loader: - kv 22: general.file_type u32 = 15 llama_model_loader: - kv 23: tokenizer.ggml.model str = gpt2 llama_model_loader: - kv 24: tokenizer.ggml.pre str = qwen2 llama_model_loader: - kv 25: tokenizer.ggml.tokens arr[str,152064] = ["!", "\"", "#", "$", "%", "&", "'", ... llama_model_loader: - kv 26: tokenizer.ggml.token_type arr[i32,152064] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ... llama_model_loader: - kv 27: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",... llama_model_loader: - kv 28: tokenizer.ggml.eos_token_id u32 = 151645 llama_model_loader: - kv 29: tokenizer.ggml.padding_token_id u32 = 151643 llama_model_loader: - kv 30: tokenizer.ggml.bos_token_id u32 = 151643 llama_model_loader: - kv 31: tokenizer.ggml.add_bos_token bool = false llama_model_loader: - kv 32: tokenizer.chat_template str = {%- if tools %}\n {{- '<|im_start|>... 
llama_model_loader: - kv 33: general.quantization_version u32 = 2 llama_model_loader: - type f32: 141 tensors llama_model_loader: - type q4_K: 169 tensors llama_model_loader: - type q6_K: 29 tensors print_info: file format = GGUF V3 (latest) print_info: file type = Q4_K - Medium print_info: file size = 4.36 GiB (4.91 BPW) load: printing all EOG tokens: load: - 151643 ('<|endoftext|>') load: - 151645 ('<|im_end|>') load: - 151662 ('<|fim_pad|>') load: - 151663 ('<|repo_name|>') load: - 151664 ('<|file_sep|>') load: special tokens cache size = 22 load: token to piece cache size = 0.9310 MB print_info: arch = qwen2 print_info: vocab_only = 1 print_info: model type = ?B print_info: model params = 7.62 B print_info: general.name = Qwen2.5 Coder 7B Instruct print_info: vocab type = BPE print_info: n_vocab = 152064 print_info: n_merges = 151387 print_info: BOS token = 151643 '<|endoftext|>' print_info: EOS token = 151645 '<|im_end|>' print_info: EOT token = 151645 '<|im_end|>' print_info: PAD token = 151643 '<|endoftext|>' print_info: LF token = 198 'Ċ' print_info: FIM PRE token = 151659 '<|fim_prefix|>' print_info: FIM SUF token = 151661 '<|fim_suffix|>' print_info: FIM MID token = 151660 '<|fim_middle|>' print_info: FIM PAD token = 151662 '<|fim_pad|>' print_info: FIM REP token = 151663 '<|repo_name|>' print_info: FIM SEP token = 151664 '<|file_sep|>' print_info: EOG token = 151643 '<|endoftext|>' print_info: EOG token = 151645 '<|im_end|>' print_info: EOG token = 151662 '<|fim_pad|>' print_info: EOG token = 151663 '<|repo_name|>' print_info: EOG token = 151664 '<|file_sep|>' print_info: max token length = 256 llama_model_load: vocab only - skipping tensors time=2025-12-13T21:10:51.989+02:00 level=INFO source=server.go:209 msg="enabling flash attention" time=2025-12-13T21:10:51.990+02:00 level=INFO source=server.go:392 msg="starting runner" cmd="/opt/homebrew/Cellar/ollama/0.13.3/bin/ollama runner --model 
/Users/stranger/.ollama/models/blobs/sha256-60e05f2100071479f596b964f89f510f057ce397ea22f2833a0cfe029bfc2463 --port 52653" time=2025-12-13T21:10:51.991+02:00 level=INFO source=sched.go:443 msg="system memory" total="16.0 GiB" free="6.1 GiB" free_swap="0 B" time=2025-12-13T21:10:51.991+02:00 level=INFO source=server.go:459 msg="loading model" "model layers"=29 requested=-1 time=2025-12-13T21:10:51.992+02:00 level=INFO source=device.go:245 msg="model weights" device=CPU size="4.1 GiB" time=2025-12-13T21:10:51.992+02:00 level=INFO source=device.go:256 msg="kv cache" device=CPU size="112.0 MiB" time=2025-12-13T21:10:51.992+02:00 level=INFO source=device.go:272 msg="total memory" size="4.2 GiB" time=2025-12-13T21:10:52.013+02:00 level=INFO source=runner.go:964 msg="starting go runner" ggml_metal_device_init: testing tensor API for f16 support ggml_metal_device_init: testing tensor API for bfloat support ggml_metal_library_init: using embedded metal library ggml_metal_library_init: error: Error Domain=MTLLibraryErrorDomain Code=3 "program_source:7050:28: warning: variable 'theta_base' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized] } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ program_source:7170:76: note: in instantiation of function template specialization 'kernel_rope_multi<float>' requested here template [[host_name("kernel_rope_multi_f32")]] kernel kernel_rope_multi_t kernel_rope_multi<float>; ^ program_source:7068:33: note: uninitialized use occurs here const float theta = theta_base * pow(args.freq_base, inv_ndims*i0); ^~~~~~~~~~ program_source:7050:24: note: remove the 'if' if its condition is always true } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ program_source:7050:28: warning: variable 'theta_base' is used uninitialized whenever '&&' condition is false [-Wsometimes-uninitialized] } else if (sector 
% 3 == 0 && sector < 3 * args.sect_0) { // t ^~~~~~~~~~~~~~~ program_source:7068:33: note: uninitialized use occurs here const float theta = theta_base * pow(args.freq_base, inv_ndims*i0); ^~~~~~~~~~ program_source:7050:28: note: remove the '&&' if its condition is always true } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t ^~~~~~~~~~~~~~~~~~ program_source:7044:29: note: initialize the variable 'theta_base' to silence this warning float theta_base; ^ = 0.0 program_source:7050:28: warning: variable 'theta_base' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized] } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ program_source:7171:76: note: in instantiation of function template specialization 'kernel_rope_multi<half>' requested here template [[host_name("kernel_rope_multi_f16")]] kernel kernel_rope_multi_t kernel_rope_multi<half>; ^ program_source:7068:33: note: uninitialized use occurs here const float theta = theta_base * pow(args.freq_base, inv_ndims*i0); ^~~~~~~~~~ program_source:7050:24: note: remove the 'if' if its condition is always true } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ program_source:7050:28: warning: variable 'theta_base' is used uninitialized whenever '&&' condition is false [-Wsometimes-uninitialized] } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t ^~~~~~~~~~~~~~~ program_source:7068:33: note: uninitialized use occurs here const float theta = theta_base * pow(args.freq_base, inv_ndims*i0); ^~~~~~~~~~ program_source:7050:28: note: remove the '&&' if its condition is always true } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t ^~~~~~~~~~~~~~~~~~ program_source:7044:29: note: initialize the variable 'theta_base' to silence this warning float theta_base; ^ = 0.0 In file included from program_source:2816: In file included from 
/System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/MetalPerformancePrimitives.h:10: In file included from /System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/MPPTensorOpsMatMul2d.h:368: /System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/__impl/MPPTensorOpsMatMul2dImpl.h:3266:5: error: static_assert failed due to requirement '__tensor_ops_detail::__is_same_v<bfloat, half>' "Input types must match cooperative tensor types" static_assert(__tensor_ops_detail::__is_same_v<_leftType, leftValueType>, "Input types must match cooperative tensor types"); ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/MPPTensorOpsMatMul2d.h:394:24: note: in instantiation of function template specialization 'mpp::tensor_ops::__mutmul2d_detail::__run<{32, 64, 32, false, true, false, 1}, metal::execution_simdgroups<4>, metal::tensor<threadgroup half, metal::extents<int, 18446744073709551615, 18446744073709551615>, metal::tensor_inline>, metal::tensor<threadgroup bfloat, metal::extents<int, 18446744073709551615, 18446744073709551615>, metal::tensor_inline>, metal::cooperative_tensor<float, metal::extents<int, 18446744073709551615, 18446744073709551615>, mpp::tensor_ops::__mutmul2d_detail::__operand_layout<{32, 64, 32, false, true, false, 1}, mpp::tensor_ops::__mutmul2d_detail::__matmul2d_cooperative_operand_index::destination, metal::execution_simdgroups<4>, bfloat, half, float, int>>>' requested here __mutmul2d_detail::__run<Descriptor, Scope, LeftOperandType, ^ program_source:12046:12: note: in instantiation of function template specialization 'mpp::tensor_ops::matmul2d<{32, 64, 32, false, true, false, 1}, metal::execution_simdgroups<4>>::run<metal::tensor<threadgroup half, metal::extents<int, 18446744073709551615, 18446744073709551615>, metal::tensor_inline>, metal::tensor<threadgroup bfloat, metal::extents<int, 18446744073709551615, 
18446744073709551615>, metal::tensor_inline>, metal::cooperative_tensor<float, metal::extents<int, 18446744073709551615, 18446744073709551615>, mpp::tensor_ops::__mutmul2d_detail::__operand_layout<{32, 64, 32, false, true, false, 1}, mpp::tensor_ops::__mutmul2d_detail::__matmul2d_cooperative_operand_index::destination, metal::execution_simdgroups<4>, bfloat, half, float, int>>, void>' requested here mm.run(sB, sA, cT); ^ In file included from program_source:2816: In file included from /System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/MetalPerformancePrimitives.h:10: In file included from /System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/MPPTensorOpsMatMul2d.h:368: /System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/__impl/MPPTensorOpsMatMul2dImpl.h:3267:5: error: static_assert failed due to requirement '__tensor_ops_detail::__is_same_v<half, bfloat>' "Input types must match cooperative tensor types" static_assert(__tensor_ops_detail::__is_same_v<_rightType, rightValueType>, "Input types must match cooperative tensor types"); ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ " UserInfo={NSLocalizedDescription=program_source:7050:28: warning: variable 'theta_base' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized] } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ program_source:7170:76: note: in instantiation of function template specialization 'kernel_rope_multi<float>' requested here template [[host_name("kernel_rope_multi_f32")]] kernel kernel_rope_multi_t kernel_rope_multi<float>; ^ program_source:7068:33: note: uninitialized use occurs here const float theta = theta_base * pow(args.freq_base, inv_ndims*i0); ^~~~~~~~~~ program_source:7050:24: note: remove the 'if' if its condition is always true } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t 
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ program_source:7050:28: warning: variable 'theta_base' is used uninitialized whenever '&&' condition is false [-Wsometimes-uninitialized] } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t ^~~~~~~~~~~~~~~ program_source:7068:33: note: uninitialized use occurs here const float theta = theta_base * pow(args.freq_base, inv_ndims*i0); ^~~~~~~~~~ program_source:7050:28: note: remove the '&&' if its condition is always true } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t ^~~~~~~~~~~~~~~~~~ program_source:7044:29: note: initialize the variable 'theta_base' to silence this warning float theta_base; ^ = 0.0 program_source:7050:28: warning: variable 'theta_base' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized] } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ program_source:7171:76: note: in instantiation of function template specialization 'kernel_rope_multi<half>' requested here template [[host_name("kernel_rope_multi_f16")]] kernel kernel_rope_multi_t kernel_rope_multi<half>; ^ program_source:7068:33: note: uninitialized use occurs here const float theta = theta_base * pow(args.freq_base, inv_ndims*i0); ^~~~~~~~~~ program_source:7050:24: note: remove the 'if' if its condition is always true } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ program_source:7050:28: warning: variable 'theta_base' is used uninitialized whenever '&&' condition is false [-Wsometimes-uninitialized] } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t ^~~~~~~~~~~~~~~ program_source:7068:33: note: uninitialized use occurs here const float theta = theta_base * pow(args.freq_base, inv_ndims*i0); ^~~~~~~~~~ program_source:7050:28: note: remove the '&&' if its condition is always true } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t ^~~~~~~~~~~~~~~~~~ 
program_source:7044:29: note: initialize the variable 'theta_base' to silence this warning float theta_base; ^ = 0.0 In file included from program_source:2816: In file included from /System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/MetalPerformancePrimitives.h:10: In file included from /System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/MPPTensorOpsMatMul2d.h:368: /System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/__impl/MPPTensorOpsMatMul2dImpl.h:3266:5: error: static_assert failed due to requirement '__tensor_ops_detail::__is_same_v<bfloat, half>' "Input types must match cooperative tensor types" static_assert(__tensor_ops_detail::__is_same_v<_leftType, leftValueType>, "Input types must match cooperative tensor types"); ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/MPPTensorOpsMatMul2d.h:394:24: note: in instantiation of function template specialization 'mpp::tensor_ops::__mutmul2d_detail::__run<{32, 64, 32, false, true, false, 1}, metal::execution_simdgroups<4>, metal::tensor<threadgroup half, metal::extents<int, 18446744073709551615, 18446744073709551615>, metal::tensor_inline>, metal::tensor<threadgroup bfloat, metal::extents<int, 18446744073709551615, 18446744073709551615>, metal::tensor_inline>, metal::cooperative_tensor<float, metal::extents<int, 18446744073709551615, 18446744073709551615>, mpp::tensor_ops::__mutmul2d_detail::__operand_layout<{32, 64, 32, false, true, false, 1}, mpp::tensor_ops::__mutmul2d_detail::__matmul2d_cooperative_operand_index::destination, metal::execution_simdgroups<4>, bfloat, half, float, int>>>' requested here __mutmul2d_detail::__run<Descriptor, Scope, LeftOperandType, ^ program_source:12046:12: note: in instantiation of function template specialization 'mpp::tensor_ops::matmul2d<{32, 64, 32, false, true, false, 1}, metal::execution_simdgroups<4>>::run<metal::tensor<threadgroup 
half, metal::extents<int, 18446744073709551615, 18446744073709551615>, metal::tensor_inline>, metal::tensor<threadgroup bfloat, metal::extents<int, 18446744073709551615, 18446744073709551615>, metal::tensor_inline>, metal::cooperative_tensor<float, metal::extents<int, 18446744073709551615, 18446744073709551615>, mpp::tensor_ops::__mutmul2d_detail::__operand_layout<{32, 64, 32, false, true, false, 1}, mpp::tensor_ops::__mutmul2d_detail::__matmul2d_cooperative_operand_index::destination, metal::execution_simdgroups<4>, bfloat, half, float, int>>, void>' requested here mm.run(sB, sA, cT); ^ In file included from program_source:2816: In file included from /System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/MetalPerformancePrimitives.h:10: In file included from /System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/MPPTensorOpsMatMul2d.h:368: /System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/__impl/MPPTensorOpsMatMul2dImpl.h:3267:5: error: static_assert failed due to requirement '__tensor_ops_detail::__is_same_v<half, bfloat>' "Input types must match cooperative tensor types" static_assert(__tensor_ops_detail::__is_same_v<_rightType, rightValueType>, "Input types must match cooperative tensor types"); ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ } ggml_metal_device_init: error: failed to create library ggml_metal_rsets_init: creating a residency set collection (keep_alive = 180 s) ggml_metal_device_init: GPU name: Apple M5 ggml_metal_device_init: GPU family: MTLGPUFamilyApple10 (1010) ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003) ggml_metal_device_init: GPU family: MTLGPUFamilyMetal4 (5002) ggml_metal_device_init: simdgroup reduction = true ggml_metal_device_init: simdgroup matrix mul. 
= true ggml_metal_device_init: has unified memory = true ggml_metal_device_init: has bfloat = true ggml_metal_device_init: has tensor = true ggml_metal_device_init: use residency sets = true ggml_metal_device_init: use shared buffers = true ggml_metal_device_init: recommendedMaxWorkingSetSize = 12713.12 MB time=2025-12-13T21:10:52.013+02:00 level=INFO source=ggml.go:104 msg=system Metal.0.EMBED_LIBRARY=1 CPU.0.NEON=1 CPU.0.ARM_FMA=1 CPU.0.FP16_VA=1 CPU.0.DOTPROD=1 CPU.0.LLAMAFILE=1 CPU.0.ACCELERATE=1 compiler=cgo(clang) time=2025-12-13T21:10:52.846+02:00 level=INFO source=runner.go:1000 msg="Server listening on 127.0.0.1:52653" time=2025-12-13T21:10:52.858+02:00 level=INFO source=runner.go:894 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType:q8_0 NumThreads:4 GPULayers:[] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}" llama_model_load_from_file_impl: using device Metal (Apple M5) (unknown id) - 12123 MiB free time=2025-12-13T21:10:52.858+02:00 level=INFO source=server.go:1301 msg="waiting for llama runner to start responding" time=2025-12-13T21:10:52.858+02:00 level=INFO source=server.go:1335 msg="waiting for server to become available" status="llm server loading model" llama_model_loader: loaded meta data with 34 key-value pairs and 339 tensors from /Users/stranger/.ollama/models/blobs/sha256-60e05f2100071479f596b964f89f510f057ce397ea22f2833a0cfe029bfc2463 (version GGUF V3 (latest)) llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output. 
llama_model_loader: - kv 0: general.architecture str = qwen2 llama_model_loader: - kv 1: general.type str = model llama_model_loader: - kv 2: general.name str = Qwen2.5 Coder 7B Instruct llama_model_loader: - kv 3: general.finetune str = Instruct llama_model_loader: - kv 4: general.basename str = Qwen2.5-Coder llama_model_loader: - kv 5: general.size_label str = 7B llama_model_loader: - kv 6: general.license str = apache-2.0 llama_model_loader: - kv 7: general.license.link str = https://huggingface.co/Qwen/Qwen2.5-C... llama_model_loader: - kv 8: general.base_model.count u32 = 1 llama_model_loader: - kv 9: general.base_model.0.name str = Qwen2.5 Coder 7B llama_model_loader: - kv 10: general.base_model.0.organization str = Qwen llama_model_loader: - kv 11: general.base_model.0.repo_url str = https://huggingface.co/Qwen/Qwen2.5-C... llama_model_loader: - kv 12: general.tags arr[str,6] = ["code", "codeqwen", "chat", "qwen", ... llama_model_loader: - kv 13: general.languages arr[str,1] = ["en"] llama_model_loader: - kv 14: qwen2.block_count u32 = 28 llama_model_loader: - kv 15: qwen2.context_length u32 = 32768 llama_model_loader: - kv 16: qwen2.embedding_length u32 = 3584 llama_model_loader: - kv 17: qwen2.feed_forward_length u32 = 18944 llama_model_loader: - kv 18: qwen2.attention.head_count u32 = 28 llama_model_loader: - kv 19: qwen2.attention.head_count_kv u32 = 4 llama_model_loader: - kv 20: qwen2.rope.freq_base f32 = 1000000.000000 llama_model_loader: - kv 21: qwen2.attention.layer_norm_rms_epsilon f32 = 0.000001 llama_model_loader: - kv 22: general.file_type u32 = 15 llama_model_loader: - kv 23: tokenizer.ggml.model str = gpt2 llama_model_loader: - kv 24: tokenizer.ggml.pre str = qwen2 llama_model_loader: - kv 25: tokenizer.ggml.tokens arr[str,152064] = ["!", "\"", "#", "$", "%", "&", "'", ... llama_model_loader: - kv 26: tokenizer.ggml.token_type arr[i32,152064] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ... 
llama_model_loader: - kv 27: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",... llama_model_loader: - kv 28: tokenizer.ggml.eos_token_id u32 = 151645 llama_model_loader: - kv 29: tokenizer.ggml.padding_token_id u32 = 151643 llama_model_loader: - kv 30: tokenizer.ggml.bos_token_id u32 = 151643 llama_model_loader: - kv 31: tokenizer.ggml.add_bos_token bool = false llama_model_loader: - kv 32: tokenizer.chat_template str = {%- if tools %}\n {{- '<|im_start|>... llama_model_loader: - kv 33: general.quantization_version u32 = 2 llama_model_loader: - type f32: 141 tensors llama_model_loader: - type q4_K: 169 tensors llama_model_loader: - type q6_K: 29 tensors print_info: file format = GGUF V3 (latest) print_info: file type = Q4_K - Medium print_info: file size = 4.36 GiB (4.91 BPW) load: printing all EOG tokens: load: - 151643 ('<|endoftext|>') load: - 151645 ('<|im_end|>') load: - 151662 ('<|fim_pad|>') load: - 151663 ('<|repo_name|>') load: - 151664 ('<|file_sep|>') load: special tokens cache size = 22 load: token to piece cache size = 0.9310 MB print_info: arch = qwen2 print_info: vocab_only = 0 print_info: n_ctx_train = 32768 print_info: n_embd = 3584 print_info: n_embd_inp = 3584 print_info: n_layer = 28 print_info: n_head = 28 print_info: n_head_kv = 4 print_info: n_rot = 128 print_info: n_swa = 0 print_info: is_swa_any = 0 print_info: n_embd_head_k = 128 print_info: n_embd_head_v = 128 print_info: n_gqa = 7 print_info: n_embd_k_gqa = 512 print_info: n_embd_v_gqa = 512 print_info: f_norm_eps = 0.0e+00 print_info: f_norm_rms_eps = 1.0e-06 print_info: f_clamp_kqv = 0.0e+00 print_info: f_max_alibi_bias = 0.0e+00 print_info: f_logit_scale = 0.0e+00 print_info: f_attn_scale = 0.0e+00 print_info: n_ff = 18944 print_info: n_expert = 0 print_info: n_expert_used = 0 print_info: n_expert_groups = 0 print_info: n_group_used = 0 print_info: causal attn = 1 print_info: pooling type = -1 print_info: rope type = 2 print_info: rope scaling = linear 
print_info: freq_base_train = 1000000.0 print_info: freq_scale_train = 1 print_info: n_ctx_orig_yarn = 32768 print_info: rope_finetuned = unknown print_info: model type = 7B print_info: model params = 7.62 B print_info: general.name = Qwen2.5 Coder 7B Instruct print_info: vocab type = BPE print_info: n_vocab = 152064 print_info: n_merges = 151387 print_info: BOS token = 151643 '<|endoftext|>' print_info: EOS token = 151645 '<|im_end|>' print_info: EOT token = 151645 '<|im_end|>' print_info: PAD token = 151643 '<|endoftext|>' print_info: LF token = 198 'Ċ' print_info: FIM PRE token = 151659 '<|fim_prefix|>' print_info: FIM SUF token = 151661 '<|fim_suffix|>' print_info: FIM MID token = 151660 '<|fim_middle|>' print_info: FIM PAD token = 151662 '<|fim_pad|>' print_info: FIM REP token = 151663 '<|repo_name|>' print_info: FIM SEP token = 151664 '<|file_sep|>' print_info: EOG token = 151643 '<|endoftext|>' print_info: EOG token = 151645 '<|im_end|>' print_info: EOG token = 151662 '<|fim_pad|>' print_info: EOG token = 151663 '<|repo_name|>' print_info: EOG token = 151664 '<|file_sep|>' print_info: max token length = 256 load_tensors: loading model tensors, this can take a while... 
(mmap = false) load_tensors: offloading 0 repeating layers to GPU load_tensors: offloaded 0/29 layers to GPU load_tensors: CPU model buffer size = 4460.45 MiB llama_context: constructing llama_context llama_context: n_seq_max = 1 llama_context: n_ctx = 4096 llama_context: n_ctx_seq = 4096 llama_context: n_batch = 512 llama_context: n_ubatch = 512 llama_context: causal_attn = 1 llama_context: flash_attn = enabled llama_context: kv_unified = false llama_context: freq_base = 1000000.0 llama_context: freq_scale = 1 llama_context: n_ctx_seq (4096) < n_ctx_train (32768) -- the full capacity of the model will not be utilized ggml_metal_init: allocating ggml_metal_init: picking default device: Apple M5 ggml_metal_init: the device does not have a precompiled Metal library - this is unexpected ggml_metal_init: will try to compile it on the fly ggml_metal_library_init: using embedded metal library ggml_metal_library_init: error: Error Domain=MTLLibraryErrorDomain Code=3 "program_source:7050:28: warning: variable 'theta_base' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized] } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ program_source:7170:76: note: in instantiation of function template specialization 'kernel_rope_multi<float>' requested here template [[host_name("kernel_rope_multi_f32")]] kernel kernel_rope_multi_t kernel_rope_multi<float>; ^ program_source:7068:33: note: uninitialized use occurs here const float theta = theta_base * pow(args.freq_base, inv_ndims*i0); ^~~~~~~~~~ program_source:7050:24: note: remove the 'if' if its condition is always true } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ program_source:7050:28: warning: variable 'theta_base' is used uninitialized whenever '&&' condition is false [-Wsometimes-uninitialized] } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t ^~~~~~~~~~~~~~~ 
program_source:7068:33: note: uninitialized use occurs here const float theta = theta_base * pow(args.freq_base, inv_ndims*i0); ^~~~~~~~~~ program_source:7050:28: note: remove the '&&' if its condition is always true } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t ^~~~~~~~~~~~~~~~~~ program_source:7044:29: note: initialize the variable 'theta_base' to silence this warning float theta_base; ^ = 0.0 program_source:7050:28: warning: variable 'theta_base' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized] } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ program_source:7171:76: note: in instantiation of function template specialization 'kernel_rope_multi<half>' requested here template [[host_name("kernel_rope_multi_f16")]] kernel kernel_rope_multi_t kernel_rope_multi<half>; ^ program_source:7068:33: note: uninitialized use occurs here const float theta = theta_base * pow(args.freq_base, inv_ndims*i0); ^~~~~~~~~~ program_source:7050:24: note: remove the 'if' if its condition is always true } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ program_source:7050:28: warning: variable 'theta_base' is used uninitialized whenever '&&' condition is false [-Wsometimes-uninitialized] } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t ^~~~~~~~~~~~~~~ program_source:7068:33: note: uninitialized use occurs here const float theta = theta_base * pow(args.freq_base, inv_ndims*i0); ^~~~~~~~~~ program_source:7050:28: note: remove the '&&' if its condition is always true } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t ^~~~~~~~~~~~~~~~~~ program_source:7044:29: note: initialize the variable 'theta_base' to silence this warning float theta_base; ^ = 0.0 In file included from program_source:2816: In file included from 
/System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/MetalPerformancePrimitives.h:10: In file included from /System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/MPPTensorOpsMatMul2d.h:368: /System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/__impl/MPPTensorOpsMatMul2dImpl.h:3266:5: error: static_assert failed due to requirement '__tensor_ops_detail::__is_same_v<bfloat, half>' "Input types must match cooperative tensor types" static_assert(__tensor_ops_detail::__is_same_v<_leftType, leftValueType>, "Input types must match cooperative tensor types"); ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/MPPTensorOpsMatMul2d.h:394:24: note: in instantiation of function template specialization 'mpp::tensor_ops::__mutmul2d_detail::__run<{32, 64, 32, false, true, false, 1}, metal::execution_simdgroups<4>, metal::tensor<threadgroup half, metal::extents<int, 18446744073709551615, 18446744073709551615>, metal::tensor_inline>, metal::tensor<threadgroup bfloat, metal::extents<int, 18446744073709551615, 18446744073709551615>, metal::tensor_inline>, metal::cooperative_tensor<float, metal::extents<int, 18446744073709551615, 18446744073709551615>, mpp::tensor_ops::__mutmul2d_detail::__operand_layout<{32, 64, 32, false, true, false, 1}, mpp::tensor_ops::__mutmul2d_detail::__matmul2d_cooperative_operand_index::destination, metal::execution_simdgroups<4>, bfloat, half, float, int>>>' requested here __mutmul2d_detail::__run<Descriptor, Scope, LeftOperandType, ^ program_source:12046:12: note: in instantiation of function template specialization 'mpp::tensor_ops::matmul2d<{32, 64, 32, false, true, false, 1}, metal::execution_simdgroups<4>>::run<metal::tensor<threadgroup half, metal::extents<int, 18446744073709551615, 18446744073709551615>, metal::tensor_inline>, metal::tensor<threadgroup bfloat, metal::extents<int, 18446744073709551615, 
18446744073709551615>, metal::tensor_inline>, metal::cooperative_tensor<float, metal::extents<int, 18446744073709551615, 18446744073709551615>, mpp::tensor_ops::__mutmul2d_detail::__operand_layout<{32, 64, 32, false, true, false, 1}, mpp::tensor_ops::__mutmul2d_detail::__matmul2d_cooperative_operand_index::destination, metal::execution_simdgroups<4>, bfloat, half, float, int>>, void>' requested here mm.run(sB, sA, cT); ^ In file included from program_source:2816: In file included from /System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/MetalPerformancePrimitives.h:10: In file included from /System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/MPPTensorOpsMatMul2d.h:368: /System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/__impl/MPPTensorOpsMatMul2dImpl.h:3267:5: error: static_assert failed due to requirement '__tensor_ops_detail::__is_same_v<half, bfloat>' "Input types must match cooperative tensor types" static_assert(__tensor_ops_detail::__is_same_v<_rightType, rightValueType>, "Input types must match cooperative tensor types"); ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ " UserInfo={NSLocalizedDescription=program_source:7050:28: warning: variable 'theta_base' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized] } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ program_source:7170:76: note: in instantiation of function template specialization 'kernel_rope_multi<float>' requested here template [[host_name("kernel_rope_multi_f32")]] kernel kernel_rope_multi_t kernel_rope_multi<float>; ^ program_source:7068:33: note: uninitialized use occurs here const float theta = theta_base * pow(args.freq_base, inv_ndims*i0); ^~~~~~~~~~ program_source:7050:24: note: remove the 'if' if its condition is always true } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t 
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ program_source:7050:28: warning: variable 'theta_base' is used uninitialized whenever '&&' condition is false [-Wsometimes-uninitialized] } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t ^~~~~~~~~~~~~~~ program_source:7068:33: note: uninitialized use occurs here const float theta = theta_base * pow(args.freq_base, inv_ndims*i0); ^~~~~~~~~~ program_source:7050:28: note: remove the '&&' if its condition is always true } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t ^~~~~~~~~~~~~~~~~~ program_source:7044:29: note: initialize the variable 'theta_base' to silence this warning float theta_base; ^ = 0.0 program_source:7050:28: warning: variable 'theta_base' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized] } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ program_source:7171:76: note: in instantiation of function template specialization 'kernel_rope_multi<half>' requested here template [[host_name("kernel_rope_multi_f16")]] kernel kernel_rope_multi_t kernel_rope_multi<half>; ^ program_source:7068:33: note: uninitialized use occurs here const float theta = theta_base * pow(args.freq_base, inv_ndims*i0); ^~~~~~~~~~ program_source:7050:24: note: remove the 'if' if its condition is always true } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ program_source:7050:28: warning: variable 'theta_base' is used uninitialized whenever '&&' condition is false [-Wsometimes-uninitialized] } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t ^~~~~~~~~~~~~~~ program_source:7068:33: note: uninitialized use occurs here const float theta = theta_base * pow(args.freq_base, inv_ndims*i0); ^~~~~~~~~~ program_source:7050:28: note: remove the '&&' if its condition is always true } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t ^~~~~~~~~~~~~~~~~~ 
program_source:7044:29: note: initialize the variable 'theta_base' to silence this warning float theta_base; ^ = 0.0 In file included from program_source:2816: In file included from /System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/MetalPerformancePrimitives.h:10: In file included from /System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/MPPTensorOpsMatMul2d.h:368: /System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/__impl/MPPTensorOpsMatMul2dImpl.h:3266:5: error: static_assert failed due to requirement '__tensor_ops_detail::__is_same_v<bfloat, half>' "Input types must match cooperative tensor types" static_assert(__tensor_ops_detail::__is_same_v<_leftType, leftValueType>, "Input types must match cooperative tensor types"); ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/MPPTensorOpsMatMul2d.h:394:24: note: in instantiation of function template specialization 'mpp::tensor_ops::__mutmul2d_detail::__run<{32, 64, 32, false, true, false, 1}, metal::execution_simdgroups<4>, metal::tensor<threadgroup half, metal::extents<int, 18446744073709551615, 18446744073709551615>, metal::tensor_inline>, metal::tensor<threadgroup bfloat, metal::extents<int, 18446744073709551615, 18446744073709551615>, metal::tensor_inline>, metal::cooperative_tensor<float, metal::extents<int, 18446744073709551615, 18446744073709551615>, mpp::tensor_ops::__mutmul2d_detail::__operand_layout<{32, 64, 32, false, true, false, 1}, mpp::tensor_ops::__mutmul2d_detail::__matmul2d_cooperative_operand_index::destination, metal::execution_simdgroups<4>, bfloat, half, float, int>>>' requested here __mutmul2d_detail::__run<Descriptor, Scope, LeftOperandType, ^ program_source:12046:12: note: in instantiation of function template specialization 'mpp::tensor_ops::matmul2d<{32, 64, 32, false, true, false, 1}, metal::execution_simdgroups<4>>::run<metal::tensor<threadgroup 
half, metal::extents<int, 18446744073709551615, 18446744073709551615>, metal::tensor_inline>, metal::tensor<threadgroup bfloat, metal::extents<int, 18446744073709551615, 18446744073709551615>, metal::tensor_inline>, metal::cooperative_tensor<float, metal::extents<int, 18446744073709551615, 18446744073709551615>, mpp::tensor_ops::__mutmul2d_detail::__operand_layout<{32, 64, 32, false, true, false, 1}, mpp::tensor_ops::__mutmul2d_detail::__matmul2d_cooperative_operand_index::destination, metal::execution_simdgroups<4>, bfloat, half, float, int>>, void>' requested here mm.run(sB, sA, cT); ^ In file included from program_source:2816: In file included from /System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/MetalPerformancePrimitives.h:10: In file included from /System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/MPPTensorOpsMatMul2d.h:368: /System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/__impl/MPPTensorOpsMatMul2dImpl.h:3267:5: error: static_assert failed due to requirement '__tensor_ops_detail::__is_same_v<half, bfloat>' "Input types must match cooperative tensor types" static_assert(__tensor_ops_detail::__is_same_v<_rightType, rightValueType>, "Input types must match cooperative tensor types"); ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ } ggml_metal_init: error: failed to initialize the Metal library ggml_backend_metal_device_init: error: failed to allocate context llama_init_from_model: failed to initialize the context: failed to initialize Metal backend panic: unable to create llama context goroutine 4 [running]: github.com/ollama/ollama/runner/llamarunner.(*Server).loadModel(0x14000386320, {{0x0, 0x0, 0x0}, 0x0, 0x0, 0x0, {0x0, 0x0, 0x0}, ...}, ...) 
	github.com/ollama/ollama/runner/llamarunner/runner.go:848 +0x268
created by github.com/ollama/ollama/runner/llamarunner.(*Server).load in goroutine 24
	github.com/ollama/ollama/runner/llamarunner/runner.go:933 +0x680
time=2025-12-13T21:10:54.666+02:00 level=INFO source=server.go:1335 msg="waiting for server to become available" status="llm server not responding"
time=2025-12-13T21:10:54.668+02:00 level=ERROR source=server.go:265 msg="llama runner terminated" error="exit status 2"
time=2025-12-13T21:10:54.917+02:00 level=INFO source=sched.go:470 msg="Load failed" model=/Users/stranger/.ollama/models/blobs/sha256-60e05f2100071479f596b964f89f510f057ce397ea22f2833a0cfe029bfc2463 error="llama runner process has terminated: error:failed to allocate context"
[GIN] 2025/12/13 - 21:10:54 | 500 | 3.867274417s | 127.0.0.1 | POST "/api/generate"
```

### OS

macOS

### GPU

Apple

### CPU

Apple

### Ollama version

0.13.3 - ...
GiteaMirror added the bug label 2026-05-04 23:30:33 -05:00
Author
Owner

@WaitF0r1t commented on GitHub (Dec 21, 2025):

I am facing the same issue. Do you need more information to solve this bug?

<!-- gh-comment-id:3679371492 -->
@JCLiu-hfut commented on GitHub (Dec 22, 2025):

I also encountered this problem; they say it's an M5 chip compatibility issue.

<!-- gh-comment-id:3681136574 -->
@Bazze commented on GitHub (Dec 31, 2025):

I'm seeing the same problem... Don't know why I always should be so fast to update 🙈

<!-- gh-comment-id:3701793642 -->
@NickFirmani commented on GitHub (Jan 6, 2026):

As of last week, the fix has been merged in the llama.cpp repo: https://github.com/ggml-org/llama.cpp/pull/18456

<!-- gh-comment-id:3715995227 -->
@fsiler commented on GitHub (Jan 19, 2026):

@NickFirmani amazing, thanks for digging that up. Rookie question: how do I roll a build with that update so I can actually run on my M5? Thanks!

<!-- gh-comment-id:3770408703 -->
@wolfgang555 commented on GitHub (Jan 25, 2026):

same issue here

<!-- gh-comment-id:3796166020 -->
@thomas-maurice commented on GitHub (Feb 1, 2026):

I am also very interested in finding a way of running ollama on an M5 chip!

<!-- gh-comment-id:3830970756 -->
@jhoopmann commented on GitHub (Feb 6, 2026):

I am also very interested in finding a way of running ollama on an M5 chip!

Just use the latest release, 0.15.5; I tested it today on my M5 with qwen3, gpt-oss-safeguard, and mistral: https://github.com/ollama/ollama/releases/tag/v0.15.5

<!-- gh-comment-id:3862149322 -->
@StrangerSVN commented on GitHub (Feb 6, 2026):

I am also very interested in finding a way of running ollama on an M5 chip!

Just use the last release 0.15.5, tested it today on my M5 with qwen3,gpt-oss-safeguard and mistral: https://github.com/ollama/ollama/releases/tag/v0.15.5

What version of macOS do you have? The problem appeared after updating to 26.2.

<!-- gh-comment-id:3862966921 -->
@jhoopmann commented on GitHub (Feb 7, 2026):

I am also very interested in finding a way of running ollama on an M5 chip!

Just use the last release 0.15.5, tested it today on my M5 with qwen3,gpt-oss-safeguard and mistral: https://github.com/ollama/ollama/releases/tag/v0.15.5

What version of macOS do you have? Problem after updating to 26.2

Bought a brand new M5 32GB 1TB yesterday morning. I installed the upgrade from 26.1 to 26.2, Xcode Tools, Brew, IntelliJ IDEA, JDK 25, and Ollama 0.15.4 via the Brew formula, and the error occurred. I then installed Ollama 0.15.5 with the GUI from the repo's DMG package without uninstalling the 0.15.4 Brew formula, shut down the Brew service, and started Ollama 0.15.5 from Applications. So I still use the old 0.15.4 CLI client, but the 0.15.5 API and backend. Works for me and Spring AI; I'll wait for the updated Brew formula instead of playing around with the PATH & service setup.
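A mixed setup like this (Brew CLI plus app-bundled backend) is easy to misread later. The following sketch can help confirm which pieces are actually in use; the default port 11434 and the `/api/version` endpoint are standard Ollama, everything else is plain shell:

```shell
# Show which ollama CLI binary is first on PATH (Brew formula vs app-bundled).
if command -v ollama >/dev/null 2>&1; then
  echo "CLI on PATH: $(command -v ollama)"
  ollama --version || true
else
  echo "no ollama CLI on PATH"
fi

# Ask the server that is actually listening for its version
# (the app's backend, if that is what was launched); default port assumed.
curl -s http://127.0.0.1:11434/api/version || echo "no server answering on 11434"
```

If the two versions disagree, the CLI is the Brew client talking to the app's newer server, which matches the setup described above.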

<!-- gh-comment-id:3864422489 -->
@thomas-maurice commented on GitHub (Feb 8, 2026):

I have an M5 with macOS 26.2.

```
$ ollama --version
MLX: Failed to load symbol: mlx_metal_device_info
Warning: could not connect to a running Ollama instance
Warning: client version is 0.15.5
```

Ollama is installed from Brew

I cannot run `ollama run qwen3:4b`. This is what the `ollama serve` process gives me in the logs:

MLX: Failed to load symbol: mlx_metal_device_info
time=2026-02-08T12:58:59.220+01:00 level=INFO source=routes.go:1636 msg="server config" env="map[HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:0 OLLAMA_DEBUG:INFO OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/Users/thomas/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false http_proxy: https_proxy: no_proxy:]"
time=2026-02-08T12:58:59.222+01:00 level=INFO source=images.go:473 msg="total blobs: 21"
time=2026-02-08T12:58:59.222+01:00 level=INFO source=images.go:480 msg="total unused blobs removed: 0"
time=2026-02-08T12:58:59.223+01:00 level=INFO source=routes.go:1689 msg="Listening on 127.0.0.1:11434 (version 0.15.5)"
time=2026-02-08T12:58:59.223+01:00 level=INFO source=runner.go:67 msg="discovering available GPUs..."
time=2026-02-08T12:58:59.223+01:00 level=INFO source=server.go:430 msg="starting runner" cmd="/opt/homebrew/Cellar/ollama/0.15.5/bin/ollama runner --ollama-engine --port 62152"
time=2026-02-08T12:59:00.984+01:00 level=INFO source=types.go:60 msg="inference compute" id=cpu library=cpu compute="" name=cpu description=cpu libdirs=ollama driver="" pci_id="" type="" total="32.0 GiB" available="21.9 GiB"
time=2026-02-08T12:59:00.984+01:00 level=INFO source=routes.go:1739 msg="vram-based default context" total_vram="0 B" default_num_ctx=4096
[GIN] 2026/02/08 - 12:59:05 | 200 |     154.458µs |       127.0.0.1 | HEAD     "/"
[GIN] 2026/02/08 - 12:59:05 | 200 |   53.572208ms |       127.0.0.1 | POST     "/api/show"
[GIN] 2026/02/08 - 12:59:05 | 200 |   39.898167ms |       127.0.0.1 | POST     "/api/show"
time=2026-02-08T12:59:05.656+01:00 level=INFO source=server.go:246 msg="enabling flash attention"
time=2026-02-08T12:59:05.657+01:00 level=INFO source=server.go:430 msg="starting runner" cmd="/opt/homebrew/Cellar/ollama/0.15.5/bin/ollama runner --ollama-engine --model /Users/thomas/.ollama/models/blobs/sha256-3e4cb14174460404e7a233e531675303b2fbf7749c02f91864fe311ab6344e4f --port 62156"
time=2026-02-08T12:59:05.657+01:00 level=INFO source=sched.go:463 msg="system memory" total="32.0 GiB" free="22.0 GiB" free_swap="0 B"
time=2026-02-08T12:59:05.657+01:00 level=INFO source=server.go:756 msg="loading model" "model layers"=37 requested=-1
MLX: Failed to load symbol: mlx_metal_device_info
time=2026-02-08T12:59:05.667+01:00 level=INFO source=runner.go:1410 msg="starting ollama engine"
time=2026-02-08T12:59:05.667+01:00 level=INFO source=runner.go:1445 msg="Server listening on 127.0.0.1:62156"
time=2026-02-08T12:59:05.668+01:00 level=INFO source=runner.go:1283 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:4096 KvCacheType: NumThreads:4 GPULayers:[] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-02-08T12:59:05.678+01:00 level=INFO source=ggml.go:136 msg="" architecture=qwen3 file_type=Q4_K_M name="Qwen3 4B Thinking 2507" description="" num_tensors=398 num_key_values=33
ggml_metal_device_init: testing tensor API for f16 support
ggml_metal_device_init: testing tensor API for bfloat support
ggml_metal_library_init: using embedded metal library
ggml_metal_library_init: error: Error Domain=MTLLibraryErrorDomain Code=3 "program_source:7131:28: warning: variable 'theta_base' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized]
                } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t
                           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
program_source:7251:76: note: in instantiation of function template specialization 'kernel_rope_multi<float>' requested here
template [[host_name("kernel_rope_multi_f32")]] kernel kernel_rope_multi_t kernel_rope_multi<float>;
                                                                           ^
program_source:7149:33: note: uninitialized use occurs here
            const float theta = theta_base * pow(args.freq_base, inv_ndims*i0);
                                ^~~~~~~~~~
program_source:7131:24: note: remove the 'if' if its condition is always true
                } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t
                       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
program_source:7131:28: warning: variable 'theta_base' is used uninitialized whenever '&&' condition is false [-Wsometimes-uninitialized]
                } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t
                           ^~~~~~~~~~~~~~~
program_source:7149:33: note: uninitialized use occurs here
            const float theta = theta_base * pow(args.freq_base, inv_ndims*i0);
                                ^~~~~~~~~~
program_source:7131:28: note: remove the '&&' if its condition is always true
                } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t
                           ^~~~~~~~~~~~~~~~~~
program_source:7125:29: note: initialize the variable 'theta_base' to silence this warning
            float theta_base;
                            ^
                             = 0.0
program_source:7131:28: warning: variable 'theta_base' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized]
                } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t
                           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
program_source:7252:76: note: in instantiation of function template specialization 'kernel_rope_multi<half>' requested here
template [[host_name("kernel_rope_multi_f16")]] kernel kernel_rope_multi_t kernel_rope_multi<half>;
                                                                           ^
program_source:7149:33: note: uninitialized use occurs here
            const float theta = theta_base * pow(args.freq_base, inv_ndims*i0);
                                ^~~~~~~~~~
program_source:7131:24: note: remove the 'if' if its condition is always true
                } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t
                       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
program_source:7131:28: warning: variable 'theta_base' is used uninitialized whenever '&&' condition is false [-Wsometimes-uninitialized]
                } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t
                           ^~~~~~~~~~~~~~~
program_source:7149:33: note: uninitialized use occurs here
            const float theta = theta_base * pow(args.freq_base, inv_ndims*i0);
                                ^~~~~~~~~~
program_source:7131:28: note: remove the '&&' if its condition is always true
                } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t
                           ^~~~~~~~~~~~~~~~~~
program_source:7125:29: note: initialize the variable 'theta_base' to silence this warning
            float theta_base;
                            ^
                             = 0.0
In file included from program_source:2837:
In file included from /System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/MetalPerformancePrimitives.h:10:
In file included from /System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/MPPTensorOpsMatMul2d.h:368:
/System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/__impl/MPPTensorOpsMatMul2dImpl.h:3266:5: error: static_assert failed due to requirement '__tensor_ops_detail::__is_same_v<bfloat, half>' "Input types must match cooperative tensor types"
    static_assert(__tensor_ops_detail::__is_same_v<_leftType, leftValueType>, "Input types must match cooperative tensor types");
    ^             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/MPPTensorOpsMatMul2d.h:394:24: note: in instantiation of function template specialization 'mpp::tensor_ops::__mutmul2d_detail::__run<{32, 64, 32, false, true, false, 1}, metal::execution_simdgroups<4>, metal::tensor<threadgroup half, metal::extents<int, 18446744073709551615, 18446744073709551615>, metal::tensor_inline>, metal::tensor<threadgroup bfloat, metal::extents<int, 18446744073709551615, 18446744073709551615>, metal::tensor_inline>, metal::cooperative_tensor<float, metal::extents<int, 18446744073709551615, 18446744073709551615>, mpp::tensor_ops::__mutmul2d_detail::__operand_layout<{32, 64, 32, false, true, false, 1}, mpp::tensor_ops::__mutmul2d_detail::__matmul2d_cooperative_operand_index::destination, metal::execution_simdgroups<4>, bfloat, half, float, int>>>' requested here
    __mutmul2d_detail::__run<Descriptor, Scope, LeftOperandType,
                       ^
program_source:12128:12: note: in instantiation of function template specialization 'mpp::tensor_ops::matmul2d<{32, 64, 32, false, true, false, 1}, metal::execution_simdgroups<4>>::run<metal::tensor<threadgroup half, metal::extents<int, 18446744073709551615, 18446744073709551615>, metal::tensor_inline>, metal::tensor<threadgroup bfloat, metal::extents<int, 18446744073709551615, 18446744073709551615>, metal::tensor_inline>, metal::cooperative_tensor<float, metal::extents<int, 18446744073709551615, 18446744073709551615>, mpp::tensor_ops::__mutmul2d_detail::__operand_layout<{32, 64, 32, false, true, false, 1}, mpp::tensor_ops::__mutmul2d_detail::__matmul2d_cooperative_operand_index::destination, metal::execution_simdgroups<4>, bfloat, half, float, int>>, void>' requested here
        mm.run(sB, sA, cT);
           ^
In file included from program_source:2837:
In file included from /System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/MetalPerformancePrimitives.h:10:
In file included from /System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/MPPTensorOpsMatMul2d.h:368:
/System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/__impl/MPPTensorOpsMatMul2dImpl.h:3267:5: error: static_assert failed due to requirement '__tensor_ops_detail::__is_same_v<half, bfloat>' "Input types must match cooperative tensor types"
    static_assert(__tensor_ops_detail::__is_same_v<_rightType, rightValueType>, "Input types must match cooperative tensor types");
    ^             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
" UserInfo={NSLocalizedDescription=program_source:7131:28: warning: variable 'theta_base' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized]
                } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t
                           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
program_source:7251:76: note: in instantiation of function template specialization 'kernel_rope_multi<float>' requested here
template [[host_name("kernel_rope_multi_f32")]] kernel kernel_rope_multi_t kernel_rope_multi<float>;
                                                                           ^
program_source:7149:33: note: uninitialized use occurs here
            const float theta = theta_base * pow(args.freq_base, inv_ndims*i0);
                                ^~~~~~~~~~
program_source:7131:24: note: remove the 'if' if its condition is always true
                } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t
                       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
program_source:7131:28: warning: variable 'theta_base' is used uninitialized whenever '&&' condition is false [-Wsometimes-uninitialized]
                } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t
                           ^~~~~~~~~~~~~~~
program_source:7149:33: note: uninitialized use occurs here
            const float theta = theta_base * pow(args.freq_base, inv_ndims*i0);
                                ^~~~~~~~~~
program_source:7131:28: note: remove the '&&' if its condition is always true
                } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t
                           ^~~~~~~~~~~~~~~~~~
program_source:7125:29: note: initialize the variable 'theta_base' to silence this warning
            float theta_base;
                            ^
                             = 0.0
program_source:7131:28: warning: variable 'theta_base' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized]
                } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t
                           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
program_source:7252:76: note: in instantiation of function template specialization 'kernel_rope_multi<half>' requested here
template [[host_name("kernel_rope_multi_f16")]] kernel kernel_rope_multi_t kernel_rope_multi<half>;
                                                                           ^
program_source:7149:33: note: uninitialized use occurs here
            const float theta = theta_base * pow(args.freq_base, inv_ndims*i0);
                                ^~~~~~~~~~
program_source:7131:24: note: remove the 'if' if its condition is always true
                } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t
                       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
program_source:7131:28: warning: variable 'theta_base' is used uninitialized whenever '&&' condition is false [-Wsometimes-uninitialized]
                } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t
                           ^~~~~~~~~~~~~~~
program_source:7149:33: note: uninitialized use occurs here
            const float theta = theta_base * pow(args.freq_base, inv_ndims*i0);
                                ^~~~~~~~~~
program_source:7131:28: note: remove the '&&' if its condition is always true
                } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t
                           ^~~~~~~~~~~~~~~~~~
program_source:7125:29: note: initialize the variable 'theta_base' to silence this warning
            float theta_base;
                            ^
                             = 0.0
In file included from program_source:2837:
In file included from /System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/MetalPerformancePrimitives.h:10:
In file included from /System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/MPPTensorOpsMatMul2d.h:368:
/System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/__impl/MPPTensorOpsMatMul2dImpl.h:3266:5: error: static_assert failed due to requirement '__tensor_ops_detail::__is_same_v<bfloat, half>' "Input types must match cooperative tensor types"
    static_assert(__tensor_ops_detail::__is_same_v<_leftType, leftValueType>, "Input types must match cooperative tensor types");
    ^             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/MPPTensorOpsMatMul2d.h:394:24: note: in instantiation of function template specialization 'mpp::tensor_ops::__mutmul2d_detail::__run<{32, 64, 32, false, true, false, 1}, metal::execution_simdgroups<4>, metal::tensor<threadgroup half, metal::extents<int, 18446744073709551615, 18446744073709551615>, metal::tensor_inline>, metal::tensor<threadgroup bfloat, metal::extents<int, 18446744073709551615, 18446744073709551615>, metal::tensor_inline>, metal::cooperative_tensor<float, metal::extents<int, 18446744073709551615, 18446744073709551615>, mpp::tensor_ops::__mutmul2d_detail::__operand_layout<{32, 64, 32, false, true, false, 1}, mpp::tensor_ops::__mutmul2d_detail::__matmul2d_cooperative_operand_index::destination, metal::execution_simdgroups<4>, bfloat, half, float, int>>>' requested here
    __mutmul2d_detail::__run<Descriptor, Scope, LeftOperandType,
                       ^
program_source:12128:12: note: in instantiation of function template specialization 'mpp::tensor_ops::matmul2d<{32, 64, 32, false, true, false, 1}, metal::execution_simdgroups<4>>::run<metal::tensor<threadgroup half, metal::extents<int, 18446744073709551615, 18446744073709551615>, metal::tensor_inline>, metal::tensor<threadgroup bfloat, metal::extents<int, 18446744073709551615, 18446744073709551615>, metal::tensor_inline>, metal::cooperative_tensor<float, metal::extents<int, 18446744073709551615, 18446744073709551615>, mpp::tensor_ops::__mutmul2d_detail::__operand_layout<{32, 64, 32, false, true, false, 1}, mpp::tensor_ops::__mutmul2d_detail::__matmul2d_cooperative_operand_index::destination, metal::execution_simdgroups<4>, bfloat, half, float, int>>, void>' requested here
        mm.run(sB, sA, cT);
           ^
In file included from program_source:2837:
In file included from /System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/MetalPerformancePrimitives.h:10:
In file included from /System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/MPPTensorOpsMatMul2d.h:368:
/System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/__impl/MPPTensorOpsMatMul2dImpl.h:3267:5: error: static_assert failed due to requirement '__tensor_ops_detail::__is_same_v<half, bfloat>' "Input types must match cooperative tensor types"
    static_assert(__tensor_ops_detail::__is_same_v<_rightType, rightValueType>, "Input types must match cooperative tensor types");
    ^             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
}
ggml_metal_device_init: error: failed to create library
ggml_metal_rsets_init: creating a residency set collection (keep_alive = 180 s)
ggml_metal_device_init: GPU name:   Apple M5
ggml_metal_device_init: GPU family: MTLGPUFamilyApple10  (1010)
ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_device_init: GPU family: MTLGPUFamilyMetal4  (5002)
ggml_metal_device_init: simdgroup reduction   = true
ggml_metal_device_init: simdgroup matrix mul. = true
ggml_metal_device_init: has unified memory    = true
ggml_metal_device_init: has bfloat            = true
ggml_metal_device_init: has tensor            = true
ggml_metal_device_init: use residency sets    = true
ggml_metal_device_init: use shared buffers    = true
ggml_metal_device_init: recommendedMaxWorkingSetSize  = 26800.60 MB
time=2026-02-08T12:59:05.678+01:00 level=INFO source=ggml.go:104 msg=system Metal.0.EMBED_LIBRARY=1 CPU.0.NEON=1 CPU.0.ARM_FMA=1 CPU.0.FP16_VA=1 CPU.0.DOTPROD=1 CPU.0.LLAMAFILE=1 CPU.0.ACCELERATE=1 compiler=cgo(clang)
ggml_metal_init: allocating
ggml_metal_init: picking default device: Apple M5
ggml_metal_init: the device does not have a precompiled Metal library - this is unexpected
ggml_metal_init: will try to compile it on the fly
ggml_metal_library_init: using embedded metal library
ggml_metal_library_init: error: Error Domain=MTLLibraryErrorDomain Code=3 [... identical compiler diagnostics as in the first attempt omitted ...]
ggml_metal_init: error: failed to initialize the Metal library
ggml_backend_metal_device_init: error: failed to allocate context
ggml-backend.cpp:258: GGML_ASSERT(backend) failed
WARNING: Using native backtrace. Set GGML_BACKTRACE_LLDB for more info.
WARNING: GGML_BACKTRACE_LLDB may cause native MacOS Terminal.app to crash.
See: https://github.com/ggml-org/llama.cpp/pull/17869
0   ollama                              0x000000010370952c ggml_print_backtrace + 276
1   ollama                              0x0000000103709718 ggml_abort + 156
2   ollama                              0x00000001037207cc ggml_backend_get_default_buffer_type + 76
3   ollama                              0x00000001036cefc8 _cgo_78bf35ffe8be_Cfunc_ggml_backend_get_default_buffer_type + 36
4   ollama                              0x0000000102ae1b4c ollama + 531276
SIGABRT: abort
PC=0x1900135b0 m=7 sigcode=0
signal arrived during cgo execution

goroutine 12 gp=0x14000103340 m=7 mp=0x14000100808 [syscall]:
runtime.cgocall(0x1036cefa4, 0x1400004ec48)
	runtime/cgocall.go:167 +0x44 fp=0x1400004ec10 sp=0x1400004ebd0 pc=0x102ad66c4
github.com/ollama/ollama/ml/backend/ggml._Cfunc_ggml_backend_get_default_buffer_type(0x0)
	_cgo_gotypes.go:872 +0x30 fp=0x1400004ec40 sp=0x1400004ec10 pc=0x102edcc60
github.com/ollama/ollama/ml/backend/ggml.New.func17(...)
	github.com/ollama/ollama/ml/backend/ggml/ggml.go:361
github.com/ollama/ollama/ml/backend/ggml.New({0x16d39ea2d, 0x6a}, {0x0, 0x4, {0x10487faa0, 0x0, 0x0}, 0x1})
	github.com/ollama/ollama/ml/backend/ggml/ggml.go:361 +0x1298 fp=0x1400004f540 sp=0x1400004ec40 pc=0x102ee2c38
github.com/ollama/ollama/ml.NewBackend({0x16d39ea2d, 0x6a}, {0x0, 0x4, {0x10487faa0, 0x0, 0x0}, 0x1})
	github.com/ollama/ollama/ml/backend.go:88 +0x78 fp=0x1400004f590 sp=0x1400004f540 pc=0x102e6ab78
github.com/ollama/ollama/model.New({0x16d39ea2d?, 0x0?}, {0x0, 0x4, {0x10487faa0, 0x0, 0x0}, 0x1})
	github.com/ollama/ollama/model/model.go:113 +0x3c fp=0x1400004f650 sp=0x1400004f590 pc=0x102ef47bc
github.com/ollama/ollama/runner/ollamarunner.(*Server).allocModel(0x140006601e0, {0x16d39ea2d?, 0x0?}, {0x0, 0x4, {0x10487faa0, 0x0, 0x0}, 0x1}, {0x0?, ...}, ...)
	github.com/ollama/ollama/runner/ollamarunner/runner.go:1201 +0x98 fp=0x1400004f700 sp=0x1400004f650 pc=0x102faaba8
github.com/ollama/ollama/runner/ollamarunner.(*Server).load(0x140006601e0, {0x103e9ac20, 0x140006303c0}, 0x1400045e000)
	github.com/ollama/ollama/runner/ollamarunner/runner.go:1310 +0x428 fp=0x1400004fa90 sp=0x1400004f700 pc=0x102fab668
github.com/ollama/ollama/runner/ollamarunner.(*Server).load-fm({0x103e9ac20?, 0x140006303c0?}, 0x140004d7b18?)
	<autogenerated>:1 +0x40 fp=0x1400004fac0 sp=0x1400004fa90 pc=0x102fad3f0
net/http.HandlerFunc.ServeHTTP(0x140005aad80?, {0x103e9ac20?, 0x140006303c0?}, 0x140004d7b00?)
	net/http/server.go:2322 +0x38 fp=0x1400004faf0 sp=0x1400004fac0 pc=0x102d7dfb8
net/http.(*ServeMux).ServeHTTP(0x10?, {0x103e9ac20, 0x140006303c0}, 0x1400045e000)
	net/http/server.go:2861 +0x190 fp=0x1400004fb40 sp=0x1400004faf0 pc=0x102d7fa50
net/http.serverHandler.ServeHTTP({0x103e974f0?}, {0x103e9ac20?, 0x140006303c0?}, 0x1?)
	net/http/server.go:3340 +0xb0 fp=0x1400004fb70 sp=0x1400004fb40 pc=0x102d9a240
net/http.(*conn).serve(0x1400028c6c0, {0x103e9d1c8, 0x14000690ff0})
	net/http/server.go:2109 +0x528 fp=0x1400004ffa0 sp=0x1400004fb70 pc=0x102d7c3a8
net/http.(*Server).Serve.gowrap3()
	net/http/server.go:3493 +0x2c fp=0x1400004ffd0 sp=0x1400004ffa0 pc=0x102d816dc
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x1400004ffd0 sp=0x1400004ffd0 pc=0x102ae1d44
created by net/http.(*Server).Serve in goroutine 1
	net/http/server.go:3493 +0x384

goroutine 1 gp=0x140000021c0 m=nil [IO wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x14000c3f720 sp=0x14000c3f700 pc=0x102ad9a90
runtime.netpollblock(0x1400012d7b8?, 0x2b5bccc?, 0x1?)
	runtime/netpoll.go:575 +0x150 fp=0x14000c3f760 sp=0x14000c3f720 pc=0x102a9f660
internal/poll.runtime_pollWait(0x13021fe00, 0x72)
	runtime/netpoll.go:351 +0xa0 fp=0x14000c3f790 sp=0x14000c3f760 pc=0x102ad8cc0
internal/poll.(*pollDesc).wait(0x140001dd300?, 0x102b5dd5c?, 0x0)
	internal/poll/fd_poll_runtime.go:84 +0x28 fp=0x14000c3f7c0 sp=0x14000c3f790 pc=0x102b577c8
internal/poll.(*pollDesc).waitRead(...)
	internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Accept(0x140001dd300)
	internal/poll/fd_unix.go:613 +0x21c fp=0x14000c3f870 sp=0x14000c3f7c0 pc=0x102b5bdac
net.(*netFD).accept(0x140001dd300)
	net/fd_unix.go:161 +0x28 fp=0x14000c3f930 sp=0x14000c3f870 pc=0x102bbdbc8
net.(*TCPListener).accept(0x14000624b40)
	net/tcpsock_posix.go:159 +0x24 fp=0x14000c3f980 sp=0x14000c3f930 pc=0x102bd1754
net.(*TCPListener).Accept(0x14000624b40)
	net/tcpsock.go:380 +0x2c fp=0x14000c3f9c0 sp=0x14000c3f980 pc=0x102bd07fc
net/http.(*onceCloseListener).Accept(0x1400028c6c0?)
	<autogenerated>:1 +0x2c fp=0x14000c3f9e0 sp=0x14000c3f9c0 pc=0x102da61ec
net/http.(*Server).Serve(0x14000125300, {0x103e9aa40, 0x14000624b40})
	net/http/server.go:3463 +0x24c fp=0x14000c3fb10 sp=0x14000c3f9e0 pc=0x102d8137c
github.com/ollama/ollama/runner/ollamarunner.Execute({0x140000340a0, 0x4, 0x4})
	github.com/ollama/ollama/runner/ollamarunner/runner.go:1446 +0x7a4 fp=0x14000c3fce0 sp=0x14000c3fb10 pc=0x102face94
github.com/ollama/ollama/runner.Execute({0x14000034080?, 0x0?, 0x0?})
	github.com/ollama/ollama/runner/runner.go:28 +0x184 fp=0x14000c3fd10 sp=0x14000c3fce0 pc=0x102ffe154
github.com/ollama/ollama/cmd.NewCLI.func3(0x14000125100?, {0x10396e566?, 0x4?, 0x10396e56a?})
	github.com/ollama/ollama/cmd/cmd.go:1979 +0x50 fp=0x14000c3fd40 sp=0x14000c3fd10 pc=0x103682030
github.com/spf13/cobra.(*Command).execute(0x140004b3508, {0x14000595ea0, 0x5, 0x5})
	github.com/spf13/cobra@v1.7.0/command.go:940 +0x5e0 fp=0x14000c3fe60 sp=0x14000c3fd40 pc=0x102c28cf0
github.com/spf13/cobra.(*Command).ExecuteC(0x140004a4f08)
	github.com/spf13/cobra@v1.7.0/command.go:1068 +0x2ec fp=0x14000c3ff20 sp=0x14000c3fe60 pc=0x102c293cc
github.com/spf13/cobra.(*Command).Execute(...)
	github.com/spf13/cobra@v1.7.0/command.go:992
github.com/spf13/cobra.(*Command).ExecuteContext(...)
	github.com/spf13/cobra@v1.7.0/command.go:985
main.main()
	github.com/ollama/ollama/main.go:12 +0x54 fp=0x14000c3ff40 sp=0x14000c3ff20 pc=0x103682b54
runtime.main()
	runtime/proc.go:285 +0x278 fp=0x14000c3ffd0 sp=0x14000c3ff40 pc=0x102aa6118
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x14000c3ffd0 sp=0x14000c3ffd0 pc=0x102ae1d44

goroutine 2 gp=0x14000002c40 m=nil [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x14000072f90 sp=0x14000072f70 pc=0x102ad9a90
runtime.goparkunlock(...)
	runtime/proc.go:466
runtime.forcegchelper()
	runtime/proc.go:373 +0xb4 fp=0x14000072fd0 sp=0x14000072f90 pc=0x102aa6464
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x14000072fd0 sp=0x14000072fd0 pc=0x102ae1d44
created by runtime.init.7 in goroutine 1
	runtime/proc.go:361 +0x24

goroutine 3 gp=0x14000003180 m=nil [GC sweep wait]:
runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x14000073760 sp=0x14000073740 pc=0x102ad9a90
runtime.goparkunlock(...)
	runtime/proc.go:466
runtime.bgsweep(0x1400007e000)
	runtime/mgcsweep.go:323 +0x104 fp=0x140000737b0 sp=0x14000073760 pc=0x102a90f24
runtime.gcenable.gowrap1()
	runtime/mgc.go:212 +0x28 fp=0x140000737d0 sp=0x140000737b0 pc=0x102a84b78
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x140000737d0 sp=0x140000737d0 pc=0x102ae1d44
created by runtime.gcenable in goroutine 1
	runtime/mgc.go:212 +0x6c

goroutine 4 gp=0x14000003340 m=nil [GC scavenge wait]:
runtime.gopark(0x10000?, 0x103b56028?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x14000073f60 sp=0x14000073f40 pc=0x102ad9a90
runtime.goparkunlock(...)
	runtime/proc.go:466
runtime.(*scavengerState).park(0x104838d80)
	runtime/mgcscavenge.go:425 +0x5c fp=0x14000073f90 sp=0x14000073f60 pc=0x102a8ea3c
runtime.bgscavenge(0x1400007e000)
	runtime/mgcscavenge.go:658 +0xac fp=0x14000073fb0 sp=0x14000073f90 pc=0x102a8efdc
runtime.gcenable.gowrap2()
	runtime/mgc.go:213 +0x28 fp=0x14000073fd0 sp=0x14000073fb0 pc=0x102a84b18
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x14000073fd0 sp=0x14000073fd0 pc=0x102ae1d44
created by runtime.gcenable in goroutine 1
	runtime/mgc.go:213 +0xac

goroutine 5 gp=0x14000003c00 m=nil [finalizer wait]:
runtime.gopark(0x103d33f40?, 0x1e?, 0xc8?, 0x25?, 0x1000000000000?)
	runtime/proc.go:460 +0xc0 fp=0x14000072580 sp=0x14000072560 pc=0x102ad9a90
runtime.runFinalizers()
	runtime/mfinal.go:210 +0x104 fp=0x140000727d0 sp=0x14000072580 pc=0x102a83b64
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x140000727d0 sp=0x140000727d0 pc=0x102ae1d44
created by runtime.createfing in goroutine 1
	runtime/mfinal.go:172 +0x78

goroutine 6 gp=0x140001ec700 m=nil [cleanup wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x14000074740 sp=0x14000074720 pc=0x102ad9a90
runtime.goparkunlock(...)
	runtime/proc.go:466
runtime.(*cleanupQueue).dequeue(0x104839c60)
	runtime/mcleanup.go:439 +0x110 fp=0x14000074780 sp=0x14000074740 pc=0x102a81050
runtime.runCleanups()
	runtime/mcleanup.go:635 +0x40 fp=0x140000747d0 sp=0x14000074780 pc=0x102a81860
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x140000747d0 sp=0x140000747d0 pc=0x102ae1d44
created by runtime.(*cleanupQueue).createGs in goroutine 1
	runtime/mcleanup.go:589 +0x108

goroutine 7 gp=0x140001ece00 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x14000074f10 sp=0x14000074ef0 pc=0x102ad9a90
runtime.gcBgMarkWorker(0x140000458f0)
	runtime/mgc.go:1463 +0xe0 fp=0x14000074fb0 sp=0x14000074f10 pc=0x102a871f0
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x14000074fd0 sp=0x14000074fb0 pc=0x102a870d8
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x14000074fd0 sp=0x14000074fd0 pc=0x102ae1d44
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 18 gp=0x14000504000 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x1400006e710 sp=0x1400006e6f0 pc=0x102ad9a90
runtime.gcBgMarkWorker(0x140000458f0)
	runtime/mgc.go:1463 +0xe0 fp=0x1400006e7b0 sp=0x1400006e710 pc=0x102a871f0
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x1400006e7d0 sp=0x1400006e7b0 pc=0x102a870d8
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x1400006e7d0 sp=0x1400006e7d0 pc=0x102ae1d44
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 34 gp=0x14000102380 m=nil [GC worker (idle)]:
runtime.gopark(0x105fb7031c983?, 0x3?, 0xba?, 0xd0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x14000118710 sp=0x140001186f0 pc=0x102ad9a90
runtime.gcBgMarkWorker(0x140000458f0)
	runtime/mgc.go:1463 +0xe0 fp=0x140001187b0 sp=0x14000118710 pc=0x102a871f0
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x140001187d0 sp=0x140001187b0 pc=0x102a870d8
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x140001187d0 sp=0x140001187d0 pc=0x102ae1d44
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 8 gp=0x140001ecfc0 m=nil [GC worker (idle)]:
runtime.gopark(0x105fb702b414f?, 0x3?, 0xe1?, 0x9d?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x14000075710 sp=0x140000756f0 pc=0x102ad9a90
runtime.gcBgMarkWorker(0x140000458f0)
	runtime/mgc.go:1463 +0xe0 fp=0x140000757b0 sp=0x14000075710 pc=0x102a871f0
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x140000757d0 sp=0x140000757b0 pc=0x102a870d8
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x140000757d0 sp=0x140000757d0 pc=0x102ae1d44
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 19 gp=0x140005041c0 m=nil [GC worker (idle)]:
runtime.gopark(0x105fb702b41f6?, 0x0?, 0x0?, 0x0?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x140004cdf10 sp=0x140004cdef0 pc=0x102ad9a90
runtime.gcBgMarkWorker(0x140000458f0)
	runtime/mgc.go:1463 +0xe0 fp=0x140004cdfb0 sp=0x140004cdf10 pc=0x102a871f0
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x140004cdfd0 sp=0x140004cdfb0 pc=0x102a870d8
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x140004cdfd0 sp=0x140004cdfd0 pc=0x102ae1d44
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 35 gp=0x14000102540 m=nil [GC worker (idle)]:
runtime.gopark(0x104881fa0?, 0x3?, 0x2d?, 0x31?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x14000118f10 sp=0x14000118ef0 pc=0x102ad9a90
runtime.gcBgMarkWorker(0x140000458f0)
	runtime/mgc.go:1463 +0xe0 fp=0x14000118fb0 sp=0x14000118f10 pc=0x102a871f0
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x14000118fd0 sp=0x14000118fb0 pc=0x102a870d8
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x14000118fd0 sp=0x14000118fd0 pc=0x102ae1d44
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 9 gp=0x140001ed180 m=nil [GC worker (idle)]:
runtime.gopark(0x105fb702e3a61?, 0x3?, 0x97?, 0x4f?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x14000075f10 sp=0x14000075ef0 pc=0x102ad9a90
runtime.gcBgMarkWorker(0x140000458f0)
	runtime/mgc.go:1463 +0xe0 fp=0x14000075fb0 sp=0x14000075f10 pc=0x102a871f0
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x14000075fd0 sp=0x14000075fb0 pc=0x102a870d8
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x14000075fd0 sp=0x14000075fd0 pc=0x102ae1d44
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 20 gp=0x14000504380 m=nil [GC worker (idle)]:
runtime.gopark(0x104881fa0?, 0x1?, 0x58?, 0xf5?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x1400006f710 sp=0x1400006f6f0 pc=0x102ad9a90
runtime.gcBgMarkWorker(0x140000458f0)
	runtime/mgc.go:1463 +0xe0 fp=0x1400006f7b0 sp=0x1400006f710 pc=0x102a871f0
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x1400006f7d0 sp=0x1400006f7b0 pc=0x102a870d8
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x1400006f7d0 sp=0x1400006f7d0 pc=0x102ae1d44
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 10 gp=0x140001ed340 m=nil [GC worker (idle)]:
runtime.gopark(0x105fb7032b4b3?, 0x1?, 0xe4?, 0x45?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x14000114710 sp=0x140001146f0 pc=0x102ad9a90
runtime.gcBgMarkWorker(0x140000458f0)
	runtime/mgc.go:1463 +0xe0 fp=0x140001147b0 sp=0x14000114710 pc=0x102a871f0
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x140001147d0 sp=0x140001147b0 pc=0x102a870d8
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x140001147d0 sp=0x140001147d0 pc=0x102ae1d44
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 36 gp=0x14000102a80 m=nil [GC worker (idle)]:
runtime.gopark(0x105fb7032c32f?, 0x3?, 0x9b?, 0x66?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x1400008bf10 sp=0x1400008bef0 pc=0x102ad9a90
runtime.gcBgMarkWorker(0x140000458f0)
	runtime/mgc.go:1463 +0xe0 fp=0x1400008bfb0 sp=0x1400008bf10 pc=0x102a871f0
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1373 +0x28 fp=0x1400008bfd0 sp=0x1400008bfb0 pc=0x102a870d8
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x1400008bfd0 sp=0x1400008bfd0 pc=0x102ae1d44
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1373 +0x140

goroutine 11 gp=0x14000103180 m=nil [sync.WaitGroup.Wait]:
runtime.gopark(0x10484d8c0?, 0x0?, 0x0?, 0x20?, 0x0?)
	runtime/proc.go:460 +0xc0 fp=0x1400008aa80 sp=0x1400008aa60 pc=0x102ad9a90
runtime.goparkunlock(...)
	runtime/proc.go:466
runtime.semacquire1(0x14000660298, 0x0, 0x1, 0x0, 0x19)
	runtime/sema.go:192 +0x204 fp=0x1400008aad0 sp=0x1400008aa80 pc=0x102aba844
sync.runtime_SemacquireWaitGroup(0x0?, 0x0?)
	runtime/sema.go:114 +0x38 fp=0x1400008ab10 sp=0x1400008aad0 pc=0x102adb468
sync.(*WaitGroup).Wait(0x14000660290)
	sync/waitgroup.go:206 +0xa8 fp=0x1400008ab40 sp=0x1400008ab10 pc=0x102aed7a8
github.com/ollama/ollama/runner/ollamarunner.(*Server).run(0x140006601e0, {0x103e9d200, 0x14000626b40})
	github.com/ollama/ollama/runner/ollamarunner/runner.go:441 +0x38 fp=0x1400008afa0 sp=0x1400008ab40 pc=0x102fa57b8
github.com/ollama/ollama/runner/ollamarunner.Execute.gowrap1()
	github.com/ollama/ollama/runner/ollamarunner/runner.go:1423 +0x2c fp=0x1400008afd0 sp=0x1400008afa0 pc=0x102fad07c
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x1400008afd0 sp=0x1400008afd0 pc=0x102ae1d44
created by github.com/ollama/ollama/runner/ollamarunner.Execute in goroutine 1
	github.com/ollama/ollama/runner/ollamarunner/runner.go:1423 +0x414

goroutine 50 gp=0x14000504540 m=nil [IO wait]:
runtime.gopark(0xffffffffffffffff?, 0xffffffffffffffff?, 0x23?, 0x0?, 0x102afd640?)
	runtime/proc.go:460 +0xc0 fp=0x14000116d80 sp=0x14000116d60 pc=0x102ad9a90
runtime.netpollblock(0x0?, 0x0?, 0x0?)
	runtime/netpoll.go:575 +0x150 fp=0x14000116dc0 sp=0x14000116d80 pc=0x102a9f660
internal/poll.runtime_pollWait(0x13021fc00, 0x72)
	runtime/netpoll.go:351 +0xa0 fp=0x14000116df0 sp=0x14000116dc0 pc=0x102ad8cc0
internal/poll.(*pollDesc).wait(0x140001dd380?, 0x14000624ba1?, 0x0)
	internal/poll/fd_poll_runtime.go:84 +0x28 fp=0x14000116e20 sp=0x14000116df0 pc=0x102b577c8
internal/poll.(*pollDesc).waitRead(...)
	internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0x140001dd380, {0x14000624ba1, 0x1, 0x1})
	internal/poll/fd_unix.go:165 +0x1e0 fp=0x14000116ec0 sp=0x14000116e20 pc=0x102b589e0
net.(*netFD).Read(0x140001dd380, {0x14000624ba1?, 0x0?, 0x0?})
	net/fd_posix.go:68 +0x28 fp=0x14000116f10 sp=0x14000116ec0 pc=0x102bbc3c8
net.(*conn).Read(0x140005a6450, {0x14000624ba1?, 0x0?, 0x0?})
	net/net.go:196 +0x34 fp=0x14000116f60 sp=0x14000116f10 pc=0x102bc89a4
net/http.(*connReader).backgroundRead(0x14000624b80)
	net/http/server.go:702 +0x38 fp=0x14000116fb0 sp=0x14000116f60 pc=0x102d77418
net/http.(*connReader).startBackgroundRead.gowrap2()
	net/http/server.go:698 +0x28 fp=0x14000116fd0 sp=0x14000116fb0 pc=0x102d77308
runtime.goexit({})
	runtime/asm_arm64.s:1268 +0x4 fp=0x14000116fd0 sp=0x14000116fd0 pc=0x102ae1d44
created by net/http.(*connReader).startBackgroundRead in goroutine 12
	net/http/server.go:698 +0xb8

r0      0x0
r1      0x0
r2      0x0
r3      0x0
r4      0x18ff55a08
r5      0x1703d5d20
r6      0x36
r7      0x0
r8      0xf6eed3141e993b36
r9      0xf6eed3156ea44b36
r10     0x2
r11     0x10000000000
r12     0xfffffffd
r13     0x0
r14     0x0
r15     0x0
r16     0x148
r17     0x1fe52c990
r18     0x0
r19     0x6
r20     0x1c03
r21     0x1703d70e0
r22     0x0
r23     0x0
r24     0x0
r25     0x1400005eee8
r26     0x103e84f50
r27     0x818
r28     0x140001028c0
r29     0x1703d6610
lr      0x19004d888
sp      0x1703d65f0
pc      0x1900135b0
fault   0x1900135b0
time=2026-02-08T12:59:07.406+01:00 level=ERROR source=server.go:1204 msg="do load request" error="Post \"http://127.0.0.1:62156/load\": EOF"
time=2026-02-08T12:59:07.406+01:00 level=ERROR source=server.go:303 msg="llama runner terminated" error="exit status 2"
time=2026-02-08T12:59:07.406+01:00 level=ERROR source=server.go:1204 msg="do load request" error="Post \"http://127.0.0.1:62156/load\": dial tcp 127.0.0.1:62156: connect: connection refused"
time=2026-02-08T12:59:07.406+01:00 level=INFO source=sched.go:490 msg="Load failed" model=/Users/thomas/.ollama/models/blobs/sha256-3e4cb14174460404e7a233e531675303b2fbf7749c02f91864fe311ab6344e4f error="model failed to load, this may be due to resource limitations or an internal error, check ollama server logs for details"
[GIN] 2026/02/08 - 12:59:07 | 500 |  1.821442917s |       127.0.0.1 | POST     "/api/generate"

<!-- gh-comment-id:3867066335 --> @thomas-maurice commented on GitHub (Feb 8, 2026):

I have an M5 with macOS 26.2.

```
$ ollama --version
MLX: Failed to load symbol: mlx_metal_device_info
Warning: could not connect to a running Ollama instance
Warning: client version is 0.15.5
```

Ollama is installed from Brew. I cannot run `ollama run qwen3:4b`; this is what I get:

<details>
<summary>This is what the `ollama serve` process gives me in the logs</summary>

```
MLX: Failed to load symbol: mlx_metal_device_info
time=2026-02-08T12:58:59.220+01:00 level=INFO source=routes.go:1636 msg="server config" env="map[HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:0 OLLAMA_DEBUG:INFO OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/Users/thomas/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false http_proxy: https_proxy: no_proxy:]"
time=2026-02-08T12:58:59.222+01:00 level=INFO source=images.go:473 msg="total blobs: 21"
time=2026-02-08T12:58:59.222+01:00 level=INFO source=images.go:480 msg="total unused blobs removed: 0"
time=2026-02-08T12:58:59.223+01:00 level=INFO source=routes.go:1689 msg="Listening on 127.0.0.1:11434 (version 0.15.5)"
time=2026-02-08T12:58:59.223+01:00 level=INFO source=runner.go:67 msg="discovering available GPUs..."
time=2026-02-08T12:58:59.223+01:00 level=INFO source=server.go:430 msg="starting runner" cmd="/opt/homebrew/Cellar/ollama/0.15.5/bin/ollama runner --ollama-engine --port 62152"
time=2026-02-08T12:59:00.984+01:00 level=INFO source=types.go:60 msg="inference compute" id=cpu library=cpu compute="" name=cpu description=cpu libdirs=ollama driver="" pci_id="" type="" total="32.0 GiB" available="21.9 GiB"
time=2026-02-08T12:59:00.984+01:00 level=INFO source=routes.go:1739 msg="vram-based default context" total_vram="0 B" default_num_ctx=4096
[GIN] 2026/02/08 - 12:59:05 | 200 | 154.458µs | 127.0.0.1 | HEAD "/"
[GIN] 2026/02/08 - 12:59:05 | 200 | 53.572208ms | 127.0.0.1 | POST "/api/show"
[GIN] 2026/02/08 - 12:59:05 | 200 | 39.898167ms | 127.0.0.1 | POST "/api/show"
time=2026-02-08T12:59:05.656+01:00 level=INFO source=server.go:246 msg="enabling flash attention"
time=2026-02-08T12:59:05.657+01:00 level=INFO source=server.go:430 msg="starting runner" cmd="/opt/homebrew/Cellar/ollama/0.15.5/bin/ollama runner --ollama-engine --model /Users/thomas/.ollama/models/blobs/sha256-3e4cb14174460404e7a233e531675303b2fbf7749c02f91864fe311ab6344e4f --port 62156"
time=2026-02-08T12:59:05.657+01:00 level=INFO source=sched.go:463 msg="system memory" total="32.0 GiB" free="22.0 GiB" free_swap="0 B"
time=2026-02-08T12:59:05.657+01:00 level=INFO source=server.go:756 msg="loading model" "model layers"=37 requested=-1
MLX: Failed to load symbol: mlx_metal_device_info
time=2026-02-08T12:59:05.667+01:00 level=INFO source=runner.go:1410 msg="starting ollama engine"
time=2026-02-08T12:59:05.667+01:00 level=INFO source=runner.go:1445 msg="Server listening on 127.0.0.1:62156"
time=2026-02-08T12:59:05.668+01:00 level=INFO source=runner.go:1283 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:4096 KvCacheType: NumThreads:4 GPULayers:[] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-02-08T12:59:05.678+01:00 level=INFO
source=ggml.go:136 msg="" architecture=qwen3 file_type=Q4_K_M name="Qwen3 4B Thinking 2507" description="" num_tensors=398 num_key_values=33 ggml_metal_device_init: testing tensor API for f16 support ggml_metal_device_init: testing tensor API for bfloat support ggml_metal_library_init: using embedded metal library ggml_metal_library_init: error: Error Domain=MTLLibraryErrorDomain Code=3 "program_source:7131:28: warning: variable 'theta_base' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized] } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ program_source:7251:76: note: in instantiation of function template specialization 'kernel_rope_multi<float>' requested here template [[host_name("kernel_rope_multi_f32")]] kernel kernel_rope_multi_t kernel_rope_multi<float>; ^ program_source:7149:33: note: uninitialized use occurs here const float theta = theta_base * pow(args.freq_base, inv_ndims*i0); ^~~~~~~~~~ program_source:7131:24: note: remove the 'if' if its condition is always true } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ program_source:7131:28: warning: variable 'theta_base' is used uninitialized whenever '&&' condition is false [-Wsometimes-uninitialized] } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t ^~~~~~~~~~~~~~~ program_source:7149:33: note: uninitialized use occurs here const float theta = theta_base * pow(args.freq_base, inv_ndims*i0); ^~~~~~~~~~ program_source:7131:28: note: remove the '&&' if its condition is always true } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t ^~~~~~~~~~~~~~~~~~ program_source:7125:29: note: initialize the variable 'theta_base' to silence this warning float theta_base; ^ = 0.0 program_source:7131:28: warning: variable 'theta_base' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized] } else if (sector % 3 == 0 && 
sector < 3 * args.sect_0) { // t ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ program_source:7252:76: note: in instantiation of function template specialization 'kernel_rope_multi<half>' requested here template [[host_name("kernel_rope_multi_f16")]] kernel kernel_rope_multi_t kernel_rope_multi<half>; ^ program_source:7149:33: note: uninitialized use occurs here const float theta = theta_base * pow(args.freq_base, inv_ndims*i0); ^~~~~~~~~~ program_source:7131:24: note: remove the 'if' if its condition is always true } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ program_source:7131:28: warning: variable 'theta_base' is used uninitialized whenever '&&' condition is false [-Wsometimes-uninitialized] } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t ^~~~~~~~~~~~~~~ program_source:7149:33: note: uninitialized use occurs here const float theta = theta_base * pow(args.freq_base, inv_ndims*i0); ^~~~~~~~~~ program_source:7131:28: note: remove the '&&' if its condition is always true } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t ^~~~~~~~~~~~~~~~~~ program_source:7125:29: note: initialize the variable 'theta_base' to silence this warning float theta_base; ^ = 0.0 In file included from program_source:2837: In file included from /System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/MetalPerformancePrimitives.h:10: In file included from /System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/MPPTensorOpsMatMul2d.h:368: /System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/__impl/MPPTensorOpsMatMul2dImpl.h:3266:5: error: static_assert failed due to requirement '__tensor_ops_detail::__is_same_v<bfloat, half>' "Input types must match cooperative tensor types" static_assert(__tensor_ops_detail::__is_same_v<_leftType, leftValueType>, "Input types must match cooperative tensor types"); ^ 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/MPPTensorOpsMatMul2d.h:394:24: note: in instantiation of function template specialization 'mpp::tensor_ops::__mutmul2d_detail::__run<{32, 64, 32, false, true, false, 1}, metal::execution_simdgroups<4>, metal::tensor<threadgroup half, metal::extents<int, 18446744073709551615, 18446744073709551615>, metal::tensor_inline>, metal::tensor<threadgroup bfloat, metal::extents<int, 18446744073709551615, 18446744073709551615>, metal::tensor_inline>, metal::cooperative_tensor<float, metal::extents<int, 18446744073709551615, 18446744073709551615>, mpp::tensor_ops::__mutmul2d_detail::__operand_layout<{32, 64, 32, false, true, false, 1}, mpp::tensor_ops::__mutmul2d_detail::__matmul2d_cooperative_operand_index::destination, metal::execution_simdgroups<4>, bfloat, half, float, int>>>' requested here __mutmul2d_detail::__run<Descriptor, Scope, LeftOperandType, ^ program_source:12128:12: note: in instantiation of function template specialization 'mpp::tensor_ops::matmul2d<{32, 64, 32, false, true, false, 1}, metal::execution_simdgroups<4>>::run<metal::tensor<threadgroup half, metal::extents<int, 18446744073709551615, 18446744073709551615>, metal::tensor_inline>, metal::tensor<threadgroup bfloat, metal::extents<int, 18446744073709551615, 18446744073709551615>, metal::tensor_inline>, metal::cooperative_tensor<float, metal::extents<int, 18446744073709551615, 18446744073709551615>, mpp::tensor_ops::__mutmul2d_detail::__operand_layout<{32, 64, 32, false, true, false, 1}, mpp::tensor_ops::__mutmul2d_detail::__matmul2d_cooperative_operand_index::destination, metal::execution_simdgroups<4>, bfloat, half, float, int>>, void>' requested here mm.run(sB, sA, cT); ^ In file included from program_source:2837: In file included from /System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/MetalPerformancePrimitives.h:10: In file included from 
/System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/MPPTensorOpsMatMul2d.h:368: /System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/__impl/MPPTensorOpsMatMul2dImpl.h:3267:5: error: static_assert failed due to requirement '__tensor_ops_detail::__is_same_v<half, bfloat>' "Input types must match cooperative tensor types" static_assert(__tensor_ops_detail::__is_same_v<_rightType, rightValueType>, "Input types must match cooperative tensor types"); ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ " UserInfo={NSLocalizedDescription=program_source:7131:28: warning: variable 'theta_base' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized] } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ program_source:7251:76: note: in instantiation of function template specialization 'kernel_rope_multi<float>' requested here template [[host_name("kernel_rope_multi_f32")]] kernel kernel_rope_multi_t kernel_rope_multi<float>; ^ program_source:7149:33: note: uninitialized use occurs here const float theta = theta_base * pow(args.freq_base, inv_ndims*i0); ^~~~~~~~~~ program_source:7131:24: note: remove the 'if' if its condition is always true } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ program_source:7131:28: warning: variable 'theta_base' is used uninitialized whenever '&&' condition is false [-Wsometimes-uninitialized] } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t ^~~~~~~~~~~~~~~ program_source:7149:33: note: uninitialized use occurs here const float theta = theta_base * pow(args.freq_base, inv_ndims*i0); ^~~~~~~~~~ program_source:7131:28: note: remove the '&&' if its condition is always true } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t ^~~~~~~~~~~~~~~~~~ program_source:7125:29: note: initialize the variable 'theta_base' to silence this warning 
float theta_base; ^ = 0.0 program_source:7131:28: warning: variable 'theta_base' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized] } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ program_source:7252:76: note: in instantiation of function template specialization 'kernel_rope_multi<half>' requested here template [[host_name("kernel_rope_multi_f16")]] kernel kernel_rope_multi_t kernel_rope_multi<half>; ^ program_source:7149:33: note: uninitialized use occurs here const float theta = theta_base * pow(args.freq_base, inv_ndims*i0); ^~~~~~~~~~ program_source:7131:24: note: remove the 'if' if its condition is always true } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ program_source:7131:28: warning: variable 'theta_base' is used uninitialized whenever '&&' condition is false [-Wsometimes-uninitialized] } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t ^~~~~~~~~~~~~~~ program_source:7149:33: note: uninitialized use occurs here const float theta = theta_base * pow(args.freq_base, inv_ndims*i0); ^~~~~~~~~~ program_source:7131:28: note: remove the '&&' if its condition is always true } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t ^~~~~~~~~~~~~~~~~~ program_source:7125:29: note: initialize the variable 'theta_base' to silence this warning float theta_base; ^ = 0.0 In file included from program_source:2837: In file included from /System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/MetalPerformancePrimitives.h:10: In file included from /System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/MPPTensorOpsMatMul2d.h:368: /System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/__impl/MPPTensorOpsMatMul2dImpl.h:3266:5: error: static_assert failed due to requirement '__tensor_ops_detail::__is_same_v<bfloat, half>' "Input types must match cooperative 
tensor types" static_assert(__tensor_ops_detail::__is_same_v<_leftType, leftValueType>, "Input types must match cooperative tensor types"); ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/MPPTensorOpsMatMul2d.h:394:24: note: in instantiation of function template specialization 'mpp::tensor_ops::__mutmul2d_detail::__run<{32, 64, 32, false, true, false, 1}, metal::execution_simdgroups<4>, metal::tensor<threadgroup half, metal::extents<int, 18446744073709551615, 18446744073709551615>, metal::tensor_inline>, metal::tensor<threadgroup bfloat, metal::extents<int, 18446744073709551615, 18446744073709551615>, metal::tensor_inline>, metal::cooperative_tensor<float, metal::extents<int, 18446744073709551615, 18446744073709551615>, mpp::tensor_ops::__mutmul2d_detail::__operand_layout<{32, 64, 32, false, true, false, 1}, mpp::tensor_ops::__mutmul2d_detail::__matmul2d_cooperative_operand_index::destination, metal::execution_simdgroups<4>, bfloat, half, float, int>>>' requested here __mutmul2d_detail::__run<Descriptor, Scope, LeftOperandType, ^ program_source:12128:12: note: in instantiation of function template specialization 'mpp::tensor_ops::matmul2d<{32, 64, 32, false, true, false, 1}, metal::execution_simdgroups<4>>::run<metal::tensor<threadgroup half, metal::extents<int, 18446744073709551615, 18446744073709551615>, metal::tensor_inline>, metal::tensor<threadgroup bfloat, metal::extents<int, 18446744073709551615, 18446744073709551615>, metal::tensor_inline>, metal::cooperative_tensor<float, metal::extents<int, 18446744073709551615, 18446744073709551615>, mpp::tensor_ops::__mutmul2d_detail::__operand_layout<{32, 64, 32, false, true, false, 1}, mpp::tensor_ops::__mutmul2d_detail::__matmul2d_cooperative_operand_index::destination, metal::execution_simdgroups<4>, bfloat, half, float, int>>, void>' requested here mm.run(sB, sA, cT); ^ In file included from program_source:2837: In file included 
from /System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/MetalPerformancePrimitives.h:10: In file included from /System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/MPPTensorOpsMatMul2d.h:368: /System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/__impl/MPPTensorOpsMatMul2dImpl.h:3267:5: error: static_assert failed due to requirement '__tensor_ops_detail::__is_same_v<half, bfloat>' "Input types must match cooperative tensor types" static_assert(__tensor_ops_detail::__is_same_v<_rightType, rightValueType>, "Input types must match cooperative tensor types"); ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ } ggml_metal_device_init: error: failed to create library ggml_metal_rsets_init: creating a residency set collection (keep_alive = 180 s) ggml_metal_device_init: GPU name: Apple M5 ggml_metal_device_init: GPU family: MTLGPUFamilyApple10 (1010) ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003) ggml_metal_device_init: GPU family: MTLGPUFamilyMetal4 (5002) ggml_metal_device_init: simdgroup reduction = true ggml_metal_device_init: simdgroup matrix mul. 
= true ggml_metal_device_init: has unified memory = true ggml_metal_device_init: has bfloat = true ggml_metal_device_init: has tensor = true ggml_metal_device_init: use residency sets = true ggml_metal_device_init: use shared buffers = true ggml_metal_device_init: recommendedMaxWorkingSetSize = 26800.60 MB time=2026-02-08T12:59:05.678+01:00 level=INFO source=ggml.go:104 msg=system Metal.0.EMBED_LIBRARY=1 CPU.0.NEON=1 CPU.0.ARM_FMA=1 CPU.0.FP16_VA=1 CPU.0.DOTPROD=1 CPU.0.LLAMAFILE=1 CPU.0.ACCELERATE=1 compiler=cgo(clang) ggml_metal_init: allocating ggml_metal_init: picking default device: Apple M5 ggml_metal_init: the device does not have a precompiled Metal library - this is unexpected ggml_metal_init: will try to compile it on the fly ggml_metal_library_init: using embedded metal library ggml_metal_library_init: error: Error Domain=MTLLibraryErrorDomain Code=3 "program_source:7131:28: warning: variable 'theta_base' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized] } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ program_source:7251:76: note: in instantiation of function template specialization 'kernel_rope_multi<float>' requested here template [[host_name("kernel_rope_multi_f32")]] kernel kernel_rope_multi_t kernel_rope_multi<float>; ^ program_source:7149:33: note: uninitialized use occurs here const float theta = theta_base * pow(args.freq_base, inv_ndims*i0); ^~~~~~~~~~ program_source:7131:24: note: remove the 'if' if its condition is always true } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ program_source:7131:28: warning: variable 'theta_base' is used uninitialized whenever '&&' condition is false [-Wsometimes-uninitialized] } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t ^~~~~~~~~~~~~~~ program_source:7149:33: note: uninitialized use occurs here const float theta = theta_base * 
pow(args.freq_base, inv_ndims*i0); ^~~~~~~~~~ program_source:7131:28: note: remove the '&&' if its condition is always true } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t ^~~~~~~~~~~~~~~~~~ program_source:7125:29: note: initialize the variable 'theta_base' to silence this warning float theta_base; ^ = 0.0 program_source:7131:28: warning: variable 'theta_base' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized] } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ program_source:7252:76: note: in instantiation of function template specialization 'kernel_rope_multi<half>' requested here template [[host_name("kernel_rope_multi_f16")]] kernel kernel_rope_multi_t kernel_rope_multi<half>; ^ program_source:7149:33: note: uninitialized use occurs here const float theta = theta_base * pow(args.freq_base, inv_ndims*i0); ^~~~~~~~~~ program_source:7131:24: note: remove the 'if' if its condition is always true } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ program_source:7131:28: warning: variable 'theta_base' is used uninitialized whenever '&&' condition is false [-Wsometimes-uninitialized] } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t ^~~~~~~~~~~~~~~ program_source:7149:33: note: uninitialized use occurs here const float theta = theta_base * pow(args.freq_base, inv_ndims*i0); ^~~~~~~~~~ program_source:7131:28: note: remove the '&&' if its condition is always true } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t ^~~~~~~~~~~~~~~~~~ program_source:7125:29: note: initialize the variable 'theta_base' to silence this warning float theta_base; ^ = 0.0 In file included from program_source:2837: In file included from /System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/MetalPerformancePrimitives.h:10: In file included from 
/System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/MPPTensorOpsMatMul2d.h:368: /System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/__impl/MPPTensorOpsMatMul2dImpl.h:3266:5: error: static_assert failed due to requirement '__tensor_ops_detail::__is_same_v<bfloat, half>' "Input types must match cooperative tensor types" static_assert(__tensor_ops_detail::__is_same_v<_leftType, leftValueType>, "Input types must match cooperative tensor types"); ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/MPPTensorOpsMatMul2d.h:394:24: note: in instantiation of function template specialization 'mpp::tensor_ops::__mutmul2d_detail::__run<{32, 64, 32, false, true, false, 1}, metal::execution_simdgroups<4>, metal::tensor<threadgroup half, metal::extents<int, 18446744073709551615, 18446744073709551615>, metal::tensor_inline>, metal::tensor<threadgroup bfloat, metal::extents<int, 18446744073709551615, 18446744073709551615>, metal::tensor_inline>, metal::cooperative_tensor<float, metal::extents<int, 18446744073709551615, 18446744073709551615>, mpp::tensor_ops::__mutmul2d_detail::__operand_layout<{32, 64, 32, false, true, false, 1}, mpp::tensor_ops::__mutmul2d_detail::__matmul2d_cooperative_operand_index::destination, metal::execution_simdgroups<4>, bfloat, half, float, int>>>' requested here __mutmul2d_detail::__run<Descriptor, Scope, LeftOperandType, ^ program_source:12128:12: note: in instantiation of function template specialization 'mpp::tensor_ops::matmul2d<{32, 64, 32, false, true, false, 1}, metal::execution_simdgroups<4>>::run<metal::tensor<threadgroup half, metal::extents<int, 18446744073709551615, 18446744073709551615>, metal::tensor_inline>, metal::tensor<threadgroup bfloat, metal::extents<int, 18446744073709551615, 18446744073709551615>, metal::tensor_inline>, metal::cooperative_tensor<float, metal::extents<int, 18446744073709551615, 
18446744073709551615>, mpp::tensor_ops::__mutmul2d_detail::__operand_layout<{32, 64, 32, false, true, false, 1}, mpp::tensor_ops::__mutmul2d_detail::__matmul2d_cooperative_operand_index::destination, metal::execution_simdgroups<4>, bfloat, half, float, int>>, void>' requested here mm.run(sB, sA, cT); ^ In file included from program_source:2837: In file included from /System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/MetalPerformancePrimitives.h:10: In file included from /System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/MPPTensorOpsMatMul2d.h:368: /System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/__impl/MPPTensorOpsMatMul2dImpl.h:3267:5: error: static_assert failed due to requirement '__tensor_ops_detail::__is_same_v<half, bfloat>' "Input types must match cooperative tensor types" static_assert(__tensor_ops_detail::__is_same_v<_rightType, rightValueType>, "Input types must match cooperative tensor types"); ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ " UserInfo={NSLocalizedDescription=program_source:7131:28: warning: variable 'theta_base' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized] } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ program_source:7251:76: note: in instantiation of function template specialization 'kernel_rope_multi<float>' requested here template [[host_name("kernel_rope_multi_f32")]] kernel kernel_rope_multi_t kernel_rope_multi<float>; ^ program_source:7149:33: note: uninitialized use occurs here const float theta = theta_base * pow(args.freq_base, inv_ndims*i0); ^~~~~~~~~~ program_source:7131:24: note: remove the 'if' if its condition is always true } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ program_source:7131:28: warning: variable 'theta_base' is used uninitialized whenever '&&' condition is false 
[-Wsometimes-uninitialized] } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t ^~~~~~~~~~~~~~~ program_source:7149:33: note: uninitialized use occurs here const float theta = theta_base * pow(args.freq_base, inv_ndims*i0); ^~~~~~~~~~ program_source:7131:28: note: remove the '&&' if its condition is always true } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t ^~~~~~~~~~~~~~~~~~ program_source:7125:29: note: initialize the variable 'theta_base' to silence this warning float theta_base; ^ = 0.0 program_source:7131:28: warning: variable 'theta_base' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized] } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ program_source:7252:76: note: in instantiation of function template specialization 'kernel_rope_multi<half>' requested here template [[host_name("kernel_rope_multi_f16")]] kernel kernel_rope_multi_t kernel_rope_multi<half>; ^ program_source:7149:33: note: uninitialized use occurs here const float theta = theta_base * pow(args.freq_base, inv_ndims*i0); ^~~~~~~~~~ program_source:7131:24: note: remove the 'if' if its condition is always true } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ program_source:7131:28: warning: variable 'theta_base' is used uninitialized whenever '&&' condition is false [-Wsometimes-uninitialized] } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t ^~~~~~~~~~~~~~~ program_source:7149:33: note: uninitialized use occurs here const float theta = theta_base * pow(args.freq_base, inv_ndims*i0); ^~~~~~~~~~ program_source:7131:28: note: remove the '&&' if its condition is always true } else if (sector % 3 == 0 && sector < 3 * args.sect_0) { // t ^~~~~~~~~~~~~~~~~~ program_source:7125:29: note: initialize the variable 'theta_base' to silence this warning float theta_base; ^ = 0.0 In file included from program_source:2837: 
In file included from /System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/MetalPerformancePrimitives.h:10: In file included from /System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/MPPTensorOpsMatMul2d.h:368: /System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/__impl/MPPTensorOpsMatMul2dImpl.h:3266:5: error: static_assert failed due to requirement '__tensor_ops_detail::__is_same_v<bfloat, half>' "Input types must match cooperative tensor types" static_assert(__tensor_ops_detail::__is_same_v<_leftType, leftValueType>, "Input types must match cooperative tensor types"); ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/MPPTensorOpsMatMul2d.h:394:24: note: in instantiation of function template specialization 'mpp::tensor_ops::__mutmul2d_detail::__run<{32, 64, 32, false, true, false, 1}, metal::execution_simdgroups<4>, metal::tensor<threadgroup half, metal::extents<int, 18446744073709551615, 18446744073709551615>, metal::tensor_inline>, metal::tensor<threadgroup bfloat, metal::extents<int, 18446744073709551615, 18446744073709551615>, metal::tensor_inline>, metal::cooperative_tensor<float, metal::extents<int, 18446744073709551615, 18446744073709551615>, mpp::tensor_ops::__mutmul2d_detail::__operand_layout<{32, 64, 32, false, true, false, 1}, mpp::tensor_ops::__mutmul2d_detail::__matmul2d_cooperative_operand_index::destination, metal::execution_simdgroups<4>, bfloat, half, float, int>>>' requested here __mutmul2d_detail::__run<Descriptor, Scope, LeftOperandType, ^ program_source:12128:12: note: in instantiation of function template specialization 'mpp::tensor_ops::matmul2d<{32, 64, 32, false, true, false, 1}, metal::execution_simdgroups<4>>::run<metal::tensor<threadgroup half, metal::extents<int, 18446744073709551615, 18446744073709551615>, metal::tensor_inline>, metal::tensor<threadgroup bfloat, metal::extents<int, 
18446744073709551615, 18446744073709551615>, metal::tensor_inline>, metal::cooperative_tensor<float, metal::extents<int, 18446744073709551615, 18446744073709551615>, mpp::tensor_ops::__mutmul2d_detail::__operand_layout<{32, 64, 32, false, true, false, 1}, mpp::tensor_ops::__mutmul2d_detail::__matmul2d_cooperative_operand_index::destination, metal::execution_simdgroups<4>, bfloat, half, float, int>>, void>' requested here mm.run(sB, sA, cT); ^ In file included from program_source:2837: In file included from /System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/MetalPerformancePrimitives.h:10: In file included from /System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/MPPTensorOpsMatMul2d.h:368: /System/Library/Frameworks/MetalPerformancePrimitives.framework/Headers/__impl/MPPTensorOpsMatMul2dImpl.h:3267:5: error: static_assert failed due to requirement '__tensor_ops_detail::__is_same_v<half, bfloat>' "Input types must match cooperative tensor types" static_assert(__tensor_ops_detail::__is_same_v<_rightType, rightValueType>, "Input types must match cooperative tensor types"); ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ } ggml_metal_init: error: failed to initialize the Metal library ggml_backend_metal_device_init: error: failed to allocate context ggml-backend.cpp:258: GGML_ASSERT(backend) failed WARNING: Using native backtrace. Set GGML_BACKTRACE_LLDB for more info. WARNING: GGML_BACKTRACE_LLDB may cause native MacOS Terminal.app to crash. 
See: https://github.com/ggml-org/llama.cpp/pull/17869 0 ollama 0x000000010370952c ggml_print_backtrace + 276 1 ollama 0x0000000103709718 ggml_abort + 156 2 ollama 0x00000001037207cc ggml_backend_get_default_buffer_type + 76 3 ollama 0x00000001036cefc8 _cgo_78bf35ffe8be_Cfunc_ggml_backend_get_default_buffer_type + 36 4 ollama 0x0000000102ae1b4c ollama + 531276 SIGABRT: abort PC=0x1900135b0 m=7 sigcode=0 signal arrived during cgo execution goroutine 12 gp=0x14000103340 m=7 mp=0x14000100808 [syscall]: runtime.cgocall(0x1036cefa4, 0x1400004ec48) runtime/cgocall.go:167 +0x44 fp=0x1400004ec10 sp=0x1400004ebd0 pc=0x102ad66c4 github.com/ollama/ollama/ml/backend/ggml._Cfunc_ggml_backend_get_default_buffer_type(0x0) _cgo_gotypes.go:872 +0x30 fp=0x1400004ec40 sp=0x1400004ec10 pc=0x102edcc60 github.com/ollama/ollama/ml/backend/ggml.New.func17(...) github.com/ollama/ollama/ml/backend/ggml/ggml.go:361 github.com/ollama/ollama/ml/backend/ggml.New({0x16d39ea2d, 0x6a}, {0x0, 0x4, {0x10487faa0, 0x0, 0x0}, 0x1}) github.com/ollama/ollama/ml/backend/ggml/ggml.go:361 +0x1298 fp=0x1400004f540 sp=0x1400004ec40 pc=0x102ee2c38 github.com/ollama/ollama/ml.NewBackend({0x16d39ea2d, 0x6a}, {0x0, 0x4, {0x10487faa0, 0x0, 0x0}, 0x1}) github.com/ollama/ollama/ml/backend.go:88 +0x78 fp=0x1400004f590 sp=0x1400004f540 pc=0x102e6ab78 github.com/ollama/ollama/model.New({0x16d39ea2d?, 0x0?}, {0x0, 0x4, {0x10487faa0, 0x0, 0x0}, 0x1}) github.com/ollama/ollama/model/model.go:113 +0x3c fp=0x1400004f650 sp=0x1400004f590 pc=0x102ef47bc github.com/ollama/ollama/runner/ollamarunner.(*Server).allocModel(0x140006601e0, {0x16d39ea2d?, 0x0?}, {0x0, 0x4, {0x10487faa0, 0x0, 0x0}, 0x1}, {0x0?, ...}, ...) 
github.com/ollama/ollama/runner/ollamarunner/runner.go:1201 +0x98 fp=0x1400004f700 sp=0x1400004f650 pc=0x102faaba8 github.com/ollama/ollama/runner/ollamarunner.(*Server).load(0x140006601e0, {0x103e9ac20, 0x140006303c0}, 0x1400045e000) github.com/ollama/ollama/runner/ollamarunner/runner.go:1310 +0x428 fp=0x1400004fa90 sp=0x1400004f700 pc=0x102fab668 github.com/ollama/ollama/runner/ollamarunner.(*Server).load-fm({0x103e9ac20?, 0x140006303c0?}, 0x140004d7b18?) <autogenerated>:1 +0x40 fp=0x1400004fac0 sp=0x1400004fa90 pc=0x102fad3f0 net/http.HandlerFunc.ServeHTTP(0x140005aad80?, {0x103e9ac20?, 0x140006303c0?}, 0x140004d7b00?) net/http/server.go:2322 +0x38 fp=0x1400004faf0 sp=0x1400004fac0 pc=0x102d7dfb8 net/http.(*ServeMux).ServeHTTP(0x10?, {0x103e9ac20, 0x140006303c0}, 0x1400045e000) net/http/server.go:2861 +0x190 fp=0x1400004fb40 sp=0x1400004faf0 pc=0x102d7fa50 net/http.serverHandler.ServeHTTP({0x103e974f0?}, {0x103e9ac20?, 0x140006303c0?}, 0x1?) net/http/server.go:3340 +0xb0 fp=0x1400004fb70 sp=0x1400004fb40 pc=0x102d9a240 net/http.(*conn).serve(0x1400028c6c0, {0x103e9d1c8, 0x14000690ff0}) net/http/server.go:2109 +0x528 fp=0x1400004ffa0 sp=0x1400004fb70 pc=0x102d7c3a8 net/http.(*Server).Serve.gowrap3() net/http/server.go:3493 +0x2c fp=0x1400004ffd0 sp=0x1400004ffa0 pc=0x102d816dc runtime.goexit({}) runtime/asm_arm64.s:1268 +0x4 fp=0x1400004ffd0 sp=0x1400004ffd0 pc=0x102ae1d44 created by net/http.(*Server).Serve in goroutine 1 net/http/server.go:3493 +0x384 goroutine 1 gp=0x140000021c0 m=nil [IO wait]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:460 +0xc0 fp=0x14000c3f720 sp=0x14000c3f700 pc=0x102ad9a90 runtime.netpollblock(0x1400012d7b8?, 0x2b5bccc?, 0x1?) 
runtime/netpoll.go:575 +0x150 fp=0x14000c3f760 sp=0x14000c3f720 pc=0x102a9f660 internal/poll.runtime_pollWait(0x13021fe00, 0x72) runtime/netpoll.go:351 +0xa0 fp=0x14000c3f790 sp=0x14000c3f760 pc=0x102ad8cc0 internal/poll.(*pollDesc).wait(0x140001dd300?, 0x102b5dd5c?, 0x0) internal/poll/fd_poll_runtime.go:84 +0x28 fp=0x14000c3f7c0 sp=0x14000c3f790 pc=0x102b577c8 internal/poll.(*pollDesc).waitRead(...) internal/poll/fd_poll_runtime.go:89 internal/poll.(*FD).Accept(0x140001dd300) internal/poll/fd_unix.go:613 +0x21c fp=0x14000c3f870 sp=0x14000c3f7c0 pc=0x102b5bdac net.(*netFD).accept(0x140001dd300) net/fd_unix.go:161 +0x28 fp=0x14000c3f930 sp=0x14000c3f870 pc=0x102bbdbc8 net.(*TCPListener).accept(0x14000624b40) net/tcpsock_posix.go:159 +0x24 fp=0x14000c3f980 sp=0x14000c3f930 pc=0x102bd1754 net.(*TCPListener).Accept(0x14000624b40) net/tcpsock.go:380 +0x2c fp=0x14000c3f9c0 sp=0x14000c3f980 pc=0x102bd07fc net/http.(*onceCloseListener).Accept(0x1400028c6c0?) <autogenerated>:1 +0x2c fp=0x14000c3f9e0 sp=0x14000c3f9c0 pc=0x102da61ec net/http.(*Server).Serve(0x14000125300, {0x103e9aa40, 0x14000624b40}) net/http/server.go:3463 +0x24c fp=0x14000c3fb10 sp=0x14000c3f9e0 pc=0x102d8137c github.com/ollama/ollama/runner/ollamarunner.Execute({0x140000340a0, 0x4, 0x4}) github.com/ollama/ollama/runner/ollamarunner/runner.go:1446 +0x7a4 fp=0x14000c3fce0 sp=0x14000c3fb10 pc=0x102face94 github.com/ollama/ollama/runner.Execute({0x14000034080?, 0x0?, 0x0?}) github.com/ollama/ollama/runner/runner.go:28 +0x184 fp=0x14000c3fd10 sp=0x14000c3fce0 pc=0x102ffe154 github.com/ollama/ollama/cmd.NewCLI.func3(0x14000125100?, {0x10396e566?, 0x4?, 0x10396e56a?}) github.com/ollama/ollama/cmd/cmd.go:1979 +0x50 fp=0x14000c3fd40 sp=0x14000c3fd10 pc=0x103682030 github.com/spf13/cobra.(*Command).execute(0x140004b3508, {0x14000595ea0, 0x5, 0x5}) github.com/spf13/cobra@v1.7.0/command.go:940 +0x5e0 fp=0x14000c3fe60 sp=0x14000c3fd40 pc=0x102c28cf0 github.com/spf13/cobra.(*Command).ExecuteC(0x140004a4f08) 
github.com/spf13/cobra@v1.7.0/command.go:1068 +0x2ec fp=0x14000c3ff20 sp=0x14000c3fe60 pc=0x102c293cc github.com/spf13/cobra.(*Command).Execute(...) github.com/spf13/cobra@v1.7.0/command.go:992 github.com/spf13/cobra.(*Command).ExecuteContext(...) github.com/spf13/cobra@v1.7.0/command.go:985 main.main() github.com/ollama/ollama/main.go:12 +0x54 fp=0x14000c3ff40 sp=0x14000c3ff20 pc=0x103682b54 runtime.main() runtime/proc.go:285 +0x278 fp=0x14000c3ffd0 sp=0x14000c3ff40 pc=0x102aa6118 runtime.goexit({}) runtime/asm_arm64.s:1268 +0x4 fp=0x14000c3ffd0 sp=0x14000c3ffd0 pc=0x102ae1d44 goroutine 2 gp=0x14000002c40 m=nil [force gc (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:460 +0xc0 fp=0x14000072f90 sp=0x14000072f70 pc=0x102ad9a90 runtime.goparkunlock(...) runtime/proc.go:466 runtime.forcegchelper() runtime/proc.go:373 +0xb4 fp=0x14000072fd0 sp=0x14000072f90 pc=0x102aa6464 runtime.goexit({}) runtime/asm_arm64.s:1268 +0x4 fp=0x14000072fd0 sp=0x14000072fd0 pc=0x102ae1d44 created by runtime.init.7 in goroutine 1 runtime/proc.go:361 +0x24 goroutine 3 gp=0x14000003180 m=nil [GC sweep wait]: runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:460 +0xc0 fp=0x14000073760 sp=0x14000073740 pc=0x102ad9a90 runtime.goparkunlock(...) runtime/proc.go:466 runtime.bgsweep(0x1400007e000) runtime/mgcsweep.go:323 +0x104 fp=0x140000737b0 sp=0x14000073760 pc=0x102a90f24 runtime.gcenable.gowrap1() runtime/mgc.go:212 +0x28 fp=0x140000737d0 sp=0x140000737b0 pc=0x102a84b78 runtime.goexit({}) runtime/asm_arm64.s:1268 +0x4 fp=0x140000737d0 sp=0x140000737d0 pc=0x102ae1d44 created by runtime.gcenable in goroutine 1 runtime/mgc.go:212 +0x6c goroutine 4 gp=0x14000003340 m=nil [GC scavenge wait]: runtime.gopark(0x10000?, 0x103b56028?, 0x0?, 0x0?, 0x0?) runtime/proc.go:460 +0xc0 fp=0x14000073f60 sp=0x14000073f40 pc=0x102ad9a90 runtime.goparkunlock(...) 
runtime/proc.go:466 runtime.(*scavengerState).park(0x104838d80) runtime/mgcscavenge.go:425 +0x5c fp=0x14000073f90 sp=0x14000073f60 pc=0x102a8ea3c runtime.bgscavenge(0x1400007e000) runtime/mgcscavenge.go:658 +0xac fp=0x14000073fb0 sp=0x14000073f90 pc=0x102a8efdc runtime.gcenable.gowrap2() runtime/mgc.go:213 +0x28 fp=0x14000073fd0 sp=0x14000073fb0 pc=0x102a84b18 runtime.goexit({}) runtime/asm_arm64.s:1268 +0x4 fp=0x14000073fd0 sp=0x14000073fd0 pc=0x102ae1d44 created by runtime.gcenable in goroutine 1 runtime/mgc.go:213 +0xac goroutine 5 gp=0x14000003c00 m=nil [finalizer wait]: runtime.gopark(0x103d33f40?, 0x1e?, 0xc8?, 0x25?, 0x1000000000000?) runtime/proc.go:460 +0xc0 fp=0x14000072580 sp=0x14000072560 pc=0x102ad9a90 runtime.runFinalizers() runtime/mfinal.go:210 +0x104 fp=0x140000727d0 sp=0x14000072580 pc=0x102a83b64 runtime.goexit({}) runtime/asm_arm64.s:1268 +0x4 fp=0x140000727d0 sp=0x140000727d0 pc=0x102ae1d44 created by runtime.createfing in goroutine 1 runtime/mfinal.go:172 +0x78 goroutine 6 gp=0x140001ec700 m=nil [cleanup wait]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:460 +0xc0 fp=0x14000074740 sp=0x14000074720 pc=0x102ad9a90 runtime.goparkunlock(...) runtime/proc.go:466 runtime.(*cleanupQueue).dequeue(0x104839c60) runtime/mcleanup.go:439 +0x110 fp=0x14000074780 sp=0x14000074740 pc=0x102a81050 runtime.runCleanups() runtime/mcleanup.go:635 +0x40 fp=0x140000747d0 sp=0x14000074780 pc=0x102a81860 runtime.goexit({}) runtime/asm_arm64.s:1268 +0x4 fp=0x140000747d0 sp=0x140000747d0 pc=0x102ae1d44 created by runtime.(*cleanupQueue).createGs in goroutine 1 runtime/mcleanup.go:589 +0x108 goroutine 7 gp=0x140001ece00 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) 
runtime/proc.go:460 +0xc0 fp=0x14000074f10 sp=0x14000074ef0 pc=0x102ad9a90 runtime.gcBgMarkWorker(0x140000458f0) runtime/mgc.go:1463 +0xe0 fp=0x14000074fb0 sp=0x14000074f10 pc=0x102a871f0 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1373 +0x28 fp=0x14000074fd0 sp=0x14000074fb0 pc=0x102a870d8 runtime.goexit({}) runtime/asm_arm64.s:1268 +0x4 fp=0x14000074fd0 sp=0x14000074fd0 pc=0x102ae1d44 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1373 +0x140 goroutine 18 gp=0x14000504000 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:460 +0xc0 fp=0x1400006e710 sp=0x1400006e6f0 pc=0x102ad9a90 runtime.gcBgMarkWorker(0x140000458f0) runtime/mgc.go:1463 +0xe0 fp=0x1400006e7b0 sp=0x1400006e710 pc=0x102a871f0 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1373 +0x28 fp=0x1400006e7d0 sp=0x1400006e7b0 pc=0x102a870d8 runtime.goexit({}) runtime/asm_arm64.s:1268 +0x4 fp=0x1400006e7d0 sp=0x1400006e7d0 pc=0x102ae1d44 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1373 +0x140 goroutine 34 gp=0x14000102380 m=nil [GC worker (idle)]: runtime.gopark(0x105fb7031c983?, 0x3?, 0xba?, 0xd0?, 0x0?) runtime/proc.go:460 +0xc0 fp=0x14000118710 sp=0x140001186f0 pc=0x102ad9a90 runtime.gcBgMarkWorker(0x140000458f0) runtime/mgc.go:1463 +0xe0 fp=0x140001187b0 sp=0x14000118710 pc=0x102a871f0 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1373 +0x28 fp=0x140001187d0 sp=0x140001187b0 pc=0x102a870d8 runtime.goexit({}) runtime/asm_arm64.s:1268 +0x4 fp=0x140001187d0 sp=0x140001187d0 pc=0x102ae1d44 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1373 +0x140 goroutine 8 gp=0x140001ecfc0 m=nil [GC worker (idle)]: runtime.gopark(0x105fb702b414f?, 0x3?, 0xe1?, 0x9d?, 0x0?) 
runtime/proc.go:460 +0xc0 fp=0x14000075710 sp=0x140000756f0 pc=0x102ad9a90 runtime.gcBgMarkWorker(0x140000458f0) runtime/mgc.go:1463 +0xe0 fp=0x140000757b0 sp=0x14000075710 pc=0x102a871f0 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1373 +0x28 fp=0x140000757d0 sp=0x140000757b0 pc=0x102a870d8 runtime.goexit({}) runtime/asm_arm64.s:1268 +0x4 fp=0x140000757d0 sp=0x140000757d0 pc=0x102ae1d44 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1373 +0x140 goroutine 19 gp=0x140005041c0 m=nil [GC worker (idle)]: runtime.gopark(0x105fb702b41f6?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:460 +0xc0 fp=0x140004cdf10 sp=0x140004cdef0 pc=0x102ad9a90 runtime.gcBgMarkWorker(0x140000458f0) runtime/mgc.go:1463 +0xe0 fp=0x140004cdfb0 sp=0x140004cdf10 pc=0x102a871f0 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1373 +0x28 fp=0x140004cdfd0 sp=0x140004cdfb0 pc=0x102a870d8 runtime.goexit({}) runtime/asm_arm64.s:1268 +0x4 fp=0x140004cdfd0 sp=0x140004cdfd0 pc=0x102ae1d44 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1373 +0x140 goroutine 35 gp=0x14000102540 m=nil [GC worker (idle)]: runtime.gopark(0x104881fa0?, 0x3?, 0x2d?, 0x31?, 0x0?) runtime/proc.go:460 +0xc0 fp=0x14000118f10 sp=0x14000118ef0 pc=0x102ad9a90 runtime.gcBgMarkWorker(0x140000458f0) runtime/mgc.go:1463 +0xe0 fp=0x14000118fb0 sp=0x14000118f10 pc=0x102a871f0 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1373 +0x28 fp=0x14000118fd0 sp=0x14000118fb0 pc=0x102a870d8 runtime.goexit({}) runtime/asm_arm64.s:1268 +0x4 fp=0x14000118fd0 sp=0x14000118fd0 pc=0x102ae1d44 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1373 +0x140 goroutine 9 gp=0x140001ed180 m=nil [GC worker (idle)]: runtime.gopark(0x105fb702e3a61?, 0x3?, 0x97?, 0x4f?, 0x0?) 
runtime/proc.go:460 +0xc0 fp=0x14000075f10 sp=0x14000075ef0 pc=0x102ad9a90 runtime.gcBgMarkWorker(0x140000458f0) runtime/mgc.go:1463 +0xe0 fp=0x14000075fb0 sp=0x14000075f10 pc=0x102a871f0 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1373 +0x28 fp=0x14000075fd0 sp=0x14000075fb0 pc=0x102a870d8 runtime.goexit({}) runtime/asm_arm64.s:1268 +0x4 fp=0x14000075fd0 sp=0x14000075fd0 pc=0x102ae1d44 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1373 +0x140 goroutine 20 gp=0x14000504380 m=nil [GC worker (idle)]: runtime.gopark(0x104881fa0?, 0x1?, 0x58?, 0xf5?, 0x0?) runtime/proc.go:460 +0xc0 fp=0x1400006f710 sp=0x1400006f6f0 pc=0x102ad9a90 runtime.gcBgMarkWorker(0x140000458f0) runtime/mgc.go:1463 +0xe0 fp=0x1400006f7b0 sp=0x1400006f710 pc=0x102a871f0 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1373 +0x28 fp=0x1400006f7d0 sp=0x1400006f7b0 pc=0x102a870d8 runtime.goexit({}) runtime/asm_arm64.s:1268 +0x4 fp=0x1400006f7d0 sp=0x1400006f7d0 pc=0x102ae1d44 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1373 +0x140 goroutine 10 gp=0x140001ed340 m=nil [GC worker (idle)]: runtime.gopark(0x105fb7032b4b3?, 0x1?, 0xe4?, 0x45?, 0x0?) runtime/proc.go:460 +0xc0 fp=0x14000114710 sp=0x140001146f0 pc=0x102ad9a90 runtime.gcBgMarkWorker(0x140000458f0) runtime/mgc.go:1463 +0xe0 fp=0x140001147b0 sp=0x14000114710 pc=0x102a871f0 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1373 +0x28 fp=0x140001147d0 sp=0x140001147b0 pc=0x102a870d8 runtime.goexit({}) runtime/asm_arm64.s:1268 +0x4 fp=0x140001147d0 sp=0x140001147d0 pc=0x102ae1d44 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1373 +0x140 goroutine 36 gp=0x14000102a80 m=nil [GC worker (idle)]: runtime.gopark(0x105fb7032c32f?, 0x3?, 0x9b?, 0x66?, 0x0?) 
runtime/proc.go:460 +0xc0 fp=0x1400008bf10 sp=0x1400008bef0 pc=0x102ad9a90 runtime.gcBgMarkWorker(0x140000458f0) runtime/mgc.go:1463 +0xe0 fp=0x1400008bfb0 sp=0x1400008bf10 pc=0x102a871f0 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1373 +0x28 fp=0x1400008bfd0 sp=0x1400008bfb0 pc=0x102a870d8 runtime.goexit({}) runtime/asm_arm64.s:1268 +0x4 fp=0x1400008bfd0 sp=0x1400008bfd0 pc=0x102ae1d44 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1373 +0x140 goroutine 11 gp=0x14000103180 m=nil [sync.WaitGroup.Wait]: runtime.gopark(0x10484d8c0?, 0x0?, 0x0?, 0x20?, 0x0?) runtime/proc.go:460 +0xc0 fp=0x1400008aa80 sp=0x1400008aa60 pc=0x102ad9a90 runtime.goparkunlock(...) runtime/proc.go:466 runtime.semacquire1(0x14000660298, 0x0, 0x1, 0x0, 0x19) runtime/sema.go:192 +0x204 fp=0x1400008aad0 sp=0x1400008aa80 pc=0x102aba844 sync.runtime_SemacquireWaitGroup(0x0?, 0x0?) runtime/sema.go:114 +0x38 fp=0x1400008ab10 sp=0x1400008aad0 pc=0x102adb468 sync.(*WaitGroup).Wait(0x14000660290) sync/waitgroup.go:206 +0xa8 fp=0x1400008ab40 sp=0x1400008ab10 pc=0x102aed7a8 github.com/ollama/ollama/runner/ollamarunner.(*Server).run(0x140006601e0, {0x103e9d200, 0x14000626b40}) github.com/ollama/ollama/runner/ollamarunner/runner.go:441 +0x38 fp=0x1400008afa0 sp=0x1400008ab40 pc=0x102fa57b8 github.com/ollama/ollama/runner/ollamarunner.Execute.gowrap1() github.com/ollama/ollama/runner/ollamarunner/runner.go:1423 +0x2c fp=0x1400008afd0 sp=0x1400008afa0 pc=0x102fad07c runtime.goexit({}) runtime/asm_arm64.s:1268 +0x4 fp=0x1400008afd0 sp=0x1400008afd0 pc=0x102ae1d44 created by github.com/ollama/ollama/runner/ollamarunner.Execute in goroutine 1 github.com/ollama/ollama/runner/ollamarunner/runner.go:1423 +0x414 goroutine 50 gp=0x14000504540 m=nil [IO wait]: runtime.gopark(0xffffffffffffffff?, 0xffffffffffffffff?, 0x23?, 0x0?, 0x102afd640?) runtime/proc.go:460 +0xc0 fp=0x14000116d80 sp=0x14000116d60 pc=0x102ad9a90 runtime.netpollblock(0x0?, 0x0?, 0x0?) 
runtime/netpoll.go:575 +0x150 fp=0x14000116dc0 sp=0x14000116d80 pc=0x102a9f660 internal/poll.runtime_pollWait(0x13021fc00, 0x72) runtime/netpoll.go:351 +0xa0 fp=0x14000116df0 sp=0x14000116dc0 pc=0x102ad8cc0 internal/poll.(*pollDesc).wait(0x140001dd380?, 0x14000624ba1?, 0x0) internal/poll/fd_poll_runtime.go:84 +0x28 fp=0x14000116e20 sp=0x14000116df0 pc=0x102b577c8 internal/poll.(*pollDesc).waitRead(...) internal/poll/fd_poll_runtime.go:89 internal/poll.(*FD).Read(0x140001dd380, {0x14000624ba1, 0x1, 0x1}) internal/poll/fd_unix.go:165 +0x1e0 fp=0x14000116ec0 sp=0x14000116e20 pc=0x102b589e0 net.(*netFD).Read(0x140001dd380, {0x14000624ba1?, 0x0?, 0x0?}) net/fd_posix.go:68 +0x28 fp=0x14000116f10 sp=0x14000116ec0 pc=0x102bbc3c8 net.(*conn).Read(0x140005a6450, {0x14000624ba1?, 0x0?, 0x0?}) net/net.go:196 +0x34 fp=0x14000116f60 sp=0x14000116f10 pc=0x102bc89a4 net/http.(*connReader).backgroundRead(0x14000624b80) net/http/server.go:702 +0x38 fp=0x14000116fb0 sp=0x14000116f60 pc=0x102d77418 net/http.(*connReader).startBackgroundRead.gowrap2() net/http/server.go:698 +0x28 fp=0x14000116fd0 sp=0x14000116fb0 pc=0x102d77308 runtime.goexit({}) runtime/asm_arm64.s:1268 +0x4 fp=0x14000116fd0 sp=0x14000116fd0 pc=0x102ae1d44 created by net/http.(*connReader).startBackgroundRead in goroutine 12 net/http/server.go:698 +0xb8 r0 0x0 r1 0x0 r2 0x0 r3 0x0 r4 0x18ff55a08 r5 0x1703d5d20 r6 0x36 r7 0x0 r8 0xf6eed3141e993b36 r9 0xf6eed3156ea44b36 r10 0x2 r11 0x10000000000 r12 0xfffffffd r13 0x0 r14 0x0 r15 0x0 r16 0x148 r17 0x1fe52c990 r18 0x0 r19 0x6 r20 0x1c03 r21 0x1703d70e0 r22 0x0 r23 0x0 r24 0x0 r25 0x1400005eee8 r26 0x103e84f50 r27 0x818 r28 0x140001028c0 r29 0x1703d6610 lr 0x19004d888 sp 0x1703d65f0 pc 0x1900135b0 fault 0x1900135b0 time=2026-02-08T12:59:07.406+01:00 level=ERROR source=server.go:1204 msg="do load request" error="Post \"http://127.0.0.1:62156/load\": EOF" time=2026-02-08T12:59:07.406+01:00 level=ERROR source=server.go:303 msg="llama runner terminated" error="exit status 2" 
time=2026-02-08T12:59:07.406+01:00 level=ERROR source=server.go:1204 msg="do load request" error="Post \"http://127.0.0.1:62156/load\": dial tcp 127.0.0.1:62156: connect: connection refused" time=2026-02-08T12:59:07.406+01:00 level=INFO source=sched.go:490 msg="Load failed" model=/Users/thomas/.ollama/models/blobs/sha256-3e4cb14174460404e7a233e531675303b2fbf7749c02f91864fe311ab6344e4f error="model failed to load, this may be due to resource limitations or an internal error, check ollama server logs for details" [GIN] 2026/02/08 - 12:59:07 | 500 | 1.821442917s | 127.0.0.1 | POST "/api/generate" ``` </details>
@ksjoshi commented on GitHub (Feb 15, 2026):

I removed the Homebrew installation of Ollama and installed version 0.16.1 from the UI instead. That resolved the issue, and I can now run the model locally. It looks like the Homebrew formula isn’t updated to 0.16.1 yet.

<!-- gh-comment-id:3903133703 -->
@Froszen commented on GitHub (Feb 15, 2026):

thanks @ksjoshi it works now! :)

<!-- gh-comment-id:3905294340 -->
@rcerf commented on GitHub (Mar 13, 2026):

Root Cause Analysis & MLX Workaround

Debugged extensively on M4 Max, macOS 26.0 beta, Homebrew Ollama 0.15.6→0.17.7.

Root Cause

The crash is in Apple's MetalPerformancePrimitives.framework, not in Ollama/ggml. The framework's MPPTensorOpsMatMul2dImpl.h has a static_assert that fails when mixing bfloat and half types in cooperative tensor matmul operations. This breaks ALL ggml Metal shader compilation on affected macOS versions.

What Doesn't Work

  • OLLAMA_LLM_LIBRARY=cpu — ggml still initializes Metal backend before falling back
  • GGML_METAL=0 / GGML_METAL_DISABLE=1 — same, Metal init happens regardless
  • MTL_DEVICE_USE_CPU=1 — no effect
  • num_gpu: 0 in model options — Metal library compile happens before GPU layer assignment
  • Tested across Ollama 0.15.6 through 0.17.7 (all Homebrew) — all fail identically

What Works

Fix 1: Official Ollama .app (per @ksjoshi above) — the .app bundle likely ships pre-compiled Metal libraries built against a working SDK, bypassing the broken local framework.

Fix 2: Apple MLX as drop-in backend — MLX (https://github.com/ml-explore/mlx) uses a completely different Metal code path that is NOT affected by the MetalPerformancePrimitives bug. I wrote an Ollama-compatible HTTP server backed by mlx-lm:

```
pip install mlx-lm
python3 mlx_server.py         # serves on port 11435
python3 mlx_server.py 11434   # or serve on Ollama's port directly
```
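The full server lives in the gist below, but the request/response shape it has to implement can be sketched with the standard library alone. In this sketch, `make_handler`, `serve`, and the injected `generate_fn` are names invented for illustration (the real `mlx_server.py` calls into `mlx-lm` directly), and it returns a single non-streaming JSON object rather than Ollama's streamed NDJSON:

```python
# Minimal sketch of an Ollama-compatible /api/generate endpoint.
# The inference call is abstracted behind generate_fn so the HTTP plumbing
# can be exercised even on machines without MLX installed.
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

def make_handler(generate_fn):
    class Handler(BaseHTTPRequestHandler):
        def do_POST(self):
            if self.path != "/api/generate":
                self.send_error(404)
                return
            length = int(self.headers.get("Content-Length", 0))
            req = json.loads(self.rfile.read(length))
            # generate_fn(model_name, prompt) -> completion text
            text = generate_fn(req["model"], req["prompt"])
            body = json.dumps(
                {"model": req["model"], "response": text, "done": True}
            ).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, *args):  # keep test output quiet
            pass
    return Handler

def serve(generate_fn, port=11435):
    """Start the server on a background thread and return it."""
    server = HTTPServer(("127.0.0.1", port), make_handler(generate_fn))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

With `generate_fn` backed by `mlx-lm`, existing Ollama clients can point at this port unchanged for simple `/api/generate` calls.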

It maps Ollama model names to HuggingFace MLX-quantized models (e.g., qwen2.5:7b → mlx-community/Qwen2.5-7B-Instruct-4bit). Supports /api/generate and /api/chat with streaming.
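The name translation itself is just a lookup table. A sketch of that step (only the qwen2.5:7b entry is confirmed above; the second entry and the function name are assumptions, and the gist's actual table may differ):

```python
# Hypothetical mapping from Ollama model tags to HuggingFace MLX repo ids.
# Only the qwen2.5:7b pair is taken from the comment; llama3.2:3b is a guess.
MLX_MODEL_MAP = {
    "qwen2.5:7b": "mlx-community/Qwen2.5-7B-Instruct-4bit",
    "llama3.2:3b": "mlx-community/Llama-3.2-3B-Instruct-4bit",
}

def resolve_mlx_model(ollama_name: str) -> str:
    """Translate an Ollama model tag into an MLX-quantized HF repo id."""
    try:
        return MLX_MODEL_MAP[ollama_name]
    except KeyError:
        raise ValueError(f"no MLX equivalent known for {ollama_name!r}")
```

Unknown tags fail loudly rather than silently pulling the wrong weights.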

Gist: https://gist.github.com/rcerf/9b93a1d04dcfe1b1b08aa0c543dabf4c

Verified working with streaming inference on M4 Max, macOS 26.0 beta.

Diagnostic Signature

If you're hitting this, ollama serve stderr shows:

```
inference compute: id=cpu library=cpu  (no GPU detected)
```

followed by:

```
static_assert failed: "Input types must match cooperative tensor types"
ggml_metal_init: error: failed to initialize the Metal library
```

Reference: github-starred/ollama#70940