[GH-ISSUE #13896] Support for MacBook M5 for image generation (Unable to load kernel affine_qmm…bfloat16…) #55605

Closed
opened 2026-04-29 09:28:30 -05:00 by GiteaMirror · 0 comments
Owner

Originally created by @anthropoleo on GitHub (Jan 25, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/13896

What is the issue?

Image generation fails on Apple M5 when using the Ollama image runner with model x/flux2-klein:4b. The image runner starts successfully, loads the model, then fails during “Evaluating setup” with an MLX Metal kernel load error:

MLX error: [metal::Device] Unable to load kernel affine_qmm_t_nax_bfloat16_t_gs_32_b_4_bm64_bn64_bk64_wm2_wn2_alN_true_batch_0

The request returns HTTP 500 from POST /api/generate.

Environment
• Hardware: Apple M5 (Metal)
• GPU detected: Metal description="Apple M5" total VRAM 17.8 GiB
• Ollama version: 0.15.0
• Host: macOS app bundle (/Applications/Ollama.app/...)
• Model: x/flux2-klein:4b
• Image output size: 1024x1024
• Runner backend: MLX (Metal)

Steps to reproduce
1. Start Ollama (macOS app)
2. Confirm server is listening on 127.0.0.1:11434
3. Trigger image generation with x/flux2-klein:4b (for example via POST /api/generate)
4. Observe that the request fails with HTTP 500 and that the image-runner logs show the MLX Metal kernel load error.
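
The steps above can be sketched from a terminal with curl. This is illustrative, not taken verbatim from the original report: the prompt text and the `stream` flag are my additions, and `model`/`prompt` are the standard `POST /api/generate` request fields.

```shell
# Step 2: confirm the Ollama server is listening on the default port.
# Step 3: trigger image generation via POST /api/generate.
# The prompt below is illustrative; per the report, any prompt reproduces the failure.
if curl -fsS http://127.0.0.1:11434/ >/dev/null 2>&1; then
  curl -i http://127.0.0.1:11434/api/generate \
    -H 'Content-Type: application/json' \
    -d '{"model": "x/flux2-klein:4b", "prompt": "a red apple on a table", "stream": false}'
else
  echo "Ollama is not listening on 127.0.0.1:11434" >&2
fi
```

On an affected Apple M5 machine, the second request returns HTTP 500 and the kernel load error appears in the image-runner logs.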

Generation fails quickly during setup: the server returns HTTP 500, and the image-runner logs show an MLX Metal kernel load error for a bfloat16 affine_qmm kernel.

Relevant log output

time=2026-01-24T20:10:06.840+10:00 level=INFO source=routes.go:1631 msg="server config" env="map[... OLLAMA_HOST:http://127.0.0.1:11434 ...]"
time=2026-01-24T20:10:06.880+10:00 level=INFO source=routes.go:1684 msg="Listening on 127.0.0.1:11434 (version 0.15.0)"
time=2026-01-24T20:10:06.880+10:00 level=INFO source=runner.go:67 msg="discovering available GPUs..."
time=2026-01-24T20:10:12.803+10:00 level=INFO source=types.go:42 msg="inference compute" id=0 filter_id=0 library=Metal compute=0.0 name=Metal description="Apple M5" libdirs="" driver=0.0 pci_id="" type=discrete total="17.8 GiB" available="17.8 GiB"
time=2026-01-24T20:10:12.803+10:00 level=INFO source=routes.go:1725 msg="entering low vram mode" "total vram"="17.8 GiB" threshold="20.0 GiB"

[GIN] 2026/01/25 - 13:02:06 | 200 |   50.156375ms | 127.0.0.1 | POST "/api/show"
time=2026-01-25T13:02:06.798+10:00 level=INFO source=server.go:143 msg="starting image runner subprocess" exe=/Applications/Ollama.app/Contents/Resources/ollama model=x/flux2-klein:4b port=55359
time=2026-01-25T13:02:06.831+10:00 level=WARN source=server.go:136 msg=image-runner msg="2026/01/25 13:02:06 runner.go:87: INFO MLX library initialized"
time=2026-01-25T13:02:06.833+10:00 level=WARN source=server.go:136 msg=image-runner msg="2026/01/25 13:02:06 runner.go:92: INFO detected model type type=Flux2KleinPipeline"
time=2026-01-25T13:02:06.833+10:00 level=INFO source=server.go:129 msg=image-runner msg="Loading FLUX.2 Klein model from manifest: x/flux2-klein:4b..."
time=2026-01-25T13:02:06.973+10:00 level=INFO source=server.go:129 msg=image-runner msg="  Loading tokenizer... ✓"
time=2026-01-25T13:02:07.798+10:00 level=INFO source=server.go:129 msg=image-runner msg="  Loading text encoder... ✓"
time=2026-01-25T13:02:08.508+10:00 level=INFO source=server.go:129 msg=image-runner msg="  Loading transformer... ✓"
time=2026-01-25T13:02:08.587+10:00 level=INFO source=server.go:129 msg=image-runner msg="  Loading VAE... ✓"
time=2026-01-25T13:02:08.587+10:00 level=INFO source=server.go:129 msg=image-runner msg="  Evaluating weights... ✓"
time=2026-01-25T13:02:08.587+10:00 level=INFO source=server.go:129 msg=image-runner msg="  Loaded in 1.75s (5.3 GB VRAM)"
time=2026-01-25T13:02:08.588+10:00 level=WARN source=server.go:136 msg=image-runner msg="2026/01/25 13:02:08 runner.go:139: INFO image runner listening addr=127.0.0.1:55359"
time=2026-01-25T13:02:08.600+10:00 level=INFO source=server.go:214 msg="image runner is ready" port=55359
[GIN] 2026/01/25 - 13:02:08 | 200 | 1.843952958s | 127.0.0.1 | POST "/api/generate"

time=2026-01-25T13:02:20.600+10:00 level=INFO source=server.go:129 msg=image-runner msg="  Output: 1024x1024"
time=2026-01-25T13:02:20.603+10:00 level=INFO source=server.go:129 msg=image-runner msg="  Encoding prompt... ✓"
time=2026-01-25T13:02:20.674+10:00 level=INFO source=server.go:129 msg=image-runner msg="  Evaluating setup... MLX error: [metal::Device] Unable to load kernel affine_qmm_t_nax_bfloat16_t_gs_32_b_4_bm64_bn64_bk64_wm2_wn2_alN_true_batch_0"
time=2026-01-25T13:02:20.674+10:00 level=INFO source=server.go:129 msg=image-runner msg=" at /Users/runner/work/ollama/ollama/build/_deps/mlx-c-src/mlx/c/transforms.cpp:73"
[GIN] 2026/01/25 - 13:02:20 | 500 | 263.07925ms | 127.0.0.1 | POST "/api/generate"

OS

No response

GPU

No response

CPU

No response

Ollama version

No response

GiteaMirror added the bug label 2026-04-29 09:28:30 -05:00

Reference: github-starred/ollama#55605