[GH-ISSUE #14046] FLUX.2 Klein MLX error #34938

Open
opened 2026-04-22 18:57:07 -05:00 by GiteaMirror · 18 comments

Originally created by @kamenik on GitHub (Feb 3, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/14046

What is the issue?

Ollama crashes when creating an image with the FLUX.2 Klein model; x/z-image-turbo works fine.

```
ollama run x/flux-klein:9b "a cat holding a sign that says hello world"
```

MLX error: [matmul] Last dimension of first input with shape (1,512,2560) must match second to last dimension of second input with shape (384,3072).

Ollama version: 0.15.4
OS / Distro: Ubuntu 24.04 LTS (kernel 6.14.0-37-generic #37~24.04.1-Ubuntu)
GPU / Driver / CUDA: NVIDIA RTX PRO 6000 Blackwell (Driver 580.126.09, CUDA 13.0)

Relevant log output: (none provided)
OS: Linux
GPU: Nvidia
CPU: AMD
Ollama version: 0.15.4

GiteaMirror added the bug label 2026-04-22 18:57:07 -05:00

@Digit-al commented on GitHub (Feb 3, 2026):

From https://ollama.com/blog/image-generation I understand that it's currently working only on macOS,
but I'm interested to know when it's planned for Linux!

@rick-github commented on GitHub (Feb 4, 2026):

It's the fp4 and fp8 quants that don't work on Linux; the bf16 quants (both 4b and 9b) work fine.

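For reference, the corresponding invocations with the explicit bf16 tags; the tag names below are the ones cited later in this thread, not independently verified:

```
ollama run x/flux2-klein:9b-bf16 "a cat"
ollama run x/z-image-turbo:bf16 "a cat"
```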

@mircomir commented on GitHub (Feb 4, 2026):

> It's the fp4 and fp8 quants that don't work on Linux; the bf16 quants (both 4b and 9b) work fine.

I'm not sure if I interpreted what you said correctly, but it doesn't work for me on Linux with the models currently present on Ollama.

```
ollama run x/flux2-klein:4b "a cat"
Error: 500 Internal Server Error: image runner exited unexpectedly: exit status 255

ollama run x/z-image-turbo:bf16 "a cat"
Error: 500 Internal Server Error: image runner exited unexpectedly: exit status 255
```

It doesn't work on Windows either.

```
ollama run x/flux2-klein:9b "a cat"
Error: 500 Internal Server Error: image runner failed: Error: image generation not available: build with mlx tag (exit: exit status 1)

ollama run x/z-image-turbo:bf16 "a cat"
Error: 500 Internal Server Error: image runner failed: Error: image generation not available: build with mlx tag (exit: exit status 1)
```

On macOS, however, it works correctly as advertised. The problem is that I have a decent NVIDIA card in my PC :)
I therefore assume that image generation support, as announced, is not yet available on Windows and Linux.

@Digit-al commented on GitHub (Feb 4, 2026):

I observe the same but didn't report it as a bug, since it's announced as working on macOS only for now.
I have an RTX 3090.

@mircomir commented on GitHub (Feb 4, 2026):

> I have an RTX 3090.

Mine is an NVIDIA RTX PRO 6000 Blackwell Max-Q Workstation Edition. If you need any more info or tests, let me know.

@rick-github commented on GitHub (Feb 4, 2026):

> I therefore assume that image generation support, as announced, is not yet available on Windows and Linux.

Windows support is pending work on Windows MLX support. The Linux releases include MLX support, but some libraries and includes are missing, which prevents the release from working out of the box. If you are using Docker, you can [create a new container](https://github.com/ollama/ollama/issues/14016#issuecomment-3831904450) that includes the missing pieces. If you are running on bare metal, you need to upgrade to CUDA v13, install the [Nvidia dev toolkit](https://developer.nvidia.com/cuda-13-0-0-download-archive), install libquadmath0, and run `sudo bash -c 'cd /usr/local/cuda && ln -s cccl/cuda include/cuda'`.

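For reference, a minimal sketch of those bare-metal steps on Ubuntu. The apt package name `cuda-toolkit-13-0` is an assumption based on the linked CUDA 13.0 archive; adapt it to whichever installer you use:

```
# Sketch of the bare-metal fix above (Ubuntu). Package names are assumptions.
sudo apt-get update
sudo apt-get install -y cuda-toolkit-13-0   # assumed apt package for the CUDA 13.0 toolkit
sudo apt-get install -y libquadmath0        # missing runtime library noted above
# Include-path symlink the MLX kernel compiler expects:
sudo bash -c 'cd /usr/local/cuda && ln -s cccl/cuda include/cuda'
```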

@Digit-al commented on GitHub (Feb 4, 2026):

What's the memory requirement? I get OOM:

```
root@ollama:/usr/include# ollama run x/z-image-turbo:latest
Error: failed to load model: 500 Internal Server Error: image runner failed: 2026/02/04 11:34:47 runner.go:92: INFO detected model type type=ZImagePipeline (exit: exit status 255)
root@ollama:/usr/include# journalctl -xeu ollama
Feb 04 11:34:43 ollama ollama[570825]: time=2026-02-04T11:34:43.117Z level=INFO source=routes.go:1631 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GGML_VK_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:16384 OLLAMA_>
Feb 04 11:34:43 ollama ollama[570825]: time=2026-02-04T11:34:43.251Z level=INFO source=images.go:473 msg="total blobs: 3950"
Feb 04 11:34:43 ollama ollama[570825]: time=2026-02-04T11:34:43.265Z level=INFO source=images.go:480 msg="total unused blobs removed: 0"
Feb 04 11:34:43 ollama ollama[570825]: time=2026-02-04T11:34:43.265Z level=INFO source=routes.go:1684 msg="Listening on 127.0.0.1:11434 (version 0.15.4)"
Feb 04 11:34:43 ollama ollama[570825]: time=2026-02-04T11:34:43.266Z level=INFO source=runner.go:67 msg="discovering available GPUs..."
Feb 04 11:34:43 ollama ollama[570825]: time=2026-02-04T11:34:43.266Z level=INFO source=server.go:429 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 35243"
Feb 04 11:34:43 ollama ollama[570825]: time=2026-02-04T11:34:43.547Z level=INFO source=server.go:429 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 36855"
Feb 04 11:34:43 ollama ollama[570825]: time=2026-02-04T11:34:43.796Z level=INFO source=runner.go:106 msg="experimental Vulkan support disabled.  To enable, set OLLAMA_VULKAN=1"
Feb 04 11:34:43 ollama ollama[570825]: time=2026-02-04T11:34:43.797Z level=INFO source=server.go:429 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 35513"
Feb 04 11:34:43 ollama ollama[570825]: time=2026-02-04T11:34:43.797Z level=INFO source=server.go:429 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 39361"
Feb 04 11:34:44 ollama ollama[570825]: time=2026-02-04T11:34:44.084Z level=INFO source=types.go:42 msg="inference compute" id=GPU-72b72b31-1eb2-548f-3d9c-87584844b4db filter_id="" library=CUDA compute=8.6 name=CUDA0 description="NVIDIA GeForce RTX 3090" libdirs=ollama,cuda_v13 driver=13.1 pci_id=0000:00:10.0 type=>
Feb 04 11:34:46 ollama ollama[570825]: [GIN] 2026/02/04 - 11:34:46 | 200 |      92.423µs |       127.0.0.1 | HEAD     "/"
Feb 04 11:34:46 ollama ollama[570825]: [GIN] 2026/02/04 - 11:34:46 | 200 |   92.111652ms |       127.0.0.1 | POST     "/api/show"
Feb 04 11:34:46 ollama ollama[570825]: time=2026-02-04T11:34:46.959Z level=INFO source=server.go:429 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 36927"
Feb 04 11:34:47 ollama ollama[570825]: time=2026-02-04T11:34:47.224Z level=INFO source=server.go:143 msg="starting image runner subprocess" exe=/usr/local/bin/ollama model=x/z-image-turbo:latest port=43599
Feb 04 11:34:47 ollama ollama[570825]: time=2026-02-04T11:34:47.703Z level=WARN source=server.go:136 msg=image-runner msg="2026/02/04 11:34:47 runner.go:87: INFO MLX library initialized"
Feb 04 11:34:47 ollama ollama[570825]: time=2026-02-04T11:34:47.703Z level=WARN source=server.go:136 msg=image-runner msg="2026/02/04 11:34:47 runner.go:88: INFO starting image runner model=x/z-image-turbo:latest port=43599"
Feb 04 11:34:47 ollama ollama[570825]: time=2026-02-04T11:34:47.710Z level=INFO source=server.go:129 msg=image-runner msg="Loading Z-Image model from manifest: x/z-image-turbo:latest..."
Feb 04 11:34:47 ollama ollama[570825]: time=2026-02-04T11:34:47.710Z level=WARN source=server.go:136 msg=image-runner msg="2026/02/04 11:34:47 runner.go:92: INFO detected model type type=ZImagePipeline"
Feb 04 11:34:48 ollama ollama[570825]: time=2026-02-04T11:34:48.229Z level=INFO source=server.go:129 msg=image-runner msg="  Loading tokenizer... ✓"
Feb 04 11:34:52 ollama ollama[570825]: time=2026-02-04T11:34:52.127Z level=INFO source=server.go:129 msg=image-runner msg="  Loading text encoder... ✓"
Feb 04 11:34:53 ollama ollama[570825]: time=2026-02-04T11:34:53.477Z level=INFO source=server.go:129 msg=image-runner msg="  (11.3 GB, peak 11.3 GB)"
Feb 04 11:34:59 ollama ollama[570825]: time=2026-02-04T11:34:59.375Z level=INFO source=server.go:129 msg=image-runner msg="  Loading transformer... ✓"
Feb 04 11:35:04 ollama ollama[570825]: time=2026-02-04T11:35:04.122Z level=INFO source=server.go:129 msg=image-runner msg="MLX error: cudaMallocAsync(&data, size, stream) failed: out of memory at /go/src/github.com/ollama/ollama/build/_deps/mlx-c-src/mlx/c/transforms.cpp:73"
Feb 04 11:35:04 ollama ollama[570825]: time=2026-02-04T11:35:04.449Z level=INFO source=server.go:321 msg="stopping image runner subprocess" pid=570920
Feb 04 11:35:09 ollama ollama[570825]: [GIN] 2026/02/04 - 11:35:09 | 500 | 22.630603139s |       127.0.0.1 | POST     "/api/generate"
root@ollama:/usr/include# nvidia-smi
Wed Feb  4 11:35:57 2026
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 590.48.01              Driver Version: 590.48.01      CUDA Version: 13.1     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3090        On  |   00000000:00:10.0 Off |                  N/A |
|  0%   32C    P8             17W /  370W |       1MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
root@ollama:/usr/include# free
               total        used        free      shared  buff/cache   available
Mem:       197855708     5862188    27617128        1324   166156660   191993520
Swap:        8388604           0     8388604
```

@Digit-al commented on GitHub (Feb 4, 2026):

and with flux2, no OOM but:

```
root@ollama:/usr/include# ollama run x/flux2-klein:9b "a cat"
Error: 500 Internal Server Error: Post "http://127.0.0.1:33573/completion": EOF

Feb 04 11:41:47 ollama ollama[570825]: time=2026-02-04T11:41:47.378Z level=INFO source=server.go:429 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 33987"
Feb 04 11:41:47 ollama ollama[570825]: time=2026-02-04T11:41:47.671Z level=INFO source=server.go:143 msg="starting image runner subprocess" exe=/usr/local/bin/ollama model=x/flux2-klein:9b port=33573
Feb 04 11:41:48 ollama ollama[570825]: time=2026-02-04T11:41:48.136Z level=WARN source=server.go:136 msg=image-runner msg="2026/02/04 11:41:48 runner.go:87: INFO MLX library initialized"
Feb 04 11:41:48 ollama ollama[570825]: time=2026-02-04T11:41:48.136Z level=WARN source=server.go:136 msg=image-runner msg="2026/02/04 11:41:48 runner.go:88: INFO starting image runner model=x/flux2-klein:9b port=33573"
Feb 04 11:41:48 ollama ollama[570825]: time=2026-02-04T11:41:48.144Z level=WARN source=server.go:136 msg=image-runner msg="2026/02/04 11:41:48 runner.go:92: INFO detected model type type=Flux2KleinPipeline"
Feb 04 11:41:48 ollama ollama[570825]: time=2026-02-04T11:41:48.144Z level=INFO source=server.go:129 msg=image-runner msg="Loading FLUX.2 Klein model from manifest: x/flux2-klein:9b..."
Feb 04 11:41:48 ollama ollama[570825]: time=2026-02-04T11:41:48.633Z level=INFO source=server.go:129 msg=image-runner msg="  Loading tokenizer... ✓"
Feb 04 11:41:52 ollama ollama[570825]: time=2026-02-04T11:41:52.573Z level=INFO source=server.go:129 msg=image-runner msg="  Loading text encoder... ✓"
Feb 04 11:41:56 ollama ollama[570825]: time=2026-02-04T11:41:56.160Z level=INFO source=server.go:129 msg=image-runner msg="  Loading transformer... ✓"
Feb 04 11:41:56 ollama ollama[570825]: time=2026-02-04T11:41:56.354Z level=INFO source=server.go:129 msg=image-runner msg="  Loading VAE... ✓"
Feb 04 11:41:56 ollama ollama[570825]: time=2026-02-04T11:41:56.356Z level=INFO source=server.go:129 msg=image-runner msg="  Evaluating weights... ✓"
Feb 04 11:41:56 ollama ollama[570825]: time=2026-02-04T11:41:56.356Z level=INFO source=server.go:129 msg=image-runner msg="  Loaded in 8.21s (11.1 GB VRAM)"
Feb 04 11:41:56 ollama ollama[570825]: time=2026-02-04T11:41:56.356Z level=WARN source=server.go:136 msg=image-runner msg="2026/02/04 11:41:56 runner.go:139: INFO image runner listening addr=127.0.0.1:33573"
Feb 04 11:41:56 ollama ollama[570825]: time=2026-02-04T11:41:56.373Z level=INFO source=server.go:214 msg="image runner is ready" port=33573
Feb 04 11:41:56 ollama ollama[570825]: time=2026-02-04T11:41:56.374Z level=INFO source=server.go:129 msg=image-runner msg="  Output: 1024x1024"
Feb 04 11:41:56 ollama ollama[570825]: time=2026-02-04T11:41:56.394Z level=INFO source=server.go:129 msg=image-runner msg="  Encoding prompt... MLX error: [matmul] Last dimension of first input with shape (1,512,4096) must match second to last dimension of second input with shape (512,36864). at /go/src/github.com>
```

@rick-github commented on GitHub (Feb 4, 2026):

> What's the memory requirement? I get OOM:

The image models currently don't spill to system RAM, so the GPU needs to have:

| model | nvidia-smi VRAM |
| -- | -- |
| x/z-image-turbo:bf16 | 18396MiB |
| x/z-image-turbo:fp8 | 48776MiB |
| x/flux2-klein:9b-bf16 | 14504MiB |
| x/flux2-klein:4b-bf16 | 15464MiB |

I have no idea why the fp8 of ZIT requires so much more than the bf16.

> and with flux2, no OOM but:

As mentioned, the fp4 and fp8 quants of flux2 don't work on Linux; you need to use x/flux2-klein:9b-bf16.

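For a quick check against these figures, free VRAM can be queried with standard nvidia-smi flags before pulling a model:

```
# Report GPU name plus free/total VRAM in MiB; compare the free figure against the table above.
nvidia-smi --query-gpu=name,memory.free,memory.total --format=csv,noheader
```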

@iamobservable commented on GitHub (Feb 4, 2026):

> > What's the memory requirement? I get OOM:
>
> The image models currently don't spill to system RAM, so the GPU needs to have:
>
> | model | nvidia-smi VRAM |
> | -- | -- |
> | x/z-image-turbo:bf16 | 18396MiB |
> | x/z-image-turbo:fp8 | 48776MiB |
> | x/flux2-klein:9b-bf16 | 14504MiB |
> | x/flux2-klein:4b-bf16 | 15464MiB |
>
> I have no idea why the fp8 of ZIT requires so much more than the bf16.
>
> > and with flux2, no OOM but:
>
> As mentioned, the fp4 and fp8 quants of flux2 don't work on Linux; you need to use x/flux2-klein:9b-bf16.

I can confirm the flux2 models above work on my Linux (Arch) system with an RTX 3090 installed.

@Digit-al commented on GitHub (Feb 5, 2026):

Still not working on my side. Now I get:

```
Feb 04 14:15:31 ollama ollama[570825]: time=2026-02-04T14:15:31.234Z level=INFO source=server.go:129 msg=image-runner msg="MLX error: Failed to compile kernel: nvrtc: error: failed to open libnvrtc-builtins.so.13.0."
Feb 04 14:15:31 ollama ollama[570825]: time=2026-02-04T14:15:31.234Z level=INFO source=server.go:129 msg=image-runner msg="  Make sure that libnvrtc-builtins.so.13.0 is installed correctly.. at /go/src/github.com/ollama/ollama/build/_deps/mlx-c-src/mlx/c/transforms.cpp:73"
```

Could it be because I have CUDA 13.1? (@rick-github, you mentioned only the major version, CUDA 13, above: "If you are running on bare metal, you need to upgrade to CUDA v13"):

```
root@ollama:/usr/local/cuda# find . -iname "libnvrtc-builtins.so*"
./targets/x86_64-linux/lib/libnvrtc-builtins.so
./targets/x86_64-linux/lib/libnvrtc-builtins.so.13.1
./targets/x86_64-linux/lib/libnvrtc-builtins.so.13.1.115
```

@rick-github commented on GitHub (Feb 5, 2026):

Yes, v13.0 is required. You could try just copying those 13.1 files to 13.0; a minor version change may still be compatible.

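A sketch of that workaround, using the library paths shown earlier in this thread; whether the 13.1 builtins are actually compatible with what MLX requests is not guaranteed:

```
# Sketch: expose the installed 13.1 builtins under the 13.0 soname MLX asks for.
cd /usr/local/cuda/targets/x86_64-linux/lib
sudo ln -s libnvrtc-builtins.so.13.1 libnvrtc-builtins.so.13.0
```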

@Digit-al commented on GitHub (Feb 5, 2026):

No luck either... I think I'll wait for the official release.

```
Feb 05 10:41:04 ollama ollama[1327]: time=2026-02-05T10:41:04.840Z level=INFO source=server.go:129 msg=image-runner msg="  Evaluating setup... MLX error: Failed to compile kernel: /usr/local/cuda/include/cuda/std/__cccl/cuda_toolkit.h(39): catastrophic error: #error directive: \"CUDA compiler and CUDA toolkit heade>
Feb 05 10:41:04 ollama ollama[1327]: time=2026-02-05T10:41:04.840Z level=INFO source=server.go:129 msg=image-runner msg="  #    error \"CUDA compiler and CUDA toolkit headers are incompatible, please check your include paths\""
Feb 05 10:41:04 ollama ollama[1327]: time=2026-02-05T10:41:04.840Z level=INFO source=server.go:129 msg=image-runner msg="       ^"
Feb 05 10:41:04 ollama ollama[1327]: time=2026-02-05T10:41:04.840Z level=INFO source=server.go:129 msg=image-runner msg=""
Feb 05 10:41:04 ollama ollama[1327]: time=2026-02-05T10:41:04.840Z level=INFO source=server.go:129 msg=image-runner msg="1 catastrophic error detected in the compilation of \"gather_bfloat16_int32_1.cu\"."
Feb 05 10:41:04 ollama ollama[1327]: time=2026-02-05T10:41:04.840Z level=INFO source=server.go:129 msg=image-runner msg="Compilation terminated."
Feb 05 10:41:04 ollama ollama[1327]: time=2026-02-05T10:41:04.840Z level=INFO source=server.go:129 msg=image-runner msg=". at /go/src/github.com/ollama/ollama/build/_deps/mlx-c-src/mlx/c/transforms.cpp:73"
```

@SnowZhangSN commented on GitHub (Feb 13, 2026):

OS version: macOS Tahoe 26.2
Ollama version: 0.16.1

```
ollama run x/flux2-klein:latest
Error: failed to load model: 500 Internal Server Error: mlx runner failed: model.norm.weight (exit: exit status 1)
```

@mircomir commented on GitHub (Feb 13, 2026):

> OS version: macOS Tahoe 26.2
> Error: failed to load model: 500 Internal Server Error: mlx runner failed: model.norm.weight (exit: exit status 1)

Yes, I have the same issue on my Mac Studio M1 Max.

@stglasauer commented on GitHub (Feb 13, 2026):

> OS version: macOS Tahoe 26.2
> Error: failed to load model: 500 Internal Server Error: mlx runner failed: model.norm.weight (exit: exit status 1)

Same error on my Mac mini (M4, 24 GB) with Tahoe 26.2.

Then I ran it today on my MacBook Pro (M1 Pro, 64 GB) laptop with Sequoia 15.7.3, before upgrading Ollama to version 0.16.1:

```
% ollama run x/flux2-klein
pulling manifest
pulling model: 100% [...] 5.7 GB
writing manifest
success
>>> a beautiful balloon floating above the coulds
Image saved to: a-beautiful-balloon-floating-above-the-coulds-20260213-125629.png
```

It worked fine!

Then I updated Ollama to Version 0.16.1, and now:

```
% ollama run x/flux2-klein
Error: failed to load model: 500 Internal Server Error: mlx runner failed:   model.norm.weight (exit: exit status 1)
```

My conclusion: it's not Tahoe 26.2 but apparently Ollama version 0.16.1.

@arfjdms1 commented on GitHub (Mar 19, 2026):

Now I feel outclassed with only an A6000.

@Daims971 commented on GitHub (Mar 22, 2026):

```
ollama run x/flux2-klein:4b
pulling manifest
pulling model: 100% ▕██████████████████████████████████████████████████████████████████████████████▏ 5.7 GB
writing manifest
success
Error: failed to load model: 500 Internal Server Error: mlx runner failed: Error: failed to create server: failed to load image model: failed to load flux2 model: text encoder: load weights: load model.layers.6.mlp.up_proj.weight_qbias: failed to load safetensors: D:\Users\Damien\Installations\ollama\models\blobs\sha256-xxxxxxx (exit: exit status 1)
```

Still have this issue on Windows 11.

My system:
CPU: i7 (8 cores)
RAM: 32 GB
GPU: Intel Iris
VRAM: 16GB

Reference: github-starred/ollama#34938