[GH-ISSUE #14046] FLUX.2 Klein MLX error #34938

Open
opened 2026-04-22 18:57:07 -05:00 by GiteaMirror · 18 comments

Originally created by @kamenik on GitHub (Feb 3, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/14046

What is the issue?

Ollama crashes when creating an image with the FLUX.2 Klein model; x/z-image-turbo works fine.

```
ollama run x/flux-klein:9b "a cat holding a sign that says hello world"
```

MLX error: [matmul] Last dimension of first input with shape (1,512,2560) must match second to last dimension of second input with shape (384,3072).

Ollama version: 0.15.4
OS / Distro: Ubuntu 24.04 LTS (kernel 6.14.0-37-generic #37~24.04.1-Ubuntu)
GPU / Driver / CUDA: NVIDIA RTX PRO 6000 Blackwell (Driver 580.126.09, CUDA 13.0)

Relevant log output: (none provided)
OS: Linux
GPU: Nvidia
CPU: AMD
Ollama version: 0.15.4

GiteaMirror added the bug label 2026-04-22 18:57:07 -05:00

@Digit-al commented on GitHub (Feb 3, 2026):

From https://ollama.com/blog/image-generation I understand that it's currently working only on macOS,
but I'm interested to know when it's planned for Linux!

@rick-github commented on GitHub (Feb 4, 2026):

It's the fp4 and fp8 quants that don't work on Linux; the bf16 quants (both 4b and 9b) work fine.

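For reference, the corresponding invocations with the explicit bf16 tags; the tag names below are the ones cited later in this thread, not independently verified:

```
ollama run x/flux2-klein:9b-bf16 "a cat"
ollama run x/z-image-turbo:bf16 "a cat"
```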

@mircomir commented on GitHub (Feb 4, 2026):

> It's the fp4 and fp8 quants that don't work on Linux; the bf16 quants (both 4b and 9b) work fine.

I'm not sure if I interpreted what you said correctly, but it doesn't work for me on Linux with the models currently present on Ollama.

```
ollama run x/flux2-klein:4b "a cat"
Error: 500 Internal Server Error: image runner exited unexpectedly: exit status 255

ollama run x/z-image-turbo:bf16 "a cat"
Error: 500 Internal Server Error: image runner exited unexpectedly: exit status 255
```

It doesn't work on Windows either.

```
ollama run x/flux2-klein:9b "a cat"
Error: 500 Internal Server Error: image runner failed: Error: image generation not available: build with mlx tag (exit: exit status 1)

ollama run x/z-image-turbo:bf16 "a cat"
Error: 500 Internal Server Error: image runner failed: Error: image generation not available: build with mlx tag (exit: exit status 1)
```

On macOS, however, it works correctly as advertised. The problem is that I have a decent NVIDIA card in my PC :)
I therefore assume that image generation support, as announced, is not yet available on Windows and Linux.

@Digit-al commented on GitHub (Feb 4, 2026):

I observe the same but didn't report it as a bug, since it's announced as working on macOS only for now.
I have an RTX 3090.

@mircomir commented on GitHub (Feb 4, 2026):

> I have an RTX 3090.

Mine is an NVIDIA RTX PRO 6000 Blackwell Max-Q Workstation Edition. If you need any more info or tests, let me know.

@rick-github commented on GitHub (Feb 4, 2026):

> I therefore assume that image generation support, as announced, is not yet available on Windows and Linux.

Windows support is pending work on Windows MLX support. The Linux releases include MLX support, but some libraries and includes are missing, which prevents the release from working out of the box. If you are using Docker, you can [create a new container](https://github.com/ollama/ollama/issues/14016#issuecomment-3831904450) that includes the missing pieces. If you are running on bare metal, you need to upgrade to CUDA v13, install the [Nvidia dev toolkit](https://developer.nvidia.com/cuda-13-0-0-download-archive), install libquadmath0, and run `sudo bash -c 'cd /usr/local/cuda && ln -s cccl/cuda include/cuda'`.

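For reference, a minimal sketch of those bare-metal steps on Ubuntu. The apt package name `cuda-toolkit-13-0` is an assumption based on the linked CUDA 13.0 archive; adapt it to whichever installer you use:

```
# Sketch of the bare-metal fix above (Ubuntu). Package names are assumptions.
sudo apt-get update
sudo apt-get install -y cuda-toolkit-13-0   # assumed apt package for the CUDA 13.0 toolkit
sudo apt-get install -y libquadmath0        # missing runtime library noted above
# Include-path symlink the MLX kernel compiler expects:
sudo bash -c 'cd /usr/local/cuda && ln -s cccl/cuda include/cuda'
```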

@Digit-al commented on GitHub (Feb 4, 2026):

What's the memory requirement? I get OOM:

```
root@ollama:/usr/include# ollama run x/z-image-turbo:latest
Error: failed to load model: 500 Internal Server Error: image runner failed: 2026/02/04 11:34:47 runner.go:92: INFO detected model type type=ZImagePipeline (exit: exit status 255)
root@ollama:/usr/include# journalctl -xeu ollama
Feb 04 11:34:43 ollama ollama[570825]: time=2026-02-04T11:34:43.117Z level=INFO source=routes.go:1631 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GGML_VK_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:16384 OLLAMA_>
Feb 04 11:34:43 ollama ollama[570825]: time=2026-02-04T11:34:43.251Z level=INFO source=images.go:473 msg="total blobs: 3950"
Feb 04 11:34:43 ollama ollama[570825]: time=2026-02-04T11:34:43.265Z level=INFO source=images.go:480 msg="total unused blobs removed: 0"
Feb 04 11:34:43 ollama ollama[570825]: time=2026-02-04T11:34:43.265Z level=INFO source=routes.go:1684 msg="Listening on 127.0.0.1:11434 (version 0.15.4)"
Feb 04 11:34:43 ollama ollama[570825]: time=2026-02-04T11:34:43.266Z level=INFO source=runner.go:67 msg="discovering available GPUs..."
Feb 04 11:34:43 ollama ollama[570825]: time=2026-02-04T11:34:43.266Z level=INFO source=server.go:429 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 35243"
Feb 04 11:34:43 ollama ollama[570825]: time=2026-02-04T11:34:43.547Z level=INFO source=server.go:429 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 36855"
Feb 04 11:34:43 ollama ollama[570825]: time=2026-02-04T11:34:43.796Z level=INFO source=runner.go:106 msg="experimental Vulkan support disabled.  To enable, set OLLAMA_VULKAN=1"
Feb 04 11:34:43 ollama ollama[570825]: time=2026-02-04T11:34:43.797Z level=INFO source=server.go:429 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 35513"
Feb 04 11:34:43 ollama ollama[570825]: time=2026-02-04T11:34:43.797Z level=INFO source=server.go:429 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 39361"
Feb 04 11:34:44 ollama ollama[570825]: time=2026-02-04T11:34:44.084Z level=INFO source=types.go:42 msg="inference compute" id=GPU-72b72b31-1eb2-548f-3d9c-87584844b4db filter_id="" library=CUDA compute=8.6 name=CUDA0 description="NVIDIA GeForce RTX 3090" libdirs=ollama,cuda_v13 driver=13.1 pci_id=0000:00:10.0 type=>
Feb 04 11:34:46 ollama ollama[570825]: [GIN] 2026/02/04 - 11:34:46 | 200 |      92.423µs |       127.0.0.1 | HEAD     "/"
Feb 04 11:34:46 ollama ollama[570825]: [GIN] 2026/02/04 - 11:34:46 | 200 |   92.111652ms |       127.0.0.1 | POST     "/api/show"
Feb 04 11:34:46 ollama ollama[570825]: time=2026-02-04T11:34:46.959Z level=INFO source=server.go:429 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 36927"
Feb 04 11:34:47 ollama ollama[570825]: time=2026-02-04T11:34:47.224Z level=INFO source=server.go:143 msg="starting image runner subprocess" exe=/usr/local/bin/ollama model=x/z-image-turbo:latest port=43599
Feb 04 11:34:47 ollama ollama[570825]: time=2026-02-04T11:34:47.703Z level=WARN source=server.go:136 msg=image-runner msg="2026/02/04 11:34:47 runner.go:87: INFO MLX library initialized"
Feb 04 11:34:47 ollama ollama[570825]: time=2026-02-04T11:34:47.703Z level=WARN source=server.go:136 msg=image-runner msg="2026/02/04 11:34:47 runner.go:88: INFO starting image runner model=x/z-image-turbo:latest port=43599"
Feb 04 11:34:47 ollama ollama[570825]: time=2026-02-04T11:34:47.710Z level=INFO source=server.go:129 msg=image-runner msg="Loading Z-Image model from manifest: x/z-image-turbo:latest..."
Feb 04 11:34:47 ollama ollama[570825]: time=2026-02-04T11:34:47.710Z level=WARN source=server.go:136 msg=image-runner msg="2026/02/04 11:34:47 runner.go:92: INFO detected model type type=ZImagePipeline"
Feb 04 11:34:48 ollama ollama[570825]: time=2026-02-04T11:34:48.229Z level=INFO source=server.go:129 msg=image-runner msg="  Loading tokenizer... ✓"
Feb 04 11:34:52 ollama ollama[570825]: time=2026-02-04T11:34:52.127Z level=INFO source=server.go:129 msg=image-runner msg="  Loading text encoder... ✓"
Feb 04 11:34:53 ollama ollama[570825]: time=2026-02-04T11:34:53.477Z level=INFO source=server.go:129 msg=image-runner msg="  (11.3 GB, peak 11.3 GB)"
Feb 04 11:34:59 ollama ollama[570825]: time=2026-02-04T11:34:59.375Z level=INFO source=server.go:129 msg=image-runner msg="  Loading transformer... ✓"
Feb 04 11:35:04 ollama ollama[570825]: time=2026-02-04T11:35:04.122Z level=INFO source=server.go:129 msg=image-runner msg="MLX error: cudaMallocAsync(&data, size, stream) failed: out of memory at /go/src/github.com/ollama/ollama/build/_deps/mlx-c-src/mlx/c/transforms.cpp:73"
Feb 04 11:35:04 ollama ollama[570825]: time=2026-02-04T11:35:04.449Z level=INFO source=server.go:321 msg="stopping image runner subprocess" pid=570920
Feb 04 11:35:09 ollama ollama[570825]: [GIN] 2026/02/04 - 11:35:09 | 500 | 22.630603139s |       127.0.0.1 | POST     "/api/generate"
root@ollama:/usr/include# nvidia-smi
Wed Feb  4 11:35:57 2026
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 590.48.01              Driver Version: 590.48.01      CUDA Version: 13.1     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3090        On  |   00000000:00:10.0 Off |                  N/A |
|  0%   32C    P8             17W /  370W |       1MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
root@ollama:/usr/include# free
               total        used        free      shared  buff/cache   available
Mem:       197855708     5862188    27617128        1324   166156660   191993520
Swap:        8388604           0     8388604
```

@Digit-al commented on GitHub (Feb 4, 2026):

and with flux2, no OOM but:

```
root@ollama:/usr/include# ollama run x/flux2-klein:9b "a cat"
Error: 500 Internal Server Error: Post "http://127.0.0.1:33573/completion": EOF

Feb 04 11:41:47 ollama ollama[570825]: time=2026-02-04T11:41:47.378Z level=INFO source=server.go:429 msg="starting runner" cmd="/usr/local/bin/ollama runner --ollama-engine --port 33987"
Feb 04 11:41:47 ollama ollama[570825]: time=2026-02-04T11:41:47.671Z level=INFO source=server.go:143 msg="starting image runner subprocess" exe=/usr/local/bin/ollama model=x/flux2-klein:9b port=33573
Feb 04 11:41:48 ollama ollama[570825]: time=2026-02-04T11:41:48.136Z level=WARN source=server.go:136 msg=image-runner msg="2026/02/04 11:41:48 runner.go:87: INFO MLX library initialized"
Feb 04 11:41:48 ollama ollama[570825]: time=2026-02-04T11:41:48.136Z level=WARN source=server.go:136 msg=image-runner msg="2026/02/04 11:41:48 runner.go:88: INFO starting image runner model=x/flux2-klein:9b port=33573"
Feb 04 11:41:48 ollama ollama[570825]: time=2026-02-04T11:41:48.144Z level=WARN source=server.go:136 msg=image-runner msg="2026/02/04 11:41:48 runner.go:92: INFO detected model type type=Flux2KleinPipeline"
Feb 04 11:41:48 ollama ollama[570825]: time=2026-02-04T11:41:48.144Z level=INFO source=server.go:129 msg=image-runner msg="Loading FLUX.2 Klein model from manifest: x/flux2-klein:9b..."
Feb 04 11:41:48 ollama ollama[570825]: time=2026-02-04T11:41:48.633Z level=INFO source=server.go:129 msg=image-runner msg="  Loading tokenizer... ✓"
Feb 04 11:41:52 ollama ollama[570825]: time=2026-02-04T11:41:52.573Z level=INFO source=server.go:129 msg=image-runner msg="  Loading text encoder... ✓"
Feb 04 11:41:56 ollama ollama[570825]: time=2026-02-04T11:41:56.160Z level=INFO source=server.go:129 msg=image-runner msg="  Loading transformer... ✓"
Feb 04 11:41:56 ollama ollama[570825]: time=2026-02-04T11:41:56.354Z level=INFO source=server.go:129 msg=image-runner msg="  Loading VAE... ✓"
Feb 04 11:41:56 ollama ollama[570825]: time=2026-02-04T11:41:56.356Z level=INFO source=server.go:129 msg=image-runner msg="  Evaluating weights... ✓"
Feb 04 11:41:56 ollama ollama[570825]: time=2026-02-04T11:41:56.356Z level=INFO source=server.go:129 msg=image-runner msg="  Loaded in 8.21s (11.1 GB VRAM)"
Feb 04 11:41:56 ollama ollama[570825]: time=2026-02-04T11:41:56.356Z level=WARN source=server.go:136 msg=image-runner msg="2026/02/04 11:41:56 runner.go:139: INFO image runner listening addr=127.0.0.1:33573"
Feb 04 11:41:56 ollama ollama[570825]: time=2026-02-04T11:41:56.373Z level=INFO source=server.go:214 msg="image runner is ready" port=33573
Feb 04 11:41:56 ollama ollama[570825]: time=2026-02-04T11:41:56.374Z level=INFO source=server.go:129 msg=image-runner msg="  Output: 1024x1024"
Feb 04 11:41:56 ollama ollama[570825]: time=2026-02-04T11:41:56.394Z level=INFO source=server.go:129 msg=image-runner msg="  Encoding prompt... MLX error: [matmul] Last dimension of first input with shape (1,512,4096) must match second to last dimension of second input with shape (512,36864). at /go/src/github.com>
```

@rick-github commented on GitHub (Feb 4, 2026):

> What's the memory requirement? I get OOM:

The image models currently don't spill to system RAM, so the GPU needs to have:

| model | nvidia-smi VRAM |
| -- | -- |
| x/z-image-turbo:bf16 | 18396MiB |
| x/z-image-turbo:fp8 | 48776MiB |
| x/flux2-klein:9b-bf16 | 14504MiB |
| x/flux2-klein:4b-bf16 | 15464MiB |

I have no idea why the fp8 of ZIT requires so much more than the bf16.

> and with flux2, no OOM but:

As mentioned, the fp4 and fp8 quants of flux2 don't work on Linux; you need to use x/flux2-klein:9b-bf16.

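For a quick check against these figures, free VRAM can be queried with standard nvidia-smi flags before pulling a model:

```
# Report GPU name plus free/total VRAM in MiB; compare the free figure against the table above.
nvidia-smi --query-gpu=name,memory.free,memory.total --format=csv,noheader
```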

@iamobservable commented on GitHub (Feb 4, 2026):

> > What's the memory requirement? I get OOM:
>
> The image models currently don't spill to system RAM, so the GPU needs to have:
>
> | model | nvidia-smi VRAM |
> | -- | -- |
> | x/z-image-turbo:bf16 | 18396MiB |
> | x/z-image-turbo:fp8 | 48776MiB |
> | x/flux2-klein:9b-bf16 | 14504MiB |
> | x/flux2-klein:4b-bf16 | 15464MiB |
>
> I have no idea why the fp8 of ZIT requires so much more than the bf16.
>
> > and with flux2, no OOM but:
>
> As mentioned, the fp4 and fp8 quants of flux2 don't work on Linux; you need to use x/flux2-klein:9b-bf16.

I can confirm the flux2 models above work on my Linux (Arch) system with an RTX 3090 installed.

@Digit-al commented on GitHub (Feb 5, 2026):

Still not working on my side. Now I get:

```
Feb 04 14:15:31 ollama ollama[570825]: time=2026-02-04T14:15:31.234Z level=INFO source=server.go:129 msg=image-runner msg="MLX error: Failed to compile kernel: nvrtc: error: failed to open libnvrtc-builtins.so.13.0."
Feb 04 14:15:31 ollama ollama[570825]: time=2026-02-04T14:15:31.234Z level=INFO source=server.go:129 msg=image-runner msg="  Make sure that libnvrtc-builtins.so.13.0 is installed correctly.. at /go/src/github.com/ollama/ollama/build/_deps/mlx-c-src/mlx/c/transforms.cpp:73"
```

Could it be because I have CUDA 13.1? (@rick-github, you mentioned only the major version, CUDA 13, above: "If you are running on bare metal, you need to upgrade to CUDA v13"):

```
root@ollama:/usr/local/cuda# find . -iname "libnvrtc-builtins.so*"
./targets/x86_64-linux/lib/libnvrtc-builtins.so
./targets/x86_64-linux/lib/libnvrtc-builtins.so.13.1
./targets/x86_64-linux/lib/libnvrtc-builtins.so.13.1.115
```

@rick-github commented on GitHub (Feb 5, 2026):

Yes, v13.0 is required. You could try just copying those 13.1 files to 13.0; a minor version change may still be compatible.

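A sketch of that workaround, using the library paths shown earlier in this thread; whether the 13.1 builtins are actually compatible with what MLX requests is not guaranteed:

```
# Sketch: expose the installed 13.1 builtins under the 13.0 soname MLX asks for.
cd /usr/local/cuda/targets/x86_64-linux/lib
sudo ln -s libnvrtc-builtins.so.13.1 libnvrtc-builtins.so.13.0
```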

@Digit-al commented on GitHub (Feb 5, 2026):

No luck either... I think I'll wait for the official release.

```
Feb 05 10:41:04 ollama ollama[1327]: time=2026-02-05T10:41:04.840Z level=INFO source=server.go:129 msg=image-runner msg="  Evaluating setup... MLX error: Failed to compile kernel: /usr/local/cuda/include/cuda/std/__cccl/cuda_toolkit.h(39): catastrophic error: #error directive: \"CUDA compiler and CUDA toolkit heade>
Feb 05 10:41:04 ollama ollama[1327]: time=2026-02-05T10:41:04.840Z level=INFO source=server.go:129 msg=image-runner msg="  #    error \"CUDA compiler and CUDA toolkit headers are incompatible, please check your include paths\""
Feb 05 10:41:04 ollama ollama[1327]: time=2026-02-05T10:41:04.840Z level=INFO source=server.go:129 msg=image-runner msg="       ^"
Feb 05 10:41:04 ollama ollama[1327]: time=2026-02-05T10:41:04.840Z level=INFO source=server.go:129 msg=image-runner msg=""
Feb 05 10:41:04 ollama ollama[1327]: time=2026-02-05T10:41:04.840Z level=INFO source=server.go:129 msg=image-runner msg="1 catastrophic error detected in the compilation of \"gather_bfloat16_int32_1.cu\"."
Feb 05 10:41:04 ollama ollama[1327]: time=2026-02-05T10:41:04.840Z level=INFO source=server.go:129 msg=image-runner msg="Compilation terminated."
Feb 05 10:41:04 ollama ollama[1327]: time=2026-02-05T10:41:04.840Z level=INFO source=server.go:129 msg=image-runner msg=". at /go/src/github.com/ollama/ollama/build/_deps/mlx-c-src/mlx/c/transforms.cpp:73"
```

@SnowZhangSN commented on GitHub (Feb 13, 2026):

OS version: macOS Tahoe 26.2
Ollama version: 0.16.1

```
ollama run x/flux2-klein:latest
Error: failed to load model: 500 Internal Server Error: mlx runner failed: model.norm.weight (exit: exit status 1)
```

@mircomir commented on GitHub (Feb 13, 2026):

> OS version: macOS Tahoe 26.2
> Error: failed to load model: 500 Internal Server Error: mlx runner failed: model.norm.weight (exit: exit status 1)

Yes, I have the same issue on my Mac Studio M1 Max.

@stglasauer commented on GitHub (Feb 13, 2026):

> OS version: macOS Tahoe 26.2
> Error: failed to load model: 500 Internal Server Error: mlx runner failed: model.norm.weight (exit: exit status 1)

Same error on my Mac mini (M4, 24 GB) with Tahoe 26.2.

Then I ran it today on my MacBook Pro (M1 Pro, 64 GB) laptop with Sequoia 15.7.3, before upgrading Ollama to version 0.16.1:

```
% ollama run x/flux2-klein
pulling manifest
pulling model: 100% [...] 5.7 GB
writing manifest
success
>>> a beautiful balloon floating above the coulds
Image saved to: a-beautiful-balloon-floating-above-the-coulds-20260213-125629.png
```

It worked fine!

Then I updated Ollama to Version 0.16.1, and now:

```
% ollama run x/flux2-klein
Error: failed to load model: 500 Internal Server Error: mlx runner failed:   model.norm.weight (exit: exit status 1)
```

My conclusion: it's not Tahoe 26.2 but apparently Ollama version 0.16.1.

@arfjdms1 commented on GitHub (Mar 19, 2026):

Now I feel outclassed with only an A6000.

@Daims971 commented on GitHub (Mar 22, 2026):

```
ollama run x/flux2-klein:4b
pulling manifest
pulling model: 100% ▕██████████████████████████████████████████████████████████████████████████████▏ 5.7 GB
writing manifest
success
Error: failed to load model: 500 Internal Server Error: mlx runner failed: Error: failed to create server: failed to load image model: failed to load flux2 model: text encoder: load weights: load model.layers.6.mlp.up_proj.weight_qbias: failed to load safetensors: D:\Users\Damien\Installations\ollama\models\blobs\sha256-xxxxxxx (exit: exit status 1)
```

Still have this issue on Windows 11.

My system:
CPU: i7 (8 cores)
RAM: 32 GB
GPU: Intel Iris
VRAM: 16GB

Reference: github-starred/ollama#34938