[GH-ISSUE #14613] Regression in running flux2-klein on iMac since v0.15.5 #55982

Open
opened 2026-04-29 10:06:19 -05:00 by GiteaMirror · 5 comments

Originally created by @drews54 on GitHub (Mar 4, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/14613

What is the issue?

I am able to successfully run `x/flux2-klein:9b-bf16` on my iMac M4 with 32 GB of memory when using ollama version 0.15.4 (see [server-1.log](https://github.com/user-attachments/files/25737299/server-1.log)). However, from ollama version 0.15.5 onward (most recently tested on 0.17.5 with the same result) it refuses to start due to "insufficient memory" (see [server.log](https://github.com/user-attachments/files/25737298/server.log)).

Relevant log output

```shell
$ ollama run --verbose x/flux2-klein:9b-bf16
Error: 500 Internal Server Error: mlx runner failed: Error: failed to create server: failed to load image model: insufficient memory for image generation: need 32 GB, have 30 GB (exit: exit status 1)
```

OS

macOS

GPU

Apple

CPU

Apple

Ollama version

0.15.4

GiteaMirror added the bug label 2026-04-29 10:06:19 -05:00

@drews54 commented on GitHub (Mar 12, 2026):

Update: same on 0.17.7.
However, I now suspect that the problem might be in the mlx version bundled with the app: the versions included with ollama 0.15.4 were mlx 0.30.3 and mlx-c 0.4.1, while the current ones are 0.31.1 and 0.5.0 respectively.
I would try filing an issue in the mlx repository, but I’m afraid I’m not qualified enough to ascertain whether the root of the problem is actually in mlx or in ollama itself. Any help would be greatly appreciated.
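For anyone trying to narrow this down, one rough way to compare the mlx/mlx-c copies in play is via Homebrew. This is only a sketch and assumes all three packages were installed with Homebrew; the binary path and library names are assumptions based on the usual install layout, and the app bundle may ship its own copies.

```shell
# Show the Homebrew-installed versions of mlx, mlx-c and ollama (if present).
brew list --versions mlx mlx-c ollama

# Show which mlx libraries the ollama binary links against.
# May print nothing if mlx is loaded dynamically at runtime instead.
otool -L "$(brew --prefix ollama)/bin/ollama" | grep -i mlx
```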


@noma4i commented on GitHub (Apr 9, 2026):

`ln -s "$(brew --prefix mlx-c)/lib/libmlxc.dylib" "$(brew --prefix ollama)/bin"` – temp fix for Mac
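To check that the symlink was created and points at the Homebrew mlx-c library, something like the following should work (the path is an assumption based on the command above):

```shell
# The link created above should resolve to Homebrew's libmlxc.dylib.
ls -l "$(brew --prefix ollama)/bin/libmlxc.dylib"
```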


@drews54 commented on GitHub (Apr 9, 2026):

@noma4i I’m not entirely sure what exactly that’s supposed to fix, since I believe ollama uses Homebrew's linked mlx-c library by default anyway, but I did try it nonetheless – no dice.

P.S. I should also mention that starting from version 0.19 the error message has changed to the following: `Error: failed to load model: 500 Internal Server Error: model requires 32.3 GiB but only 24.5 GiB are available (after 512.0 MiB overhead)`
I found an old issue that suggested setting `sysctl iogpu.wired_limit_mb=32768`, which bumped the 24.5 GiB figure up to 31.5 GiB, but that is still not enough to run the model.
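For reference, a minimal sketch of applying and checking that setting (assumes an Apple Silicon Mac with admin rights; the value is in MiB and resets on reboot):

```shell
# Raise the GPU wired-memory limit to 32 GiB (32768 MiB); requires sudo.
sudo sysctl iogpu.wired_limit_mb=32768

# Confirm the new limit is in effect.
sysctl iogpu.wired_limit_mb
```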

Any help is still appreciated.


@drews54 commented on GitHub (Apr 10, 2026):

So I managed to make ChatGPT Codex patch the main branch (based on commit 80d3744 – forked [here](https://github.com/drews54/ollama/tree/codex/add-envconfig-switch-for-image-generation) using Codex).
As far as I understand, it bypasses two levels of memory protection (first the one mentioned in the comment above, then the one mentioned in the initial issue at the top).
Image generation works properly; however, image editing seems broken, though that appears to be a separate issue that has been present since another version after 0.15.
Once again, I'm not qualified enough to make a proper PR with these changes, nor do I think they would be accepted anyway, so I'll keep porting the patch to my forked repo for as long as it works with upstream's HEAD.
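Purely as an illustration of how such a patched build might be invoked: the fork's branch name suggests an envconfig switch, but its actual name is not given in this thread, so the variable below is hypothetical.

```shell
# Hypothetical environment switch; the real name in the forked branch may differ.
OLLAMA_SKIP_IMAGE_MEMORY_CHECK=1 ollama serve
```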


@noma4i commented on GitHub (Apr 11, 2026):

@drews54 Oh, sorry. I had missed that the issue is the RAM size. For me the issue was that libmlxc does not work out of the box – I had to create the symlink.

Reference: github-starred/ollama#55982