[GH-ISSUE #13845] x/flux2-klein:4b Error: insufficient memory for image generation: need 21 GB #55579

Open
opened 2026-04-29 09:26:58 -05:00 by GiteaMirror · 10 comments
Owner

Originally created by @dmitry-sukhoruchkin on GitHub (Jan 22, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/13845

What is the issue?

ollama run x/flux2-klein:4b
pulling manifest
pulling model: 100% ▕█████████████████████████▏ 5.7 GB
writing manifest
success
Error: failed to load model: 500 Internal Server Error: image runner failed: Error: insufficient memory for image generation: need 21 GB, have 11 GB (exit: exit status 1)
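The error is produced by a preflight estimate of required memory exceeding what the machine has. A hypothetical sketch of such a check (names and structure are illustrative only, not Ollama's actual code):

```python
# Hypothetical sketch of a preflight memory check that yields the
# "insufficient memory" message above (illustrative, not Ollama's code).
GIB = 1 << 30

def check_memory(required_bytes: int, available_bytes: int) -> None:
    """Raise before loading if the estimated requirement exceeds available memory."""
    if required_bytes > available_bytes:
        raise MemoryError(
            f"insufficient memory for image generation: "
            f"need {required_bytes // GIB} GB, have {available_bytes // GIB} GB"
        )

check_memory(21 * GIB, 24 * GIB)  # enough memory: passes silently
# check_memory(21 * GIB, 11 * GIB) would raise with the message seen above
```

The bug report below suggests the "need 21 GB" estimate itself was wrong for this 5.7 GB model, which is why the fix landed in the server rather than on users' machines.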

Relevant log output


OS

No response

GPU

No response

CPU

No response

Ollama version

No response

GiteaMirror added the macos, bug labels 2026-04-29 09:26:58 -05:00

@mchiang0610 commented on GitHub (Jan 22, 2026):

hey @dmitry-sukhoruchkin so sorry about this. Looking into this.


@SJTheGreat06 commented on GitHub (Jan 22, 2026):

I'm facing the same issue with my M4 MacBook Air. I'm looking forward to running image generation on my machine once it's been fixed


@slackdesk commented on GitHub (Jan 22, 2026):

I got the same error on Slackware Linux that others are describing on Mac: "need 21 GB"...


@duatom commented on GitHub (Jan 23, 2026):

same!


@FuFusi commented on GitHub (Jan 23, 2026):

same here


@rick-github commented on GitHub (Jan 24, 2026):

Should be fixed in [0.15.0](https://github.com/ollama/ollama/releases/tag/v0.15.0).


@SJTheGreat06 commented on GitHub (Jan 24, 2026):

> Should be fixed in [0.15.0](https://github.com/ollama/ollama/releases/tag/v0.15.0).

I can confirm that it's fixed! Thank you to the team for their swift and prompt action. Thank you very much!


@mundotv789123 commented on GitHub (Jan 24, 2026):

I'm using a 5060 Ti with 16 GB VRAM.
After updating to 0.15.0 I received this error.

Error: 500 Internal Server Error: Post "http://127.0.0.1:42099/completion": EOF

logs:
time=2026-01-24T10:40:06.517-03:00 level=DEBUG source=runner.go:40 msg="overall device VRAM discovery took" duration=309.917819ms
time=2026-01-24T10:40:06.519-03:00 level=DEBUG source=sched.go:195 msg="updating default concurrency" OLLAMA_MAX_LOADED_MODELS=3 gpu_count=1
time=2026-01-24T10:40:06.519-03:00 level=DEBUG source=server.go:105 msg="mlx subprocess library path" LD_LIBRARY_PATH=/usr/local/lib/ollama:/usr/local/lib/ollama/mlx_cuda_v13
time=2026-01-24T10:40:06.527-03:00 level=INFO source=server.go:143 msg="starting image runner subprocess" exe=/usr/local/bin/ollama model=x/flux2-klein:4b-fp8 port=44501
time=2026-01-24T10:40:06.902-03:00 level=WARN source=server.go:136 msg=image-runner msg="2026/01/24 10:40:06 runner.go:87: INFO MLX library initialized"
time=2026-01-24T10:40:06.903-03:00 level=WARN source=server.go:136 msg=image-runner msg="2026/01/24 10:40:06 runner.go:88: INFO starting image runner model=x/flux2-klein:4b-fp8 port=44501"
time=2026-01-24T10:40:06.907-03:00 level=WARN source=server.go:136 msg=image-runner msg="2026/01/24 10:40:06 runner.go:92: INFO detected model type type=Flux2KleinPipeline"
time=2026-01-24T10:40:06.907-03:00 level=INFO source=server.go:129 msg=image-runner msg="Loading FLUX.2 Klein model from manifest: x/flux2-klein:4b-fp8..."
time=2026-01-24T10:40:07.327-03:00 level=INFO source=server.go:129 msg=image-runner msg=" Loading tokenizer... ✓"
time=2026-01-24T10:40:12.640-03:00 level=INFO source=server.go:129 msg=image-runner msg=" Loading text encoder... ✓"
time=2026-01-24T10:40:15.888-03:00 level=INFO source=server.go:129 msg=image-runner msg=" Loading transformer... ✓"
time=2026-01-24T10:40:16.100-03:00 level=INFO source=server.go:129 msg=image-runner msg=" Loading VAE... ✓"
time=2026-01-24T10:40:16.101-03:00 level=INFO source=server.go:129 msg=image-runner msg=" Evaluating weights... ✓"
time=2026-01-24T10:40:16.101-03:00 level=INFO source=server.go:129 msg=image-runner msg=" Loaded in 9.19s (8.8 GB VRAM)"
time=2026-01-24T10:40:16.101-03:00 level=WARN source=server.go:136 msg=image-runner msg="2026/01/24 10:40:16 runner.go:139: INFO image runner listening addr=127.0.0.1:44501"
time=2026-01-24T10:40:16.128-03:00 level=INFO source=server.go:214 msg="image runner is ready" port=44501
time=2026-01-24T10:40:16.129-03:00 level=INFO source=server.go:129 msg=image-runner msg=" Output: 1024x1024"
time=2026-01-24T10:40:16.144-03:00 level=INFO source=server.go:129 msg=image-runner msg=" Encoding prompt... MLX error: [matmul] Last dimension of first input with shape (1,512,2560) must match second to last dimension of second input with shape (768,18432). at /go/src/github.com/ollama/ollama/build/_deps/mlx-c-src/mlx/c/ops.cpp:1943"
[GIN] 2026/01/24 - 10:40:16 | 500 | 10.107061199s | 127.0.0.1 | POST "/api/generate"
time=2026-01-24T10:40:16.163-03:00 level=DEBUG source=sched.go:394 msg="context for request finished" runner.name=registry.ollama.ai/x/flux2-klein:4b-fp8 runner.size="8.8 GiB" runner.vram="8.8 GiB" runner.parallel=0 runner.pid=0 runner.model="" runner.num_ctx=4096
time=2026-01-24T10:40:16.163-03:00 level=DEBUG source=sched.go:299 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/x/flux2-klein:4b-fp8 runner.size="8.8 GiB" runner.vram="8.8 GiB" runner.parallel=0 runner.pid=0 runner.model="" runner.num_ctx=4096 duration=5m0s
time=2026-01-24T10:40:16.163-03:00 level=DEBUG source=sched.go:317 msg="after processing request finished event" runner.name=registry.ollama.ai/x/flux2-klein:4b-fp8 runner.size="8.8 GiB" runner.vram="8.8 GiB" runner.parallel=0 runner.pid=0 runner.model="" runner.num_ctx=4096 refCount=0
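
The MLX error in this log is a plain matrix-shape mismatch: matmul requires the last dimension of the first operand to equal the second-to-last dimension of the second. A minimal NumPy sketch using the shapes from the log (NumPy is used here only to illustrate the rule, not MLX itself):

```python
import numpy as np

# Shapes taken from the log line: the text-encoder output and the
# projection weight it is multiplied against do not line up.
a = np.zeros((1, 512, 2560))   # last dimension: 2560
b = np.zeros((768, 18432))     # second-to-last dimension: 768

try:
    np.matmul(a, b)            # requires 2560 == 768, so this fails
except ValueError as e:
    print("shape mismatch:", e)

# A compatible weight would need 2560 rows:
b_ok = np.zeros((2560, 18432))
print(np.matmul(a, b_ok).shape)  # (1, 512, 18432)
```

A mismatch like this during "Encoding prompt..." typically means the runner paired a text encoder with projection weights from a different model configuration, which fits the resolution in the next comment: the Linux/CUDA path was not yet supported.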


@Acharab commented on GitHub (Jan 24, 2026):

Got the same error: Error: 500 Internal Server Error: Post "http://127.0.0.1:34967/completion": EOF


@SJTheGreat06 commented on GitHub (Jan 24, 2026):

> I'm using a 5060 Ti with 16 GB VRAM.
> After updating to 0.15.0 I received this error.

As of now, image generation is only supported on macOS, not Windows or Linux.


Reference: github-starred/ollama#55579