[GH-ISSUE #15341] panic: mlx.Unpin: negative pin count on array "CONTIGUOUS" #71874

Closed
opened 2026-05-05 02:47:51 -05:00 by GiteaMirror · 1 comment
Owner

Originally created by @chigkim on GitHub (Apr 5, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/15341

What is the issue?

I tried to import a finetuned Qwen3.5-35b safetensors model to MLX mxfp8, but I got the error below.
Importing to q8_0 worked fine.

Relevant log output

```shell
Screen 1:
GOMAXPROCS=1 ollama serve

Screen 2:
GOMAXPROCS=1 ollama create qwen3.5:35b-a3b-heretic-mxfp8 --experimental -f q35-35b.modelfile -q mxfp8
importing safetensors model
importing model-00001-of-00002.safetensors (22199 tensors, quantizing to mxfp8)
importing model-00002-of-00002.safetensors (9467 tensors, quantizing to mxfp8)
packing language_model.model.layers.0.mlp.experts (768 tensors)
panic: mlx.Unpin: negative pin count on array "CONTIGUOUS"

goroutine 1 [running, locked to thread]:
github.com/ollama/ollama/x/mlxrunner/mlx.Unpin({0x1400165e008?, 0x10500a404?, 0x140006e1ce0?})
        /Users/runner/work/ollama/ollama/x/mlxrunner/mlx/array.go:141 +0xa8
github.com/ollama/ollama/x/create/client.stackAndQuantizeExpertGroup.func1()
        /Users/runner/work/ollama/ollama/x/create/client/quantize.go:342 +0x2c
github.com/ollama/ollama/x/create/client.stackAndQuantizeExpertGroup({0x140004b8ab0, 0x29?}, 0x140030b8ba0, {0x1054c92b0, 0x5})
        /Users/runner/work/ollama/ollama/x/create/client/quantize.go:435 +0x984
github.com/ollama/ollama/x/create/client.quantizePackedGroup({0x140004b8ab0, 0x29}, {0x14000714000, 0x300, 0x14000044958?})
        /Users/runner/work/ollama/ollama/x/create/client/quantize.go:178 +0xd8
github.com/ollama/ollama/x/create/client.CreateModel.newPackedTensorLayerCreator.func5({0x140004b8ab0, 0x29}, {0x14000714000?, 0x2?, 0x3a2})
        /Users/runner/work/ollama/ollama/x/create/client/create.go:293 +0xc8
github.com/ollama/ollama/x/create.CreateSafetensorsModel({0x16bba785e, 0x1d}, {0x14000396340, 0x32}, {0x16bba78a3, 0x5}, 0x105a7c378, 0x105a7c390, 0x140000457c8, 0x14000045840, ...)
        /Users/runner/work/ollama/ollama/x/create/create.go:849 +0x950
github.com/ollama/ollama/x/create/client.CreateModel({{0x16bba785e, 0x1d}, {0x14000396340, 0x32}, {0x16bba78a3, 0x5}, 0x14000202000}, 0x1400046de40)
        /Users/runner/work/ollama/ollama/x/create/client/create.go:161 +0x2d8
github.com/ollama/ollama/cmd.CreateHandler(0x1400046f208, {0x14000202660, 0x1, 0x6?})
        /Users/runner/work/ollama/ollama/cmd/cmd.go:206 +0x874
github.com/spf13/cobra.(*Command).execute(0x1400046f208, {0x14000202600, 0x6, 0x6})
        /Users/runner/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:940 +0x648
github.com/spf13/cobra.(*Command).ExecuteC(0x1400046ef08)
        /Users/runner/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:1068 +0x320
github.com/spf13/cobra.(*Command).Execute(...)
        /Users/runner/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:992
github.com/spf13/cobra.(*Command).ExecuteContext(...)
        /Users/runner/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:985
main.main()
        /Users/runner/work/ollama/ollama/main.go:12 +0x54
```
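
For triage context: the panic fires inside a closure (`.func1`) in `stackAndQuantizeExpertGroup` at `quantize.go:342` while packing the expert tensors, which suggests unbalanced Pin/Unpin reference counting on an MLX array somewhere on the mxfp8 path. Below is a minimal, hypothetical Go sketch of that failure mode; the `array` type and its `Pin`/`Unpin` methods are illustrative stand-ins, not the actual `x/mlxrunner/mlx` API:

```go
package main

import "fmt"

// array is a stand-in for an MLX array handle with pin-count bookkeeping.
type array struct {
	tag  string
	pins int
}

// Pin marks the array's buffer as in use so it cannot be recycled.
func (a *array) Pin() { a.pins++ }

// Unpin releases one pin; an Unpin without a matching Pin drives the
// count negative and panics with the message seen in the trace above.
func (a *array) Unpin() {
	a.pins--
	if a.pins < 0 {
		panic(fmt.Sprintf("mlx.Unpin: negative pin count on array %q", a.tag))
	}
}

func main() {
	a := &array{tag: "CONTIGUOUS"}
	a.Pin()
	defer a.Unpin() // deferred cleanup, as a closure like .func1 in the trace
	a.Unpin()       // explicit release of the same pin -> count goes negative at defer time
}
```

Under that assumption, a deferred release plus an explicit release of the same pin (for example, on a branch that only the mxfp8 quantizer takes) would produce exactly this panic.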

OS

macOS

GPU

Apple

CPU

Apple

Ollama version

0.20.2

GiteaMirror added the mlx, macos, bug labels 2026-05-05 02:47:52 -05:00
Author
Owner

@chigkim commented on GitHub (Apr 7, 2026):

I was able to import the official model from the Qwen repo, so it must be a problem with the finetuned model I'm trying to import.
It's weird that it worked for q8_0 but not mxfp8, though.
Closing this for now.

Reference: github-starred/ollama#71874