[GH-ISSUE #15359] Bonsai 8B Support #9826

Open
opened 2026-04-12 22:41:47 -05:00 by GiteaMirror · 3 comments
Owner

Originally created by @yutokun on GitHub (Apr 6, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/15359

Please support Bonsai 8B, the new 1-bit LLM. The model is already published at https://ollama.com/digitsflow/bonsai-8b, but attempting to run it displays the following error.

Error: 500 Internal Server Error: model failed to load, this may be due to resource limitations or an internal error, check ollama server logs for details

Here's a log.

goroutine 37 gp=0x140003068c0 m=nil [IO wait]:
runtime.gopark(0xffffffffffffffff?, 0xffffffffffffffff?, 0x23?, 0x0?, 0x104f37060?)
        /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/proc.go:435 +0xc8 fp=0x1400030fd80 sp=0x1400030fd60 pc=0x104f132c8
runtime.netpollblock(0x0?, 0x0?, 0x0?)
        /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/netpoll.go:575 +0x158 fp=0x1400030fdc0 sp=0x1400030fd80 pc=0x104ed8d28
internal/poll.runtime_pollWait(0x1341355f8, 0x72)
        /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/netpoll.go:351 +0xa0 fp=0x1400030fdf0 sp=0x1400030fdc0 pc=0x104f12480
internal/poll.(*pollDesc).wait(0x14000693080?, 0x14000610041?, 0x0)
        /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/internal/poll/fd_poll_runtime.go:84 +0x28 fp=0x1400030fe20 sp=0x1400030fdf0 pc=0x104f93418
internal/poll.(*pollDesc).waitRead(...)
        /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0x14000693080, {0x14000610041, 0x1, 0x1})
        /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/internal/poll/fd_unix.go:165 +0x1fc fp=0x1400030fec0 sp=0x1400030fe20 pc=0x104f946cc
net.(*netFD).Read(0x14000693080, {0x14000610041?, 0x0?, 0x0?})
        /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/fd_posix.go:55 +0x28 fp=0x1400030ff10 sp=0x1400030fec0 pc=0x105006528
net.(*conn).Read(0x1400012aa60, {0x14000610041?, 0x0?, 0x0?})
        /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/net.go:194 +0x34 fp=0x1400030ff60 sp=0x1400030ff10 pc=0x1050133f4
net/http.(*connReader).backgroundRead(0x14000610030)
        /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:690 +0x40 fp=0x1400030ffb0 sp=0x1400030ff60 pc=0x1051d4370
net/http.(*connReader).startBackgroundRead.gowrap2()
        /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:686 +0x28 fp=0x1400030ffd0 sp=0x1400030ffb0 pc=0x1051d4258
runtime.goexit({})
        /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/runtime/asm_arm64.s:1223 +0x4 fp=0x1400030ffd0 sp=0x1400030ffd0 pc=0x104f1b844
created by net/http.(*connReader).startBackgroundRead in goroutine 9
        /Users/runner/hostedtoolcache/go/1.24.1/arm64/src/net/http/server.go:686 +0xc4

r0      0x0
r1      0x0
r2      0x0
r3      0x0
r4      0x182ed18b7
r5      0x16bf6dce0
r6      0x38
r7      0x0
r8      0xd5628037e15a24be
r9      0xd56280368aacd4be
r10     0x2
r11     0xfffffffd
r12     0x0
r13     0x0
r14     0x0
r15     0x0
r16     0x148
r17     0x1f0409f20
r18     0x0
r19     0x6
r20     0x1c03
r21     0x16bf6f0e0
r22     0x0
r23     0x0
r24     0x0
r25     0x14000067c08
r26     0x1066bf288
r27     0x818
r28     0x14000002fc0
r29     0x16bf6e5d0
lr      0x182fc78d8
sp      0x16bf6e5b0
pc      0x182f8c5e8
fault   0x182f8c5e8
time=2026-04-06T16:55:08.018+09:00 level=ERROR source=server.go:1207 msg="do load request" error="Post \"http://127.0.0.1:58008/load\": EOF"
time=2026-04-06T16:55:08.018+09:00 level=ERROR source=server.go:304 msg="llama runner terminated" error="exit status 2"
time=2026-04-06T16:55:08.018+09:00 level=ERROR source=server.go:1207 msg="do load request" error="Post \"http://127.0.0.1:58008/load\": dial tcp 127.0.0.1:58008: connect: connection refused"
time=2026-04-06T16:55:08.018+09:00 level=INFO source=sched.go:511 msg="Load failed" model=/Users/yuto/.ollama/models/blobs/sha256-ead25897bc034fa52569d0c6d054ce38216f95db09900c8add8f6bbfb370cff1 error="model failed to load, this may be due to resource limitations or an internal error, check ollama server logs for details"
GiteaMirror added the model label 2026-04-12 22:41:47 -05:00

@rick-github commented on GitHub (Apr 6, 2026):

https://github.com/ggml-org/llama.cpp/pull/21273


@marcelocecin commented on GitHub (Apr 6, 2026):

Hi team,
I'm encountering a `SIGABRT` crash when trying to run the new 1-bit quantized models, specifically the Bonsai-8B-GGUF (`Q1_0_g128`).
The model pulls correctly, but fails during the load phase with an internal server error (500). Checking the server logs, it seems the underlying `ggml` version in Ollama does not yet recognize this specific tensor type, triggering a failed assertion.
Error Log:

ggml.c:1676: GGML_ASSERT(type >= 0 && type < GGML_TYPE_COUNT) failed
SIGABRT: abort
signal arrived during cgo execution
time=... level=ERROR source=server.go:304 msg="llama runner terminated" error="exit status 2"

Model Details:

  • Model: `hf.co/prism-ml/Bonsai-8B-gguf`
  • Quantization: `Q1_0_g128` (1-bit)
  • Environment: Docker (`ollama/ollama:latest`)
  • Hardware: CPU with AVX512 support (as seen in logs)

It appears that `llama.cpp` recently added support for these kernels, but Ollama's internal `GGML_TYPE_COUNT` might still be outdated. Could you please look into updating the backend to support these ultra-low-bit quantizations?


@rick-github commented on GitHub (Apr 6, 2026):

Next vendor sync.


Reference: github-starred/ollama#9826