[GH-ISSUE #15266] Qwen 35b nvfp4 mlx Infinite thinking loop #9764

Open
opened 2026-04-12 22:39:16 -05:00 by GiteaMirror · 0 comments
Owner

Originally created by @Urcherd on GitHub (Apr 3, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/15266

What is the issue?

Command: ollama run qwen3.5:35b-a3b-coding-nvfp4 --verbose <<< "Write history in 500 lines"
Result: Error: mlx runner failed: time=2026-04-03T12:31:06.328+03:00 level=INFO source=cache.go:126 msg="cache hit" total=18 matched=18 cached=17 left=1
Env: Apple M4 Max | 64GB | Tahoe 26.3 (25D125)
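
One common way to bound this kind of runaway "thinking" output is to cap the number of generated tokens and raise the repetition penalty per request. The sketch below is illustrative only: `num_predict` and `repeat_penalty` are standard Ollama generation options and `/api/generate` is the standard endpoint, but the specific values, the temp-file path, and the assumption of a default server on `localhost:11434` are this sketch's, not the reporter's.

```shell
# Build a request that caps generation at 2048 tokens and penalizes
# repeated token sequences, using the model tag from the report.
cat > /tmp/qwen_req.json <<'EOF'
{
  "model": "qwen3.5:35b-a3b-coding-nvfp4",
  "prompt": "Write history in 500 lines",
  "options": {
    "num_predict": 2048,
    "repeat_penalty": 1.3
  },
  "stream": false
}
EOF

# Requires a running Ollama server; uncomment to send the request:
# curl http://localhost:11434/api/generate -d @/tmp/qwen_req.json
```

This does not fix the underlying loop, but it keeps a stuck request from running for the full 10-minute window seen in the server log.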

Relevant log output

Server Log:
[GIN] 2026/04/03 - 12:31:06 | 200 |      53.625µs |       127.0.0.1 | HEAD     "/"
[GIN] 2026/04/03 - 12:31:06 | 200 |   85.233083ms |       127.0.0.1 | POST     "/api/show"
time=2026-04-03T12:31:06.327+03:00 level=INFO source=server.go:183 msg=ServeHTTP method=GET path=/v1/status took=14.833µs status="200 OK"
time=2026-04-03T12:31:06.328+03:00 level=INFO source=cache.go:126 msg="cache hit" total=18 matched=18 cached=17 left=1
[GIN] 2026/04/03 - 12:41:06 | 200 |         10m0s |       127.0.0.1 | POST     "/api/generate"
time=2026-04-03T12:41:06.335+03:00 level=INFO source=server.go:183 msg=ServeHTTP method=POST path=/v1/completions took=10m0.004317333s status="200 OK"
time=2026-04-03T12:41:06.419+03:00 level=INFO source=pipeline.go:55 msg="peak memory" size="21.20 GiB"
time=2026-04-03T12:41:06.419+03:00 level=INFO source=runner.go:149 msg="Request terminated" error="context canceled"

Output Log:
 *(Wait, I'll write the history now).*
    *(Okay).*
    *(Wait, I'll number them).*
    *(Okay).*
    *(Wait, I'll start).*
    *(Okay).*

    *(Wait, I need to be careful not to produce a wall of text that looks bad).*
    *(I'll make it readable).*
    *(Okay).*

    *(Wait, I'll start the output).*
    *(Okay).*
    *(Wait, I'll write the lines).*
    *(Okay).*
    *(Wait, I'll make sure I reach 500).*
    *(Okay).*

    *(Wait, I'll write the lines now).*
    *(Okay).*

    *(Wait, I'll start).*
    *(Okay).*

    *(Wait, I'll write the lines).*
    *(Okay).*

And the last 3000 lines are:
    *(Wait, I'll write the lines).*
    *(Okay).*

OS

macOS

GPU

Apple

CPU

Apple

Ollama version

0.20.0

GiteaMirror added the bug label 2026-04-12 22:39:16 -05:00

Reference: github-starred/ollama#9764