[GH-ISSUE #16046] Regression: Severe queue delay + tool loop hang in Ollama v0.23.2 (MLX / macOS) #87902

Open
opened 2026-05-10 06:33:58 -05:00 by GiteaMirror · 1 comment

Originally created by @charlesdrakon-cmyk on GitHub (May 8, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/16046

What is the issue?

Summary

Ollama v0.23.2 introduces two major regressions on macOS (Apple Silicon, MLX backend):

Severe queue delay — up to 2–3 minutes between tasks
Tool loop hang — model enters repeated search_web / fetch_url loop with no completion

These issues were not present in v0.23.1 and render v0.23.2 unsuitable for production use.

Environment
OS: macOS (Apple Silicon)
Hardware: Apple system (MLX backend in use)
Ollama version: 0.23.2
Previous working version: 0.23.1
Interface: Open WebUI
Model: qwen3.6:35b-a3b-mlx-bf16
Issue 1 — Queue Delay Regression
Behavior
After completing a request, the next request is delayed significantly
Observed delay: 2–3 minutes between tasks
Occurs even with:
no concurrent jobs
idle system
sufficient RAM (no memory pressure)
Expected Behavior
Immediate or near-immediate task start (as in v0.23.1)
Actual Behavior
Requests sit idle before execution begins
Appears to be a queueing or scheduling regression
Issue 2 — Tool Loop Hang (search_web / fetch_url)
Behavior
Model enters repeated tool calls:
search_web
fetch_url
No final response is produced
Loop continues indefinitely
Observations
Occurs without any other jobs pending
Seen in Open WebUI tool activity panel:
multiple search calls
growing source list (e.g., 10+ sources)
Requires manual interruption
Failure Rate
Observed in 2 out of 5 prompts (40%)
Expected Behavior
Model completes tool use and returns a final answer
Actual Behavior
Infinite or long-running tool loop with no completion
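One way to picture the missing termination condition is a client-side guard around the tool loop. The sketch below is purely illustrative and is not taken from Ollama or Open WebUI: `run_with_tool_budget`, `next_step`, `execute_tool`, and `MAX_TOOL_ROUNDS` are hypothetical names, and the step format (`{"answer": ...}` or `{"tool": ..., "args": ...}`) is an assumption made for the example. It stops when the model answers, when it repeats an identical tool call, or when a round budget runs out.

```python
import json

MAX_TOOL_ROUNDS = 8  # hypothetical cap; tune to the workload


def run_with_tool_budget(next_step, execute_tool, max_rounds=MAX_TOOL_ROUNDS):
    """Drive a tool loop until the model answers, repeats a call, or exhausts the budget.

    next_step(results) -> {"answer": str} or {"tool": str, "args": dict}  (assumed shape)
    execute_tool(name, args) -> any result object
    """
    seen = set()   # (tool name, canonical JSON args) pairs already executed
    results = []   # tool results fed back to the model on each round
    for _ in range(max_rounds):
        step = next_step(results)
        if "answer" in step:
            return {"status": "done", "answer": step["answer"]}
        call = (step["tool"], json.dumps(step["args"], sort_keys=True))
        if call in seen:
            # The same tool was requested with the same arguments: treat it as a loop.
            return {"status": "repeated_call", "tool": step["tool"]}
        seen.add(call)
        results.append(execute_tool(step["tool"], step["args"]))
    return {"status": "budget_exhausted", "rounds": max_rounds}
```

A guard like this would not explain why the model keeps calling search_web / fetch_url, but it would turn an indefinite hang into a bounded failure that can be logged and compared between v0.23.1 and v0.23.2.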
Reproduction
Queue Delay

Run a prompt using:

ollama run qwen3.6:35b-a3b-mlx-bf16
After completion, immediately submit another prompt
Observe the delay before execution begins (2–3 minutes in the cases reported above)
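To put numbers on the delay, the steps above could be scripted against the local HTTP API instead of the interactive CLI. This is a minimal sketch, assuming the server listens on the default `http://localhost:11434` and that the non-streaming `/api/generate` response carries `load_duration`, `prompt_eval_duration`, and `eval_duration` in nanoseconds. The helper names (`unaccounted_seconds`, `timed_generate`, `run_sequence`) are made up for illustration. Whether queue wait shows up inside or outside those reported durations is not established here, so a large "unaccounted" value is only a hint.

```python
import json
import time
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed default local endpoint
MODEL = "qwen3.6:35b-a3b-mlx-bf16"  # model tag from this report


def unaccounted_seconds(wall_seconds, response):
    """Wall-clock time not covered by the load/prompt-eval/eval durations (reported in ns)."""
    reported_ns = sum(
        response.get(key, 0)
        for key in ("load_duration", "prompt_eval_duration", "eval_duration")
    )
    return wall_seconds - reported_ns / 1e9


def timed_generate(prompt):
    """Send one non-streaming request; return (wall_seconds, parsed JSON response)."""
    body = json.dumps({"model": MODEL, "prompt": prompt, "stream": False}).encode()
    request = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    start = time.monotonic()
    with urllib.request.urlopen(request, timeout=600) as reply:
        response = json.load(reply)
    return time.monotonic() - start, response


def run_sequence(count=3):
    """Issue requests back to back and print how much time is unaccounted for in each."""
    for i in range(count):
        wall, response = timed_generate(f"Reply with the number {i}.")
        gap = unaccounted_seconds(wall, response)
        print(f"request {i}: wall={wall:.1f}s unaccounted={gap:.1f}s")
```

Calling `run_sequence()` on both v0.23.1 and v0.23.2 with the same model would give per-request figures that can be compared directly.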
Tool Loop
Use Open WebUI with a tool-enabled model
Submit a prompt requiring web lookup
Observe repeated search_web / fetch_url calls
No final response is returned
Impact
Breaks multi-user and sequential workflows
Makes system appear unresponsive
Requires manual intervention to stop tool loops
Not suitable for production environments
Additional Notes
System shows no resource constraints (RAM healthy, minimal swap)
No concurrency required to reproduce
Appears related to:
scheduling / queue handling
tool execution loop control
Request

Please investigate:

task scheduling / queue handling changes in 0.23.2
tool execution termination conditions
interaction between MLX backend and tool loop handling

Happy to provide additional logs or run targeted tests if needed.

GiteaMirror added the bug label 2026-05-10 06:33:58 -05:00

@charlesdrakon-cmyk commented on GitHub (May 8, 2026):

Update: I rebooted the macOS host and retested. Both issues persist after reboot:

Severe queue delay remains present
Web-search/tool loop hang still occurs

This does not appear to be caused by stale daemon state or a transient service condition.

Reference: github-starred/ollama#87902