[GH-ISSUE #14673] Service reliability degradation: High timeout rates and repeated failures on Ollama Cloud (2026-03-06) #56011

Open
opened 2026-04-29 10:08:28 -05:00 by GiteaMirror · 2 comments

Originally created by @unw1red on GitHub (Mar 6, 2026).
Original GitHub issue: https://github.com/ollama/ollama/issues/14673

Summary

We are experiencing systematic reliability issues with Ollama Cloud models on 2026-03-06: multiple subagent timeouts and failures are affecting OpenClaw operations.

Timeline

09:14-13:09 EST (~4 hours)

  • Scout: AgentMail inbox check - 3x timeouts (09:17, pre-update, post-update 09:30+)
  • Ledger: Market opening prep - timeout at 09:17, retry succeeded 09:30
  • Ledger: Google Search Console analysis - 3x consecutive timeouts (12:51, 12:58, 13:02)
  • Smith: mcporter investigation - timeout at 13:00
  • Smith: Google Search Console escalation - timeout at 13:09

Root Cause Assessment

An earlier diagnostic pointed to Ollama Cloud API service overload as the root cause of the timeouts. Network connectivity to ollama.com was confirmed healthy (0% packet loss, 12 ms avg RTT), ruling out a problem on our side of the connection.

Community Reports (via Reddit)

We verified additional widespread issues reported by the community:

  • 29.7% failure rate on Qwen3.5 models (1 week ago, ongoing)
  • API routing errors (404s on model switching)
  • Tool calling broken (500 errors when tools enabled on cloud models)
  • Rate limiting hostile ($100/month users hit 4-day throttles after 5 days)
  • Support MIA (tickets ignored 2+ weeks, no incident communication)
  • Community exodus (users switching to local models, vLLM, alternatives)

See: https://www.reddit.com/r/ollama/ for multiple recent posts documenting these issues.

Impact

  • OpenClaw subagent reliability degraded to unacceptable levels
  • Automation workflows failing due to external service unavailability
  • Escalation to local-only models is being considered as a workaround (a minimal fallback sketch follows this list)
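
For context, the workaround under evaluation is a simple cloud-first, local-fallback wrapper. Below is a minimal sketch assuming the standard Ollama /api/chat endpoint on the local daemon; the local model name, URL, and timeout are illustrative placeholders, not our production configuration:

```python
import requests

OLLAMA_URL = "http://localhost:11434/api/chat"  # local daemon; adjust for your routing

def chat_with_fallback(messages,
                       cloud_model="glm-5:cloud",  # cloud model from this report
                       local_model="llama3.2",     # placeholder local model
                       timeout_s=60):
    """Try the cloud model first; on timeout or HTTP error, retry locally."""
    last_error = None
    for model in (cloud_model, local_model):
        try:
            resp = requests.post(
                OLLAMA_URL,
                json={"model": model, "messages": messages, "stream": False},
                timeout=timeout_s,
            )
            resp.raise_for_status()
            return resp.json()["message"]  # non-streaming /api/chat returns the assistant turn here
        except (requests.Timeout, requests.ConnectionError, requests.HTTPError) as exc:
            last_error = exc  # fall through to the next model
    raise RuntimeError(f"cloud and local models both failed: {last_error}")
```

OpenClaw subagents would call chat_with_fallback instead of hitting the cloud model directly, so automation keeps running (at reduced quality) during cloud degradation.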

Request

  1. Acknowledge current service status
  2. Provide incident timeline and ETA for resolution
  3. Advise on rate limiting thresholds for cloud tier subscriptions
  4. Clarify Qwen3.5 stability issues and any recent updates affecting model loading

Thank you.

GiteaMirror added the cloud label 2026-04-29 10:08:28 -05:00

@mikronn2 commented on GitHub (Apr 5, 2026):

Additional Report: April 5, 2026 (One Month Later)

Experiencing the same pattern today — the issue appears to be ongoing.

Timeline (2026-04-05, ~5:03 PM EST)

```
5:03:36 PM - LLM request timed out (60+ seconds)
5:03:36 PM - Message ordering conflict on retry
5:03:37 PM - API error: "Assistant message must have either content or tool_calls, but not none."
```
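
The third error suggests the retry re-sent a conversation history containing an empty assistant turn left behind by the timed-out request. A client-side guard, sketched below, may work around it; the validation rule is inferred from the error text, not from documented API behavior:

```python
def sanitize_history(messages):
    """Drop assistant turns that have neither content nor tool_calls.

    The API rejects such messages ("Assistant message must have either
    content or tool_calls"), which can happen when a timed-out request
    leaves an empty placeholder turn in the retry payload.
    """
    cleaned = []
    for msg in messages:
        if msg.get("role") == "assistant" and not (
                msg.get("content") or msg.get("tool_calls")):
            continue  # skip the empty turn instead of re-sending it
        cleaned.append(msg)
    return cleaned
```

Filtering the retry payload this way should at least keep the 5:03:37 PM error from compounding the original timeout.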

Context

  • Running OpenClaw agent system (subagent architecture)
  • Model: glm-5:cloud via Ollama Cloud
  • Large write operation (~16KB file) may have triggered extended response time
  • Network connectivity to ollama.com confirmed healthy

Symptoms (Matching Original Report)

| Symptom | Observed |
|---------|----------|
| High timeout rates | ✅ 60s timeout on inference |
| Message ordering conflicts | ✅ "Message ordering conflict" on retry |
| Tool calling failures | ✅ Malformed response (no content/tool_calls) |
| Cascade failures | ✅ Multiple errors in cascade |

Additional Context

Status monitors (downforai.com) show "Operational" because they check endpoint availability, not inference reliability under load. The problem appears to be capacity-related — endpoints respond, but model inference times out or returns malformed responses.
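
A lightweight way to monitor what uptime checks miss is to time an actual one-token generation. A sketch follows; the URL assumes the local daemon proxying to the cloud model (adjust to however your client reaches Ollama Cloud), and the thresholds are arbitrary:

```python
import time
import requests

def probe_inference(model="glm-5:cloud",
                    url="http://localhost:11434/api/generate",  # assumed routing
                    timeout_s=30):
    """Time a minimal real generation; endpoint-uptime checks miss this failure mode."""
    payload = {
        "model": model,
        "prompt": "ping",
        "stream": False,
        "options": {"num_predict": 1},  # request a single token to keep the probe cheap
    }
    start = time.monotonic()
    try:
        resp = requests.post(url, json=payload, timeout=timeout_s)
        resp.raise_for_status()
        return {"ok": True, "latency_s": round(time.monotonic() - start, 2)}
    except requests.RequestException as exc:
        return {"ok": False, "latency_s": round(time.monotonic() - start, 2),
                "error": str(exc)}
```

Run on a schedule and logged, this yields the inference-level reliability series that an "Operational" badge does not.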

Questions for Ollama Team

  1. Is there a status page that tracks inference latency/reliability (not just endpoint uptime)?
  2. Are there recommended timeout values for cloud tier API calls? (An interim client-side sketch follows this list.)
  3. Is there visibility into when the service is under heavy load?
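
On question 2, absent official guidance, the interim approach sketched below pairs a longer timeout with capped, jittered exponential backoff; every value here is a guess, not a recommendation:

```python
import random
import time

def call_with_backoff(request_fn, attempts=4, base_delay_s=2.0, cap_s=30.0):
    """Retry a flaky cloud call with jittered exponential backoff.

    request_fn is any zero-argument callable that raises on failure
    (e.g. a lambda wrapping the HTTP request).
    """
    for attempt in range(attempts):
        try:
            return request_fn()
        except Exception as exc:  # narrow to timeout/5xx errors in real code
            if attempt == attempts - 1:
                raise
            delay = min(cap_s, base_delay_s * 2 ** attempt)
            time.sleep(delay * random.uniform(0.5, 1.0))  # jitter avoids retry stampedes
```

The jitter matters in a subagent architecture: several agents retrying in lockstep would otherwise re-spike an already loaded service.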

This matches the pattern from March 6 — intermittent reliability degradation under load, not a hard outage.


System: OpenClaw (agent orchestration), Model: glm-5:cloud, Time: 2026-04-05 ~21:03 UTC


@backamblock commented on GitHub (Apr 8, 2026):

Same for me: completely unreliable for paid users. I understand it's very easy to create thousands of fake free accounts and put them in a router; that's probably how the service gets trashed. Better to make the free tier less attractive and prioritize max users.


Reference: github-starred/ollama#56011